Multi-paradigm

Nippon Ichi And the Love of Grind

2023-03-06T10:12:00.001-08:00

The RPG-ification of AAA titles has been going on for so long that it’s almost strange to see the rare game where you aren’t collecting some resource to get stronger and improve one attribute or another. The motivation is obvious: it gives a sense of progression, expression, and replayability, and it allows players to substitute time for skill. It seems hard for AAA studios to release difficult games because challenge could dissuade players who are looking for a cheap power fantasy but don’t have the time to invest in mastering the games they play. Publishers want the widest possible audiences, so they want to know that players can continually make progress and enjoy the game, not get hung up on sudden difficulty spikes. Can’t get past a boss, or this mission is too hard? A short little grind should suffice.

Time, being a player’s most valuable resource, should not be taken for granted, which I think most developers are aware of. This is why enemy difficulty often scales with player level rather than becoming DPS checkpoints, why games like The Division try to fill the map with various EXP-awarding activities and new guns, and why games like God of War carefully control the player’s power level at every step of the journey. “Grinding is just bad game design,” you might read on forums and social media. “Good” games use their RPG mechanics to give you as much power as you need, when you need it.

This does not appear to be how Nippon Ichi sees the world, but I have to take a slight detour for those unfamiliar with their works. Readers familiar with them may skip the following section.

Incredible Power Levels

Disgaea 6 had this trailer that introduced a new level cap. RPGs rarely go above 100, but this cap was 99,999,999. If you look at the experience calculations for a Disgaea game, you’ll find that the first 100 levels use a standard exponentially-increasing EXP requirement for each level, but then they switch to a different expression and then another later on. Most RPGs have to cap levels so low because it would become exorbitantly expensive to continue leveling up. Even if there is no cap that players might realistically reach, the content must end before this happens.

But Disgaea doesn’t stop there: it has reincarnation. Much of the end-game of a Disgaea entry is spent reaching max level. Then you reincarnate. This reduces your unit’s level back to one, but retains some of the stats from its previous life. You max its level out again and then reincarnate again.

This is all, by the way, required for post-game content. The final, final boss is always Baal, an incredibly difficult encounter where only the strongest teams can even make it past the first turn of combat. And it takes, as you’ve probably guessed, a lot of grinding. Grinding to get to max level, reincarnating, and grinding some more.

Reincarnation and grinding are essential features in most of the Nippon Ichi games I’ve played, including Disgaea, Makai Kingdom, Labyrinth of Refrain and its sequel, etc. It’s just a company that loves the grind.

Grind Is Good

On its surface, Nippon Ichi’s philosophy seems a step in the wrong direction, but the popularity of their franchises says otherwise. Players of most RPGs will talk about the story, characters, and world, perhaps viewing combat as a way to space out the more interesting elements of the game, but players of Nippon Ichi games will talk about the gameplay and achievable power levels, saying “the game doesn’t really start until the post-game.” They’ll talk about ways of gaining thousands of levels worth of experience very quickly after a reincarnation, how to create OP builds, how to use the game’s built-in cheat functions to gain money and experience quickly, etc.

Grind in a Nippon Ichi game isn’t the activity done between story beats, it is the game. Their stories aren’t generally that bad and are full of humor, but they’re not the main attraction.

They accomplish this feat in two ways:

Unparalleled depth.
Seemingly endless content.

Depth: It’s Prinnies All The Way Down.

One of the first concepts players get introduced to in Disgaea is throwing your units around. A unit can only move a number of tiles according to their movement stat, but a unit can pick another up, throw it, and then the thrown unit can move much further. But prinnies are special. Each unit has an “evility” that might trigger at specific moments or add stats, and the prinny evility is to explode when thrown, dealing damage proportionate to health.

Nippon Ichi has a knack for introducing concepts and immediately turning them on their heads. In Labyrinth of Refrain, you learn that to unlock doors, you need keys. Simple enough. Also you can break many, often most, of the walls in the dungeon, subverting the key/door dynamic. In Phantom Brave, you can pick rocks, bushes, and all sorts of stuff off the ground and use them to defeat enemies. Also you can carry them back to base and materialize new units from them. In Disgaea, you’ll want to constantly be equipping better gear, but also each item has a whole “item world” to explore and the item’s stats improve the deeper you go.

This depth allows Nippon Ichi games to sometimes be puzzles where you have to find out-of-the-box solutions to strategy problems and sometimes be joyous grindfests where you figure out the most efficient way to increase in power to get beyond a challenge. Most of the time, the player gets to decide which way they want to handle the game. When I first played Labyrinth of Refrain, I had to think very carefully about my team compositions, but in playing its sequel I better understood how to use the stockpile system to stay overleveled throughout much of my run.

A particular moment in Disgaea 5 I keep coming back to in my mind: I was on this map with a bridge over a poisoned water river. Each turn in the river, units will take damage so obviously the bridge is the way to go, but the enemies on the bridge were a bit too powerful for my team. Then it occurred to me I could pick them up and throw them off the bridge to have them take poison damage while my team waited out the battle to clean up the remainders. It’s the kind of thing I could see players going onto forums and saying “cheese discovered in level, trivializes challenge, please fix,” but at the same time the feeling of discovery in that moment was strong enough that I remember it so many years later and the game wouldn’t feel so special to me were it not for the cheese.

Conclusions

Nippon Ichi’s philosophy has been a personal inspiration and I adore their games. I love the grind. Players are often right to criticize games for being overly grind-focused when they trap players into repetitive and unfulfilling loops, becoming more work-like than play-like. Well, the criticism probably still stands against Nippon Ichi’s titles, but the difference is in how Ichi’s titles are built to continue to reward players for the act of playing itself. When the core thrust of player motivation is a steady IV drip of story content, the unfulfilling nature of the core gameplay only becomes more apparent when the story slows down. But when the gameplay itself is what one draws pleasure from, none of that matters.

Embrace the grind. Love the grind.

How Vampire Survivors Made Me Rethink The Concept of the "Core Gameplay Loop"

2022-07-27T22:00:00.001-07:00

I first saw Vampire Survivors as demoed on Northernlion’s YouTube channel and it struck me as “just another game”. The gameplay looked simplistic, the art looked rough, and the concept didn’t seem like a very big innovation. I didn’t really look at it with any seriousness until I saw “20 Minutes Till Dawn” and “Spirit Hunters: Infinite Horde”. That’s when it occurred to me: this is kinda a big deal. Vampire Survivors only hit the internet in late 2021 and already it has many clones, often by relatively new studios, sometimes even veteran studios. These studios either decided that this would be a trend so big that it was worth the months of development to try and ship a competitor, or were themselves so infatuated with Survivors’ concept that they needed to try their own ideas.

Point is, this game is a pretty big deal and I don’t think I’m alone in having initially thought this game didn’t look like much. Its core gameplay loop seems hilariously simplistic: kill enemies to get gems to gain experience, level up, and kill more enemies. One could probably describe very well-respected roguelites like Slay the Spire and Rogue Legacy the same way, but I think many would also point out the boss battles, the lore, and how skill-based those games feel. I want to put a pin in the skill part for now and focus on bosses and lore, which are not necessarily a part of the core loop, but I think are key to understanding Vampire Survivors.

It took me quite a while to understand Vampire Survivors on a level deeper than “some people find ‘bullet heaven’ fun” and that led me to question the utility of thinking of core gameplay loops as simple loops.

I usually hear this discussed in terms of micro and macro; micro being what you’re doing now and macro being what will happen soon. For Vampire Survivors, the micro would be the loop I described above, but the macro is a bit more interesting. When you go into a stage, you might have a goal such as “explore a specific part of the map,” or “collect gold,” the meta-progression resource. You might be completing a challenge which will unlock some new content upon completion, some of which are to “evolve” a weapon (upgrade a weapon to the max level while having a specific item in your inventory). To accomplish this goal, you might have a specific build in mind. But just having goals or higher level objectives doesn’t seem like enough for a game to be special. What’s the secret? It takes a moment to explain…

At the start of the game, you have just one weapon and every time you level up, you might gain an additional weapon which changes your build, get an item which enhances the build, or upgrade a weapon which scales your damage. As time goes on, waves get stronger and you need these upgrades in order to match your damage output to keep up with the enemy spawn rate which is also increasing over time, or at least be able to outrun them. On a good run, at some point, you will have outpaced the spawn rate and just mow enemies down–what is often referred to as “bullet heaven”, and from then on the player is unlikely to lose until the timer expires and the reaper comes out. Even from a very simple mechanical base, Vampire Survivors is able to evoke a number of different emotions in the player at different stages. It creates different feelings across each stage which makes them feel more compelling, less monotonous.

This is what I really missed when I looked at Vampire Survivors before, and what shows me the flaw of viewing a core gameplay loop as just a loop. You could look at it as a spiral, too. As the game continues, the loop of kill -> collect -> level -> kill changes and recontextualizes over time. At first, you are weak, but enemies are infrequent, collecting EXP is slow. As you gain another weapon, your ability to chase down enemies increases and you can gain EXP a little faster. Eventually the enemies start to come in faster than the player can kill them and there’s this push and pull as you need to run away to not get killed, but you also need to circle around to collect the EXP they drop. After a few level ups, you get a weapon evolved and become overpowered and nothing can touch you. Leveling up goes from just trying to gather a few extra weapons so you don’t get outpaced to trying to optimize the damage output as quickly as possible and trying to set up evolutions, to maximizing your weapons, to finalizing the build, and finally just cleaning up whatever upgrades are left to collect. Every part of the loop changes as you progress through the game and so after going a few cycles through it, you find yourself not actually in the same place as where you started.

I’ll refer to this as the “core gameplay spiral”: the way that iterations around a core gameplay loop alter the dynamics as the game changes over time or just progresses in general. Breaking the ideas of the core loop, progression systems, and time apart makes it harder to understand how they relate to each other, and treating the core gameplay loop as static especially discourages understanding how it evolves over time. The core actions you’re doing never change, but how they feel does change, as do the dynamics they create. I further posit that games will probably, generally, be more interesting when their core loop is not static, and this extends to meta-progression, too.

The one backtrack I have to make is that the “core gameplay loop” is typically only a mechanics-deep analysis that does not consider extraneous elements like dynamics or progression. So maybe “core spiral” isn’t a great name because it suggests that the spiral is just the core loop unraveled. It is not; they are separate concepts for different layers of abstraction.

Earlier, I did briefly mention that I think one aspect of Vampire Survivors that might lead some to disregard it is that it appears to have a low skill floor, that is, the minimum amount of skill required to be proficient at a game. Even the skill ceiling doesn't seem particularly high. But I hope it’s clear that this doesn’t actually matter very much since the progression of the core gameplay loop more than makes up for this. Both “Spirit Hunters: Infinite Horde” and “20 Minutes Till Dawn” add much more skill-based elements with their faster gameplay, with “20 Minutes” even requiring that you manually fire your gun, yet I feel they just miss the point. Their core loops are static in comparison to Vampire Survivors. In “Spirit Hunters”, the stats of weapons can change, but the weapons themselves change very little, they don’t evolve, and only occasional pets will give global buffs to your hero. Every round has largely the same goal: collect gems (the meta-progression currency) to unlock more on the web (a very large skill tree, essentially). “20 Minutes” has more characters and weapons to unlock and you can freely mix and match them, but there is less to unlock and it all unlocks fairly quickly; and since you manually control the weapon and it always fires bullets from the same point, there are just fewer dramatic changes to the core loop throughout each run. They are both good games and some may prefer one or both over Vampire Survivors, but in my opinion they aren’t quite at the same level.

The insight I have gained from this analysis is that when I’m thinking about my own designs, I want to think primarily about this core gameplay spiral. One with a simple loop at its base that is expanded as the player spends time in the game and creates different dynamics at different points along the spiral. And what I want to avoid is thinking of the gameplay loop, micros and macros, and progression systems as parallel entities. As a practical example, while working on a deckbuilder roguelite, I asked myself “why does picking the second card feel different from picking the first?” and after doing so, I’d realized it did not. I had the progression systems, the loops, and all the features I thought made a good game, but I didn’t have decisions properly compounding on each other and thinking about how the loop should change over time pushed me to make some significant changes to my mechanics, hopefully for the better. And so I hope that this analysis might similarly help others see their own projects in a different way, ask the right questions, and find better solutions.

Game Design as a Dialogue

2022-05-02T21:17:00.002-07:00

Media criticism is one of my favorite uses of YouTube and somehow, the platform unearthed a really great set of critics to choose from, for movies. There are Lindsay Ellis, Folding Ideas, Filmento, CJ the X, etc., but in games we have far fewer. GMTK is the current gold standard, then there are Adam Millard, Design Doc, Dunkey, Noah Caldwell-Gervais, Psych of Play, and a few others strewn here and there. Furthermore, comparing channels focussing on game design or analysis vs movies and books, I always felt something was missing.

For a case study of good media criticism, I want to look at Lindsay Ellis’s video essay, “That Time Disney Remade Beauty and the Beast”. The essay discusses the history of Beauty and the Beast, differences in adaptations, social context, political context, how it fits within the Disney brand and how this impacted the motivations of the film, and thus the creative choices in its development. Then, with all this context, when we arrive at the core issues that Ellis takes with the plot, we gain a greater understanding not only of how the creative decisions affected the piece as a whole, but of how those decisions form part of a dialogue that extends beyond the movie itself. This grants us new insights on the film and the culture that produced it.

It’s this dialogue that I feel is most absent from video game criticism and analysis. Many channels discussing a specific design topic in games, like jumping physics, will happily discuss all the different ways jump physics may be implemented and how that affects the way the different games play. Much rarer is a discussion of the dialogue between different designers, and between game makers and players of different games, about how and why they chose different jump physics. The kind of analysis seen most often serves as more of a collection of useful ideas, and it’s left up to the watcher to pick among the options they like. While this kind of content is unambiguously good and we could use more of it, we also deserve deeper analysis that engages with the larger dialogue, not just describes its existence. The problem is that without engaging in this dialogue, the viewer does not gain insights they could not have otherwise gained by just being aware of the games being discussed.

For an example, I’d like to use Design Doc’s “How Do You Improve Turn Based Combat?”. This video serves as a good 101 course in understanding what kinds of turn based systems have been experimented with and various ideas in it. DD does a good job of talking about how the systems work and how they affect the player, and what components go into turn based combat design. Again, this is unambiguously good content, but it doesn’t contain entirely novel ideas that add to the discourse or do much to explain why one game might have chosen one system over another.

For contrast, Hbomberguy’s (Harris Michael Brewis) more focussed video, “Bloodborne Is Genius, And Here's Why”, is one of the most influential discussions of game design for me, personally. The central bit that always stuck out was his story of a friend of a friend who didn’t so much get the Dark Souls games, but perhaps learned to play them differently because Bloodborne conditioned them to approach them in a new way. This led to them having more fun. Through this story, we gain some insight that the way we play can affect our enjoyment of a game, and that how a game is structured can impact how we play. To get to this point, Brewis discusses the history of Dark Souls, how it relates to other games, and personal motivations and ideas about games, and references other works and discussions of game design. Importantly, one point I really don’t want to overlook is how Brewis includes the player in the dialogue by referencing how other players respond to Bloodborne and relate to it rather than speaking entirely from their singular perspective.

To add my own critique to the pool, I want to talk a bit about dodge/parry canceling and why it’s a very important, but frequently omitted feature of action games and Dark Souls (DS) in particular. DS is so synonymous with “difficult” that difficulty is often considered its defining feature and almost any difficult game is inevitably compared to it. Hidetaka Miyazaki allegedly didn’t set out to make a hard game, however, telling metro.co.uk in an interview:

“I personally want my games to be described as satisfying rather than difficult. As a matter of fact, I am aiming at giving players sense of accomplishment in the use of difficulty.”

As Noah Caldwell-Gervais points out in their video, “I Beat the Dark Souls Trilogy and All I Made Was This Lousy Video Essay”, the series is more inviting to players of all skill ranges than the “git gud” crowd seems to think.

Because DS is synonymous with “difficult”, difficult games seem to have taken some of their design ideas from it. The one I find most puzzling is the omission of dodge and parry canceling or even the omission of parrying altogether, not because DS doesn’t have parrying, but because it’s underutilized.

In DS, attacking involves patience, planning, spacing, and strategy because starting an attack locks you into its animation which can only be canceled after its active frames have completed. This gives DS a slower, more deliberate pace in contrast to the twitch reflex feel of more action-oriented games.

For twitch-based action games, dodge canceling encourages more aggressive behavior from the player since they can always hit the “get out of danger” button at any time. To incentivise more skilled, less risk-averse play, games like Bayonetta reward the player specifically for dodging at just the right time with some slow-mo where the player can deal significant damage before the enemy can respond. Ys VIII gives slow-mo for perfect dodges and temporary invincibility for perfect parries and allows players to potentially have both buffs at once. A well-timed dodge feels good any day of the week, but many designers seem to have wanted to embellish this feeling with extra mechanical satisfaction.

When FromSoftware made Sekiro after DS3, dodge and parry canceling were indeed features, but the game had a different aesthetic. DS featured a dark, sinister, and cruel world where punishing the player for impatient and poorly-timed attacks added to this dark aesthetic. You are a nobody; a simple, clumsy soldier facing foes far above your weight class. While Sekiro features a similarly dark world, you play as a named protagonist with a refined fighting style, using stealth when it suits you and picking when and how to attack. DS wants you to feel powerless; Sekiro wants you to feel powerful, and so it gives you a dodge cancel, even giving a high-damage, cinematic counter attack for a perfect parry.

Yet, many contemporary action games such as Death’s Door and Sifu are missing this. Why? Death’s Door in particular has combo strings where every attack in the string uses the same animation, and the string is of an arbitrary length determined by the weapon’s upgrade level and cannot be canceled out of. The most I ever heard about why the combat system was designed the way it was came from a snippet of a Noclip documentary where one of the team said that for this kind of game, you want faster, instant attacks compared to some other games. Was dodge cancellation discussed and omitted for specific reasons? Would the game be better or worse with it? Is this a holdover from Dark Souls’ legacy as part of some anti-dodge canceling movement in game design or an isolated decision?

Now, I hope it doesn’t sound like I’m negging on Death’s Door’s omission of dodge canceling for no reason, but I do think it’s an interesting example of a gap in the discourse surrounding game design. When I hear people discuss the game, they will often reference the story, funny writing, art, music, creative bosses, and presentation while spending little time discussing the combat, despite players spending a large portion of the game immersed in battles. And that’s kind of interesting in its own right. Perhaps the combat is as good as it needs to be, but not really a central focus of the player’s enjoyment, or perhaps the other elements of the game are just that good. That in itself gives us some ideas about what might be most important to players and how they view games. But designers spend a lot of time deliberating over small decisions like what features to include, what are must-haves and what might be stretch goals, and trying to figure out whether they matter and, if so, why.

Personally, I think the absence of dodge canceling in Death’s Door is interesting because it follows a trend in contemporary action games, but also in that it might suggest its roots in 2D Zelda games. Sifu references older beat ‘em ups while feeling very modern (despite also feeling like PS2’s God Hand). Both games fit into modern gaming as having retro inspiration with modern twists, asking us to look back at gaming’s past to see if there’s anything we’ve missed before we continue forward.

This brings me to why I decided to write this piece. A dialogue exists between designers and between designers and players, but it is all too often not spoken. Someone makes a game, another developer sees this and is inspired to make their own, borrowing or stealing ideas, adding new ones, and over time the craft matures. Players develop an aesthetic sense of pleasurable and unpleasurable experiences, they discuss them on social media, they buy games or they don’t. This dialogue permeates all of gaming culture and the games industry, influencing what game makers make and what gamers play. Yet, when we discuss individual game design decisions, we often discuss them in a vacuum rather than as a part of a complex web of desires, motivations, aspirations, and ideas that play off each other. Every decision a games studio makes, from the color of a health bar to the nuances of the mechanics, is a part of a story that extends back to Pong and further, and will continue being written as long as people make games.

So, when looking at a game and appreciating that it is fun, or has generally high production values, or just is interesting in some way, I would encourage the reader to think about this game as part of this greater dialogue. What does it say about how games should work? What does it add to the discussion? What may be left out? What were the intentions of the designer and what kind of experience did they want players to have?

OpenGL tutorials can do better than raw pointer math.

2020-08-07T13:32:00.001-07:00

If you've ever followed an OpenGL tutorial, you probably saw code like this:

  float verts[] = {
    // Position   TexCoords
    -X, -Y, Z,    0.0f, 1.0f,
     X, -Y, Z,    1.0f, 1.0f,
     X,  Y, Z,    1.0f, 0.0f,
    -X,  Y, Z,    0.0f, 0.0f
  };

  GLuint vbo;
  glGenBuffers(1, &vbo);
  glBindBuffer(GL_ARRAY_BUFFER, vbo);
  glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);

So far not... terrible. The verts array may contain additional color information or random other things, but it'll basically look like this. Later, we have to tell the graphics card to actually draw this data, informing our GLSL shaders how to read it:

  // `program` is the handle of the linked shader program.
  GLint vertex_pos_attrib = glGetAttribLocation(program, "vertex_pos");
  glEnableVertexAttribArray(vertex_pos_attrib);

  glVertexAttribPointer(vertex_pos_attrib, 3, GL_FLOAT,
                        GL_FALSE, 5 * sizeof(float), (void*)0); // A

  GLint tex_coord_attrib = glGetAttribLocation(program, "tex_coord");
  glEnableVertexAttribArray(tex_coord_attrib);
  glVertexAttribPointer(tex_coord_attrib, 2, GL_FLOAT,
                        GL_FALSE, 5 * sizeof(float), (void*)(3 * sizeof(float)));

and this is where these tutorials seem to be dropping the ball. The call I marked with A is telling the graphics card that the GLSL variable, vertex_pos, should be filled with 3 elements of float data, with a stride of 5 * sizeof(float) bytes between vertices and an offset of 0 bytes from the beginning of the vertex buffer. The next call passes in nearly identical information, but with 2 elements and an offset of 3 * sizeof(float) bytes from the beginning. The extra (void*) cast is just converting the argument to the expected type.

This code is brittle for just too many reasons.

If vertex information is added or removed, all the pointer offset math has to change.
The same goes for if the datatype changes, like from an int32 to int64.
If one forgets the * sizeof(float) part, there could be trouble when moving to other data types as one might guess the size in bytes wrong.
The math gets more complicated if types of varying sizes are used.
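
To see how easily the offset math goes stale, here is a CPU-side sketch of the addressing that a stride and an offset describe. It does not call OpenGL at all; fetch_component and the two sample arrays are hypothetical names made up for illustration, and the color values are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: read one float component of one vertex attribute,
// locating it with the same stride/offset arithmetic the attribute calls describe.
inline float fetch_component(const void* buffer, std::size_t stride,
                             std::size_t offset, std::size_t vertex,
                             std::size_t component) {
  assert(buffer != nullptr);
  const unsigned char* bytes = static_cast<const unsigned char*>(buffer);
  float value;
  std::memcpy(&value,
              bytes + vertex * stride + offset + component * sizeof(float),
              sizeof(float));
  return value;
}

// Position (3 floats) followed by texture coordinates (2 floats).
static const float kVerts[] = {
  -1.0f, -1.0f, 0.0f,   0.0f, 1.0f,
   1.0f, -1.0f, 0.0f,   1.0f, 1.0f,
};

// The same two vertices after an RGB color is inserted before the tex coords.
static const float kVertsWithColor[] = {
  -1.0f, -1.0f, 0.0f,   0.5f, 0.5f, 0.5f,   0.0f, 1.0f,
   1.0f, -1.0f, 0.0f,   0.5f, 0.5f, 0.5f,   1.0f, 1.0f,
};
```

With the original layout, fetch_component(kVerts, 5 * sizeof(float), 3 * sizeof(float), 0, 0) reads the first vertex's U coordinate. Reuse those same numbers on kVertsWithColor and the call quietly returns the red channel instead, until both the stride and the offset are updated by hand to 8 and 6 floats.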

Just use a struct

  struct Vertex {
    GLfloat pos[3];
    GLfloat tex_coords[2];
  };

  Vertex verts[] = {
    // Position     TexCoords
    {{-X, -Y, Z},  {0.0f, 1.0f}},
    {{ X, -Y, Z},  {1.0f, 1.0f}},
    {{ X,  Y, Z},  {1.0f, 0.0f}},
    {{-X,  Y, Z},  {0.0f, 0.0f}}
  };

Now our data is more organized. We can share information about the binary layout of our data with other parts of our program, write functions for generating points, constructors, and all sorts of good stuff. We can even use other, less basic data types, in theory. My code uses glm::vec3s here, for example.

We need the corresponding glVertexAttribPointer calls to change a bit, too. I've seen a number of attempts at this, sometimes with the idea that one can take a member pointer and dereference it at null in order to get the offset, then convert it back into a pointer. Or one could just use offsetof(). Finally, we can achieve something like this:

glVertexAttribPointer(tex_coord_attrib,
                      sizeof(Vertex::tex_coords) / sizeof(GLfloat), GL_FLOAT,
                      GL_FALSE,
                      sizeof(Vertex),
                      (void*)offsetof(Vertex, tex_coords));

and we're pretty close to something approaching ideal. It's unfortunate that we have to know that Vertex::tex_coords is made up of GLfloats, though. It could be advisable to define a global, constexpr unsigned int D = 3, and use it both in the Vertex definition instead of a hardcoded size and here instead of sizeof(), but the gains are marginal since we still have to pass in GL_FLOAT as the next parameter.
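
To check the arithmetic without a GL context, the sketch below computes the same three values that call hands to OpenGL and verifies them at compile time. AttribLayout, pos_layout, and tex_coords_layout are hypothetical names for illustration, and GLfloat is aliased to float only so the snippet compiles without the OpenGL headers.

```cpp
#include <cstddef>  // offsetof, std::size_t

// Stand-in for the OpenGL typedef (assumption: GLfloat is a plain float).
using GLfloat = float;

struct Vertex {
  GLfloat pos[3];
  GLfloat tex_coords[2];
};

// Hypothetical helper type: the layout values glVertexAttribPointer needs.
struct AttribLayout {
  std::size_t components;  // elements per vertex for this attribute
  std::size_t stride;      // bytes between consecutive vertices
  std::size_t offset;      // bytes from the start of a vertex to the member
};

constexpr AttribLayout pos_layout() {
  return {sizeof(Vertex::pos) / sizeof(GLfloat), sizeof(Vertex),
          offsetof(Vertex, pos)};
}

constexpr AttribLayout tex_coords_layout() {
  return {sizeof(Vertex::tex_coords) / sizeof(GLfloat), sizeof(Vertex),
          offsetof(Vertex, tex_coords)};
}

// The compiler now agrees with the hand-written numbers from the tutorial code.
static_assert(pos_layout().components == 3, "pos has 3 components");
static_assert(pos_layout().offset == 0, "pos starts the vertex");
static_assert(tex_coords_layout().components == 2, "tex_coords has 2 components");
static_assert(tex_coords_layout().offset == 3 * sizeof(GLfloat),
              "tex_coords follows the 3 position floats");
static_assert(tex_coords_layout().stride == 5 * sizeof(GLfloat),
              "a vertex is 5 floats wide");
```

If a member is added to Vertex, every value here updates on its own, and the static_asserts that encoded the old assumptions fail loudly at compile time rather than at render time.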

Still, to improve further...

Generalize

This is the actual code I use:

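// Primary template; only specialized for types OpenGL understands.
template<typename T>
struct GlTraits;

template<>
struct GlTraits<GLfloat> {
  static constexpr auto glType = GL_FLOAT;
};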

template<typename RealType, typename T, typename Mem>
inline void vertexAttribPointer(GLuint index, GLboolean normalized,
                                const Mem T::*mem) {
  // This is basically C's offsetof() macro generalized to member pointers.
  RealType* pointer = (RealType*)&(((T*)nullptr)->*mem);
  glVertexAttribPointer(index, sizeof(Mem) / sizeof(RealType),
                        GlTraits<RealType>::glType, normalized, sizeof(T),
                        pointer);
}

Mem here is often going to be a glm::vec3 or something similar, T being my vertex class. Unfortunately, RealType needs to be passed in as an explicit type since neither Mem nor T suggest what the OpenGL type actually is. Theoretically, if this library knew what a glm::vec3 was, it could also deduce the real type from the parameter, or one could specialize GlTraits for it and include a typedef or using statement that gave the "real" type.
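
As a rough sketch of that last idea (not code from my library), the traits can carry a real_type alias and a component count so that nothing has to be passed explicitly. Vec2 and Vec3 are hypothetical stand-ins for glm::vec2 and glm::vec3, GL_FLOAT is redefined locally with the value it has in the OpenGL headers so the snippet compiles on its own, and AttribDesc and describe are made-up names. The offset computation is left out to keep the focus on type deduction.

```cpp
#include <cstddef>
#include <type_traits>

// Stand-ins so the sketch compiles without OpenGL or glm (assumptions).
using GLenum = unsigned int;
using GLfloat = float;
constexpr GLenum GL_FLOAT = 0x1406;  // same value as in the OpenGL headers
struct Vec2 { GLfloat x, y; };       // plays the role of glm::vec2
struct Vec3 { GLfloat x, y, z; };    // plays the role of glm::vec3

// Primary template is left undefined: unsupported member types fail to compile.
template<typename T>
struct GlTraits;

template<>
struct GlTraits<Vec2> {
  using real_type = GLfloat;
  static constexpr GLenum glType = GL_FLOAT;
  static constexpr std::size_t components = sizeof(Vec2) / sizeof(real_type);
};

template<>
struct GlTraits<Vec3> {
  using real_type = GLfloat;
  static constexpr GLenum glType = GL_FLOAT;
  static constexpr std::size_t components = sizeof(Vec3) / sizeof(real_type);
};

struct Vertex {
  Vec3 pos;
  Vec2 tex_coords;
};

// Hypothetical result type: everything except the offset, deduced from the
// member pointer alone.
struct AttribDesc {
  std::size_t components;
  GLenum type;
  std::size_t stride;
};

template<typename T, typename Mem>
constexpr AttribDesc describe(Mem T::*) {
  return {GlTraits<Mem>::components, GlTraits<Mem>::glType, sizeof(T)};
}

static_assert(std::is_same<GlTraits<Vec3>::real_type, GLfloat>::value,
              "Vec3 is made of GLfloats");
```

With this in place, describe(&Vertex::pos) yields 3 components of GL_FLOAT without any explicit template argument, at the cost of writing one specialization per vector type.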

I use the function like this:

  vertexAttribPointer<float>(tex_coord_attrib, GL_FALSE, &Vertex::tex_coords);
  vertexAttribPointer<float>(vertex_pos_attrib, GL_FALSE, &Vertex::pos);

Simple!

Conclusion

Many OpenGL tutorials are suited for younger, less experienced programmers and so--giving their authors some credit--writers may not want to confuse readers with language features they haven't seen before, like offsetof(), but I think this does a disservice to beginners as these tutorials are also authoritative on how good, well-written OpenGL code looks. They may not want to go into the template madness that I enjoy, but even tutorials targeting C or C++ can be significantly improved just by using a data structure for vertex information. Beginners should be taught to write resilient programs that resist the urge to hardcode sizes, types, and offsets based on their specifications at the time of writing and instead rely on the compiler more.

Brittle code can punish beginners when they try to expand their knowledge as the slightest changes can cause their screen to stop rendering, or render odd artifacts and this can be difficult to debug. At worst, it will cause segfaults.
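
To make the point concrete without any templates, here is a minimal sketch of letting the compiler compute the numbers a tutorial might otherwise hardcode. The Vertex layout and the constant names are assumptions for illustration only.

```cpp
#include <cstddef>

// A hypothetical vertex layout; reordering or adding members changes the
// constants below automatically.
struct Vertex {
  float pos[3];
  float tex_coord[2];
};

// Everything glVertexAttribPointer needs comes from the type itself.
constexpr std::size_t kStride = sizeof(Vertex);
constexpr std::size_t kTexCoordOffset = offsetof(Vertex, tex_coord);
constexpr std::size_t kTexCoordCount =
    sizeof(Vertex::tex_coord) / sizeof(float);
```

If a normal is later added between pos and tex_coord, the stride and offset follow along without anyone having to remember to update them.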

The Entity Component System

2020-07-29T11:46:00.003-07:00

Note: code highlighting thanks to http://hilite.me/

Note2: This blog is primarily on template meta-programming--when the opportunity presents--using it to solve practical problems. This post assumes knowledge of C++17 fold expressions. If the reader is unfamiliar, this may not be the best learning material, but if the reader gets the idea and wants a practical example, please read on.

Update July 30 2020: I realized a bit late that there is a bug where non-const values weren't always respected in the final solution. I haven't tracked it down yet so just be warned.

Update July 31: Fixed a bug that caused an infinite loop.

When I started programming, I made a simple game called Orbital Chaos and the code was organized very simply.

class Actor {
  // ...
};

class Player : public Actor {
  // ...
};

class Enemy : public Actor {
  // ...
};

So each thing that an actor could do was represented in the base class and the derived classes overrode that behavior in classic OOP style. Those were simpler days.

I decided to rewrite my code to what I now understand is a fairly industry-standard idea: The entity component system or ECS for short. From what I gather, an ECS is a simple system where entities are merely ID's with a set of components which define all relevant information important to that entity. For example, its <X,Y> coordinates in the 2D space of a 2d game, the type of an entity ({TREE, ITEM, PLAYER, etc...}), perhaps information on how to draw it... basically anything that would have gone in Actor or Player.

Proponents of ECS will argue that it is a better alternative to OOP: the base class need not consider every possible action or attribute in the game, and it is cache efficient, storing like information close together so that iterations over all entities hopefully involve fewer cache misses. Sometimes it's also argued that using ID's to relate objects to each other is much safer than direct pointers. The alternative is often std::shared_ptr with locking. Lastly, entities are easier to extend, mix, match, and define than objects, not having to worry about hierarchies, mix-ins, and multiple inheritance to get the desired behavior.

Personally, I think writing a base class that considers every possible action is not good OOP, but doing it the "right" way is probably less appealing than the data-oriented ECS style. OOP works great for abstract objects with variations on behavior, but game entities tend to vary more on configuration. I think this simplicity sometimes comes at the cost of having to reinvent mix-ins and hierarchies, but one can still put pointers and virtual in an ECS by storing pointers.

Information on ECS's on the internet varies in quality as do their implementations. I won't try to argue that I will present an objectively better implementation than any others, these are just my ideas and the story of my process to write one.

Ideas I didn't like

class Entity {
  GraphicsComponent gfx_component;
  PhysicsComponent phys_component;
  EtcComponent etc;
  // ...
};

I think this is closer to the "right" OOP solution as it encapsulates the complexity into smaller pieces which can then be as simple or intricate as necessary. In practice, the segmentation of code logic and how systems relate to each other may become unclear. In a grid-based game like mine, entities will have an integer <X,Y> position on the grid which maps to a very different position when rendering to the screen, which may even be offset slightly by animations. How do components refer to each other? It encourages using virtual functions to customize behavior, which is very tidy, but also involves a lot of boilerplate code and isn't incredibly efficient. Lastly, it doesn't store like data together.

There's also the idea of having the Entity store a vector of component objects, and it's one of the more common implementations I see, but this has the additional issue that one class needs to consider all the complexity of the game. For example, if each component object has init(), update(), etc() functions, they have to understand how to update themselves and interact with the world, but the world needs to understand how to contain them, leading to two mutually referential systems. That can be fine, but in my previous attempt, I found this really complicated the definition of more interesting components.

using EntityId = unsigned int;
using ComponentId = unsigned int;

template<typename T>
struct ComponentData {
    EntityId id;
    T data;

    ComponentData(EntityId id, T data)
        : id(id), data(std::move(data)) { }
};

class Buffer {
  char* data = nullptr;
  // ...
};

class EntityComponentSystem {
  std::unordered_map<ComponentId, Buffer> components;
};

I think this idea gets closer. Each component type has a unique ComponentId which is determined by the type system through template magic, and each Buffer holds an array of these components that it manages manually, since std:: classes like vector need to know what type they contain and how large it is at compile time. This std::unordered_map could be replaced by an std::array if the number of different component types is known in advance.
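
The "template magic" for the ID can be quite small. This is a minimal sketch of one common way to do it, not necessarily what any particular ECS uses; next_component_id and component_id are hypothetical names, and the IDs are assigned at run time in order of first use, so they are not stable across builds.

```cpp
using ComponentId = unsigned int;

// Hands out 0, 1, 2, ... on successive calls.
inline ComponentId next_component_id() {
  static ComponentId next = 0;
  return next++;
}

// Each instantiation has its own static `id`, initialized exactly once, so
// every component type ends up with a distinct, fixed ID for the lifetime of
// the program.
template<typename T>
ComponentId component_id() {
  static const ComponentId id = next_component_id();
  return id;
}
```

The map lookup then becomes components[component_id<T>()].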

Actually, if the component types themselves are known and I happen to love tuples, we could just do this:

using EntityId = unsigned int;

template<typename T>
struct ComponentData {
    EntityId id;
    T data;

    ComponentData(EntityId id, T data)
        : id(id), data(std::move(data)) { }
};

template<typename...Components>
class EntityComponentSystem {
  std::tuple<std::vector<ComponentData<Components>>...> components;
};

and that's what I ended up with in my first iteration (link). One big and notable downside is that now, if I have a grid position, for which I use my custom Vec<int> class, and a graphical position, which uses a Vec<int> as well, they'll be stored as the same component. It's tedious, but I implemented GraphicalPosition and GridPosition classes to differentiate.

It also means that the declaration of the ECS type needs to contain every component used in the game, but I don't find that too problematic.

From the above idea, we could construct a simple interface like this:

  EntityComponentSystem<Pos, TextureRef> ecs;
  auto e1 = ecs.write_new_entity(Pos(1, 2), TextureRef(tex1));
  auto e2 = ecs.write_new_entity(Pos(3, 4), TextureRef(tex2));
  for (auto id : ecs.ids()) {
    Pos p;
    TextureRef tref;
    if (ecs.read(id, p, tref))
      draw(p, tref);
  }

In the end, I use a much more efficient interface, but this gives an idea about how it might be used. e1 and e2 are entity ID's that have a Pos and TextureRef component which can be read back out in order to draw them onto the screen if the entity has both components.
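
For the curious, the simple interface above could be backed by something like the following. This is only a sketch under assumptions of my own: Pos and Sprite are hypothetical components, lookups are a plain linear search, and the real implementation in the gist differs.

```cpp
#include <tuple>
#include <utility>
#include <vector>

using EntityId = unsigned int;

template<typename T>
struct ComponentData {
  EntityId id;
  T data;
};

// Hypothetical components for the example.
struct Pos { int x, y; };
struct Sprite { int texture; };

template<typename... Components>
class EntityComponentSystem {
  std::tuple<std::vector<ComponentData<Components>>...> components_;
  std::vector<EntityId> ids_;
  EntityId next_id_ = 0;

  template<typename T>
  std::vector<ComponentData<T>>& store() {
    return std::get<std::vector<ComponentData<T>>>(components_);
  }

  template<typename T>
  const std::vector<ComponentData<T>>& store() const {
    return std::get<std::vector<ComponentData<T>>>(components_);
  }

  // Copies the entity's T component into `out`, if it has one.
  template<typename T>
  bool read_one(EntityId id, T& out) const {
    for (const auto& c : store<T>()) {
      if (c.id == id) {
        out = c.data;
        return true;
      }
    }
    return false;
  }

public:
  // Creates an entity holding exactly the given components.
  template<typename... Ts>
  EntityId write_new_entity(Ts... data) {
    EntityId id = next_id_++;
    ids_.push_back(id);
    (store<Ts>().push_back(ComponentData<Ts>{id, std::move(data)}), ...);
    return id;
  }

  const std::vector<EntityId>& ids() const { return ids_; }

  // True only if the entity has every requested component.
  template<typename... Ts>
  bool read(EntityId id, Ts&... out) const {
    return (read_one(id, out) && ...);
  }
};
```

Because IDs only ever increase, each vector stays sorted by entity ID without extra work, which the iteration scheme later in this post relies on.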

One note, before we move on: Why an std::vector and not a hash table keyed by the entity ID? Many entity component systems do it this way and it's great for O(1) lookups, but the code I have planned out has little need for fast random access and hash tables don't make good use of CPU caches. If I run into performance issues, I can see if a hash fixes it, but for now I'll stick to this. I think the larger concern is that a particle system might require more efficient random deletion.

If only life was so simple...

Since my entity ID's monotonically increase (omitting int32 overflow), keeping the component vectors sorted is generally simple enough. I used std::lower_bound() to figure out where to insert new elements, just in case, but generally entities don't gain or lose components over time so they're almost always inserted at the end. The component arrays are essentially parallel where std::get<std::vector<ComponentData<X>>>(components)[i] would resolve to the i'th X component of probably the i'th entity, and ditto for Y. However, not all entities have all components so some checking might be needed.
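
The sorted insertion might look roughly like this. It is a sketch rather than the gist's code; insert_sorted is a hypothetical free function and ComponentData is simplified to an aggregate.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

using EntityId = unsigned int;

template<typename T>
struct ComponentData {
  EntityId id;
  T data;
};

// Inserts a component so the store stays ordered by entity ID. When `id` is
// larger than everything already stored, lower_bound returns end() and this
// degenerates into a push_back.
template<typename T>
void insert_sorted(std::vector<ComponentData<T>>& store, EntityId id, T data) {
  auto pos = std::lower_bound(
      store.begin(), store.end(), id,
      [](const ComponentData<T>& c, EntityId wanted) { return c.id < wanted; });
  store.insert(pos, ComponentData<T>{id, std::move(data)});
}
```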

Say I want an algorithm that does this:

for (grid_pos, graphical_pos) in ecs.read_grid_and_graphical_pos() {
  graphical_pos = grid_pos * TILE_SIZE
}

Well, if I can iterate through one, I can do two, right? Start at the beginning, make sure they're pointing at the same element, and in the increment step, increment one of them and then the other until it has the same EntityId or higher and keep going back and forth until either hits the end or they both have the same ID again.
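
For exactly two component types, that back-and-forth can be written directly. The following is a sketch assuming both vectors are sorted by entity ID; for_each_pair is a hypothetical helper, not part of the final design.

```cpp
#include <vector>

using EntityId = unsigned int;

template<typename T>
struct ComponentData {
  EntityId id;
  T data;
};

// Calls f(id, a, b) for every entity present in both sorted stores.
template<typename A, typename B, typename F>
void for_each_pair(std::vector<ComponentData<A>>& as,
                   std::vector<ComponentData<B>>& bs, F&& f) {
  auto a = as.begin();
  auto b = bs.begin();
  while (a != as.end() && b != bs.end()) {
    if (a->id < b->id) {
      ++a;  // a is behind; let it catch up.
    } else if (b->id < a->id) {
      ++b;  // b is behind; let it catch up.
    } else {
      f(a->id, a->data, b->data);  // Same entity in both stores.
      ++a;
      ++b;
    }
  }
}
```

This is the same walk a sorted-set intersection performs, so each store is traversed at most once.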

Great! Simple! Problem: I have potentially N iterators and they're all of different types. I want this syntax:

int sum = 0;
for (auto [id, i, u] : ecs.read_all<int, unsigned>()) sum += i + u;

Obviously, this is non-trivial with tuples since we can't store iterators to each container in a vector, we have to store them in yet another tuple.

My first idea was to have a range class parameterized by the containers (std::vector<ComponentData<T>>...) and then for it to define nested iterator classes parameterized by the iterator types of that container. The basic issue with this is that the container could be const or non-const, but I had to be able to differentiate since in one case, the iterator class would store std::vector<T>::iterator and in the other, ::const_iterator so I ended up writing it twice. But what I landed with is so much better.

template<typename F, typename...T>
constexpr auto tuple_map(F&& f, const std::tuple<T...>& t) {
  return std::tuple(f(std::get<T>(t))...);
}

template<typename F, typename...T>
auto tuple_map(F&& f, std::tuple<T...>& t) {
  return std::tuple(f(std::get<T>(t))...);
}

// A range abstraction that allows multiple component data series to be iterated
// lazily over in a range-based for loop.
template<typename StoreTuple>
class ComponentRange {

  // A tuple of references to component storage.
  StoreTuple stores_;

protected:
  template<typename IteratorTuple>
  struct Iterator {
    // ...
    IteratorTuple its;
    IteratorTuple ends;

    bool sentinel = false;

    // To know if all iterators are pointing to the same entity, we'll need
    // to remember what entity that is.
    EntityId max_id;

    // ...
    Iterator(IteratorTuple its, IteratorTuple ends)
      : its(std::move(its)), ends(std::move(ends)) {
        // ...
      }

    Iterator() { sentinel = true; }
  };

public:
  explicit ComponentRange(StoreTuple stores)
    : stores_(stores) { }

  auto begin() const {
    auto b = [](auto&& store) { return store.begin(); };
    auto e = [](auto&& store) { return store.end(); };
    return Iterator(tuple_map(b, stores_), tuple_map(e, stores_));
  }

  auto end() const {
    return decltype(begin())();
  }
};

The general idea is to be as unconstraining of types as possible, so wherever we can get away with not specifying a type, we do. StoreTuple will generally resolve to something ugly like...

std::tuple<std::vector<ComponentData<X>>&, std::vector<ComponentData<Y>>&, ...>

tuple_map and tuple_foreach (the same function, it just doesn't return a new tuple) are going to be the bread and butter of this class. For reference about how difficult these functions were to write in C++11/14, I implemented them way back in "Fun With Tuples".
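
Since tuple_foreach isn't shown in this post, here is one way it could be written in C++17. This is a sketch built on std::apply and a comma fold, which may differ from the version in the gist.

```cpp
#include <tuple>
#include <utility>

// Calls f on each element of the tuple, left to right, discarding results.
template<typename F, typename Tuple>
void tuple_foreach(F&& f, Tuple&& t) {
  std::apply(
      [&f](auto&&... xs) { (f(std::forward<decltype(xs)>(xs)), ...); },
      std::forward<Tuple>(t));
}
```

Passing an lvalue tuple lets f take its argument by reference and modify the elements in place, which is what incrementing a tuple of iterators requires.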

If any of the its is at the end, the whole tuple is, which made comparing to the end a bit tricky, thus a sentinel is used. I think a variation on this idea could have been to consider two Iterators equal if any of their its are equal, though.

We can't iterate through the tuples easily so it's best we work on them uniformly, but we don't want any advancing past the max_id, which is going to be the highest ID of an entity of all iterators. Just iterating once is easy.

    template<typename It>
    bool at_end(It it) const {
      return it == std::get<It>(ends);
    }
    // Note that the alternative to the above is
    // std::get<std::decay_t<decltype(it)>>(ends) if auto-typed (e.g. lambda).

    template<typename Iter>
    void increment_iter(Iter& it) {
      if (!at_end(it)) ++it;
      if (!at_end(it)) max_id = std::max(it->id, max_id);
    }

This handles the case where an iterator was under max_id, but in an attempt to catch up it went over and set a new maximum.

Next, we have to handle incrementing all the iterators until they all line up on one ID.

    // A helper to catch up iterators that are behind.
    void increment_if_lower_than_max_id() {
      auto impl = [&](auto& it) {
        if (!at_end(it) && it->id < max_id) increment_iter(it);
      };
      tuple_foreach(impl, its);
    }

    bool any_at_end() const {
      auto any = [](auto...args) { return (args || ...); };
      auto pred = [this](const auto& it) { return at_end(it); };
      return std::apply(any, tuple_map(pred, its));
    }

    // True if all iterators point to the same ID.
    bool all_same() const {
      auto all = [](auto...args) { return (args && ...); };
      auto equals_max_id =
        [this](const auto& it) { return it->id == max_id; };
      return std::apply(all, tuple_map(equals_max_id, its));
    }

    // After iteration, or on initialization, increment any iterators that
    // are behind.
    void catch_up() {
      while (!any_at_end() && !all_same()) increment_if_lower_than_max_id();
      if (any_at_end()) its = ends;
    }

    Iterator& operator++() {
      // Increment at least once.
      tuple_foreach([this](auto& it) { increment_iter(it); }, its);
      catch_up();

      return *this;
    }

    Iterator operator++(int) {
      Iterator old = *this;
      ++(*this);
      return old;
    }

And finally, comparisons.

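    bool operator==(const Iterator& other) const {
      // Two sentinels are trivially equal; checking this first also avoids
      // unbounded recursion when both sides are sentinels.
      if (sentinel && other.sentinel) return true;
      return sentinel ? other == *this : any_at_end();
    }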
    bool operator!=(const Iterator& other) const {
      return !(*this == other);
    }

    auto operator*() const {
      auto data = [](const auto& it) -> auto& { return it->data; };
      return std::tuple_cat(std::tuple(max_id), tuple_map(data, its));
    }

The final gotcha is that on construction, not all the iterators might be pointing to the same ID, so we have to call catch_up() immediately. The full code for the constructor looks like this:

    Iterator(IteratorTuple its, IteratorTuple ends)
      : its(std::move(its)), ends(std::move(ends)) {
        max_id = EntityId{0};
        tuple_foreach(
            [&](auto it) {
              if (!at_end(it)) max_id = std::max(max_id, it->id);
            }, its);
        catch_up();  // Ensure a valid initial starting position.
      }

    Iterator() { sentinel = true; }

Conclusions

The full code can be found here: https://gist.github.com/splinterofchaos/a67426bb95ea97b3f33fbc44406bc791

The implementation is incomplete at the time of this writing and some of the interface is inconsistent or unnecessary, but it's a start.

And for the project in which I'm using it: https://github.com/splinterofchaos/py-srpg (Originally written in Python; the C++ conversion is on the cpp branch at the time of this writing.)

Writing code like this used to be tedious, difficult, and involve heaps of boilerplate, but at this point, fold expressions over a variable number of types and arguments feels easier than a for loop from i=0 to N.

The biggest frustrations for functional programming in C++ remain things like having to wrap a template function in a lambda in order to convert it to a callable function object, and the weak support for partial application. For example, in our begin() function.

  auto begin() const {
    auto b = [](auto&& store) { return store.begin(); };
    auto e = [](auto&& store) { return store.end(); };
    return Iterator(tuple_map(b, stores_), tuple_map(e, stores_));
  }

But things like fold expressions and class template argument deduction make things so much easier.

As for the iterator class, the logic is much less trivial than I'd hope an iterator class to be, but I think the performance should be decent. If my program ends up spending more time incrementing iterators than using the values obtained, I think my bigger issue will be that my game's not doing much. Most of the iterations through the ECS should be of very similar entities and so in the general case, it should increment once and exit.

As for the entity class itself, I'll have to test it out a bunch. Obviously, I worry that adding or removing large numbers of entities could be problematic, but I imagine it'll be good enough for my purposes and there's plenty of room for optimization, like maintaining a free list of deleted entity IDs and component data, for one example. Still, when I'd used the same in Python, it worked well so it should be better here.

SFINAE std::result_of? Yeah, right!

2015-02-19T09:09:00.002-08:00

Return type deduction pre-decltype and auto could really make one mad. Back then, if you wanted a function object, you had to make it "adaptable", meaning it had to inherit from std::unary_ or binary_function and define its first_argument_type, second_argument_type, and result_type. There had been no concept of "transparent functors" allowing us to pass polymorphic functions to higher order functions (like std::plus<> to std::accumulate). For an example of programming in these dark ages, check out the FC++ FAQ.

Just to recap, SFINAE stands for Substitution Failure Is Not An Error. It's the rule that lets you define two overloads of a function such that one returns decltype(f(x)) and the other decltype(g(x)); if f(x) can't be evaluated, that's not an error. If g(x) can't either, that's OK too, as long as one of the two can be for any given x.
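
As a small illustration of that rule, here the role of f(x) is played by x.size() and the role of g(x) by x + 1. The function name describe is made up for the example.

```cpp
#include <string>
#include <vector>

// Viable only when x.size() is a valid expression.
template<class X>
auto describe(const X& x) -> decltype(void(x.size()), std::string()) {
  return "sized";
}

// Viable only when x + 1 is a valid expression.
template<class X>
auto describe(const X& x) -> decltype(void(x + 1), std::string()) {
  return "arithmetic";
}
```

For a std::vector<int>, substituting into the second overload fails and it is quietly dropped; for an int, the first one is. A type supporting both expressions would make the call ambiguous, and a type supporting neither produces the "no matching function" errors discussed in the rest of this post.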

And so came decltype and it let us do never-before seen, amazing things, right? Well, yes, we could write things like this:

template<class X>
constexpr auto append_blah(X& x) -> decltype(x + "blah")
{
  return x + "blah";
}

int main() {
  append_blah("nah");
}

Obviously, since "nah" + "blah" isn't valid, this will fail to compile, but we'll get a nice, easy to read error message, right? Nah!

auto.cpp: In function `int main()`:
auto.cpp:11:20: error: no matching function for call to `append_blah(const char [4])`
   append_blah("nah");
                    ^
auto.cpp:11:20: note: candidate is:
auto.cpp:5:16: note: template<class X> constexpr decltype ((x + "blah")) append_blah(X)
 constexpr auto append_blah(X x) -> decltype(x + "blah")
                ^
auto.cpp:5:16: note:   template argument deduction/substitution failed:
auto.cpp: In substitution of `template<class X> constexpr decltype ((x + "blah")) append_blah(X) [with X = const char*]`:
auto.cpp:11:20:   required from here
auto.cpp:5:47: error: invalid operands of types `const char*` and `const char [5]` to binary `operator+`

That's GCC's output--more error than code! Even with the most relevant section bolded, it's a mess. Clang does a better job, though.

auto.cpp:12:3: error: no matching function for call to 'append_blah'
  append_blah("nah");
  ^~~~~~~~~~~
auto.cpp:6:16: note: candidate template ignored: substitution failure [with X = char const[4]]: invalid operands to binary expression ('char const[4]' and 'const char *')
constexpr auto append_blah(X& x) -> decltype(x + "blah")
               ^                               ~~

So now, thanks to N3436, the standard dictates that std::result_of be SFINAE-friendly, too. Has the situation for GCC improved? First, the new source:

template<class X>
constexpr std::result_of_t<std::plus<>(X, const char*)>
append_blah(X x) {
  return std::plus<>{}(x, "blah");
}

int main() {
  append_blah("nah");
}

And GCC's output:

auto.cpp: In function `int main()`:
auto.cpp:12:20: error: no matching function for call to `append_blah(const char [4])`
   append_blah("nah");
                    ^
auto.cpp:12:20: note: candidate is:
auto.cpp:5:16: note: template<class X> constexpr std::result_of_t<std::plus<void>(X, const char*)> append_blah(X)
 constexpr auto append_blah(X x)
                ^
auto.cpp:5:16: note:   template argument deduction/substitution failed:

While the error message has certainly gotten shorter, GCC no longer tells us why the function failed to compile, only that it was a candidate and "template argument deduction/substitution failed"--it doesn't even give us the type of X! SFINAE doesn't mean better diagnostics, it's just an overloading technique. Still, if it decreases the readability of error messages, then having many overloads, all failing, just compounds the problem.

But clang does better, right?

auto.cpp:12:3: error: no matching function for call to 'append_blah'
  append_blah("nah");
  ^~~~~~~~~~~
auto.cpp:5:16: note: candidate template ignored: 
  substitution failure [with X = const char *]: 
    no type named 'type' in `std::result_of<std::plus<void> (const char *, const char *)>`
constexpr auto append_blah(X x)
               ^
1 error generated.

Better, yes, but not great. We still don't know why it failed. If you have deeply nested function calls, but your top-level function guarantees a return of std::result_of_t<invalid-expression>, you'll never know why it failed to compile because result_of doesn't actually evaluate the semantics; it just checks the signatures and spits out that result_of has no type.

There simply must be a way of writing this such that GCC and clang will give a minimal, correct diagnosis of the problem, right? Well, let's try using auto as our return type.

template<class X>
constexpr auto append_blah(X x) {
  return x + "blah";
}

int main() {
  append_blah("nah");
}

What does GCC have to say?

auto.cpp: In instantiation of `constexpr auto append_blah(X) [with X = const char*]`:
auto.cpp:11:20:   required from here
auto.cpp:7:12: error: invalid operands of types `const char*` and `const char [5]` to binary `operator+`
   return x + "blah";
            ^

Wow, straight to the point and totally accurate! We get a nice, minimal trail of what GCC had to instantiate to find the error, but not a hard-to-follow mess of output. How does clang handle it?

auto.cpp:7:12: error: invalid operands to binary expression (`const char *` and `const char *`)
  return x + "blah";
         ~ ^ ~~~~~~
auto.cpp:11:3: note: in instantiation of function template specialization `append_blah<const char *>` requested here
  append_blah("nah");
  ^
1 error generated.

Again, exactly what I wanted!

So in terms of getting good and useful error messages, auto or decltype(auto) gives the best, followed by decltype, and result_of gives the absolute worst. The problem is that auto can't participate in SFINAE, so you can't overload a set of functions on auto and that leaves you with std::enable_if, std::result_of, or decltype.
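
The difference can be observed from inside the language with std::is_invocable (C++17). In this sketch, plus_sfinae and plus_auto are hypothetical function objects written only to contrast the two return type styles.

```cpp
#include <type_traits>

// The return type is part of the signature, so a bad x + y is a
// substitution failure and the call is simply "not invocable".
struct plus_sfinae {
  template<class X, class Y>
  constexpr auto operator()(X x, Y y) const -> decltype(x + y) {
    return x + y;
  }
};

// The return type is deduced from the body, so the compiler must
// instantiate the body to answer any question about the call.
struct plus_auto {
  template<class X, class Y>
  constexpr auto operator()(X x, Y y) const {
    return x + y;
  }
};

static_assert(std::is_invocable_v<plus_sfinae, int, int>);
static_assert(!std::is_invocable_v<plus_sfinae, const char*, const char*>);
static_assert(std::is_invocable_v<plus_auto, int, int>);
// Asking the same const char* question of plus_auto is not a quiet "no":
// the body is instantiated and the invalid x + y becomes a hard error.
```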

Before concluding, let's look at a slightly more obfuscated example.

#include <functional>

constexpr struct plus_f {
  template<class X, class Y>
  constexpr auto operator() (X&& x, Y&& y) const
    -> decltype(std::declval<X>() + std::declval<Y>())
  {
    return std::forward<X>(x) + std::forward<Y>(y);
  }
} plus{};

template<class X>
constexpr auto append_blah(X x) -> decltype(plus(x, "blah")) {
  return plus(x, "blah");
}

int main() {
  append_blah("nah");
}

Now, in order to find the error, the compiler has to go just two levels deep, through append_blah, then to plus_f, and evaluate x + "blah". But will the error message go that deep? We know that if we used std::result_of, it would just say that result_of<plus_f(X, const char*)> has no 'type'. But this is decltype!

First, GCC:

auto.cpp: In function `int main()`:
auto.cpp:19:20: error: no matching function for call to `append_blah(const char [4])`
   append_blah("nah");
                    ^
auto.cpp:19:20: note: candidate is:
auto.cpp:14:16: note: template<class X> constexpr decltype (plus(x, "blah")) append_blah(X)
 constexpr auto append_blah(X x) -> decltype(plus(x, "blah")) {
                ^
auto.cpp:14:16: note:   template argument deduction/substitution failed:
auto.cpp: In substitution of `template<class X> constexpr decltype (plus(x, "blah")) append_blah(X) [with X = const char*]`:
auto.cpp:19:20:   required from here
auto.cpp:14:59: error: no match for call to `(const plus_f) (const char*&, const char [5])`
 constexpr auto append_blah(X x) -> decltype(plus(x, "blah")) {
                                                           ^
auto.cpp:4:18: note: candidate is:
 constexpr struct plus_f {
                  ^
auto.cpp:6:18: note: template<class X, class Y> constexpr decltype ((declval<X>() + declval<Y>())) plus_f::operator()(X&&, Y&&) const
   constexpr auto operator() (X&& x, Y&& y) const
                  ^
auto.cpp:6:18: note:   template argument deduction/substitution failed:
auto.cpp: In substitution of `template<class X, class Y> constexpr decltype ((declval<X>() + declval<Y>())) plus_f::operator()(X&&, Y&&) const [with X = const char*&; Y = const char (&)[5]]`:
auto.cpp:14:59:   required by substitution of `template<class X> constexpr decltype (plus(x, "blah")) append_blah(X) [with X = const char*]`
auto.cpp:19:20:   required from here
auto.cpp:7:35: error: invalid operands of types `const char*` and `const char [5]` to binary `operator+`
     -> decltype(std::declval<X>() + std::declval<Y>())
                                   ^

GCC feels obligated to tell us why, so in terms of giving an accurate, if overly verbose, error message, it does its job. It'll take some digging, but a user will eventually be able to find out why it failed. But curiously, clang does no better than std::result_of.

auto.cpp:19:3: error: no matching function for call to `append_blah`
  append_blah("nah");
  ^~~~~~~~~~~
auto.cpp:14:16: note: candidate template ignored:
  substitution failure [with X = const char *]:
    no matching function for call to object of type `const struct plus_f`
constexpr auto append_blah(X x) -> decltype(plus(x, "blah")) {
               ^                            ~~~~
1 error generated.

This doesn't even tell us the types of the arguments to plus_f that caused the failure! If plus_f were a more complicated function with a slight semantic error, we'd be left clueless.

Just for comparison, here's the output with GCC, using auto:

auto.cpp: In instantiation of `constexpr auto plus_f::operator()(X&&, Y&&) const [with X = const char*&; Y = const char (&)[5]]`:
auto.cpp:14:24:   required from `constexpr auto append_blah(X) [with X = const char*]`
auto.cpp:18:20:   required from here
auto.cpp:8:31: error: invalid operands of types `const char*` and `const char [5]` to binary `operator+`
     return std::forward<X>(x) + std::forward<Y>(y);
                               ^

So, when it comes to decltype vs. std::result_of, neither really produced ideal diagnostics and both can be rather unhelpful. decltype at least gives us a little more information, but only with GCC.

I discovered this while working on FU. One of my basic functions that I based the library off of had a very simple bug. Not realizing this, I wrote a small function that called another that called another that eventually landed in that one. When I tried to compile, I got several pages of errors, and they kept suggesting that the problem was in one of the mid-level functions. Assuming I might have gotten the result_of expression wrong, I decided to just use auto and let the compiler figure it out. Then, GCC and clang both told me that the next function in the call tree was the culprit. I continued to mark function by function as returning auto until I found the problem. I had tried to pass an rvalue reference to a function taking a non-const reference. After making it pass by const reference, everything else worked.
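
The root cause is easy to reproduce in isolation. These two function objects are hypothetical stand-ins for the function at the bottom of that call tree, before and after the fix.

```cpp
#include <string>
#include <type_traits>

// Before the fix: a non-const lvalue reference cannot bind to an rvalue.
struct takes_ref {
  void operator()(std::string&) const {}
};

// After the fix: a const reference accepts lvalues and rvalues alike.
struct takes_cref {
  void operator()(const std::string&) const {}
};

static_assert(std::is_invocable_v<takes_ref, std::string&>);
static_assert(!std::is_invocable_v<takes_ref, std::string&&>);
static_assert(std::is_invocable_v<takes_cref, std::string&&>);
```

Buried several result_of expressions deep, that one failed binding is all it takes to make every function above it report that it has no 'type'.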

Does this mean that we should always prefer auto to result_of or decltype when we don't need SFINAE? I think it should be evaluated on a case-by-case basis. In terms of documentation, auto gives the user a pretty muddy perception of what the function will return. For something like std::plus<>, that's not a problem because addition generally doesn't change the type of its arguments too much. For something simple like std::invoke (C++17), std::result_of_t<F&&(X&&...)> gives us the clearest documentation about how it works--it perfectly forwards F and the arguments, X...--but this gives us the worst diagnosis if F can't be called with the arguments. decltype would give the best error message (with clang when the error is shallow, otherwise only with GCC), but decltype(std::forward<F>(f)(std::forward<X>(x)...)) doesn't read as nicely as result_of<F&&(X&&...)>.

Whether to use auto, result_of, or decltype is a matter of priorities, weighing the importance of documentation (result_of; maybe enable_if) against semantics (decltype) and diagnostics (auto). For FU, a library that depends heavily on dependent typing, I will probably over time switch entirely to using auto because determining the cause of errors is of great importance for making it usable. Users will likely be confused with messages like "result_of<multary_n_f<1,lassoc_f>(UserFunc, TypeA, TypeB, TypeC)> has no member, `type`". At the same time, this feels like poor style--an unfortunate dilemma. With any hope, GCC and clang will improve how they propagate errors over time.

For information on how to improve the diagnostics of functions using decltype, check out pfultz2's article, "Improving error messages in C++ by transporting substitution failures".

Common algorithm patterns.

2015-02-17T10:45:00.000-08:00

Of the STL, <algorithm> may be the most well-used non-container library, but it often demands just as much typing as the hand-written loop, making it not always feel so convenient. It benefits code that uses it with increased clarity and mathematical soundness, so reducing the syntactic overhead should be a goal of those using it. Today I will talk about these problems and demonstrate ways of making it more terse and sound.

Usually, I try to put all the code in my articles in one gist, but this time I will base it off my most recent work--a library called FU (Functional Utilities)--for the sake of simplicity. Still, it applies as well to fit, FTL, and to some degree, range-v3, if you use any of those.

Projection

We often store large data structures for use in many areas, but a specific algorithm may only need one or two members. Say I have a list of people and I want to sort them in order of their year_of_birth. I'm sure anyone with much experience with C++ has seen code like this:

std::sort(std::begin(people), std::end(people),
          [](const Person& a, const Person& b) {
            return a.year_of_birth < b.year_of_birth;
          });

This pattern occurs so frequently that range-v3 accepts "projection" functions for algorithms such as this.

ranges::v3::sort(people, std::less<>{}, &Person::year_of_birth);

(see: https://github.com/ericniebler/range-v3/blob/master/include/range/v3/algorithm/sort.hpp)

The projection converts its argument to make it suitable for the comparison function, std::less in this example. Internally, it ends up calculating the same thing as the lambda, extracting year_of_birth from both arguments and calling less on them.

One might ask, "but &Person::year_of_birth is not a function, so how can we pass it as one?" Internally, range-v3 and FU use a more generic concept of "invoking", which does not limit them to regular functions. See n3727, or std::invoke (C++17) for more.
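
To show what "invoking" buys us, here is a tiny sketch using std::invoke from C++17. The Person layout and the project helper are assumptions made for the example; a member pointer, a member function pointer, and a lambda all go through the same call.

```cpp
#include <functional>
#include <utility>

// A hypothetical Person, just large enough for the example.
struct Person {
  int year_of_birth;
  int age() const { return 2015 - year_of_birth; }
};

// Applies any callable-like thing to a value: functions, lambdas, pointers
// to data members, and pointers to member functions are all accepted.
template<class Proj, class T>
decltype(auto) project(Proj&& proj, T&& value) {
  return std::invoke(std::forward<Proj>(proj), std::forward<T>(value));
}
```

With a data member pointer, std::invoke(&Person::year_of_birth, p) is simply p.year_of_birth; with a member function pointer, std::invoke(&Person::age, p) is p.age().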

This issue is not limited to <algorithm> or std::sort, it is a generic problem, so I wrote fu::proj for use with the current STL and elsewhere.

std::sort(std::begin(people), std::end(people),
          fu::proj(std::less<>{}, &Person::year_of_birth));

But since requiring a projection over less is so very common, fu::proj_less can be more convenient.

std::sort(std::begin(people), std::end(people),
          fu::proj_less(&Person::year_of_birth));

Fit also provides a similar utility, by, although the syntax is a bit different.

std::sort(std::begin(people), std::end(people),
          fit::by(std::mem_fn(&Person::year_of_birth), _ < _));

I argue that each of the four versions above would be more clear and less error prone than the first, using a lambda. Rather than specifying exactly how to apply the arguments, we need only specify what to apply.

So why not use std::bind for examples like this?

using namespace std::placeholders;
std::sort(std::begin(people), std::end(people),
          std::bind(std::less<>{},
                    std::bind(&Person::year_of_birth, _1),
                    std::bind(&Person::year_of_birth, _2)));

With examples like that, it's no wonder people prefer lambdas! It actually becomes less readable with the overly-verbose and overly-general std::bind.

Projection works well enough for comparisons, but then you have a function like std::accumulate and you only want to project the right-hand argument. Consider calculating the average age of the list of people.

auto acc = [](int accum, const Person& p) { return accum + p.age(); };
int sum = 
  std::accumulate(std::begin(people), std::end(people), 0, acc);
float ave = float(sum) / people.size();

Again, range-v3 allows projection here:

int sum = ranges::v3::accumulate(people, 0, std::plus<>{}, &Person::age);
float ave = float(sum) / people.size();

One can use fu::rproj (right-projection) with the existing STL.

int sum = 
  std::accumulate(std::begin(people), std::end(people),
                  0, fu::rproj(std::plus<>{}, &Person::age));
float ave = float(sum) / people.size();
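A right-projection can be sketched in the same spirit: only the second argument is projected, while the accumulator passes through untouched. This is an illustrative stand-in rather than FU's code; rproj_sketch, average_age, and the fixed reference year inside age() are assumptions made for the example, and C++17 is assumed for std::invoke.

```cpp
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

struct Person {
  int year_of_birth;
  // Age relative to a fixed, made-up reference year.
  int age() const { return 2015 - year_of_birth; }
};

// Returns a binary function that projects only its right-hand argument.
template<typename F, typename P>
auto rproj_sketch(F f, P pf) {
  return [f, pf](auto&& acc, const auto& x) {
    return std::invoke(f, std::forward<decltype(acc)>(acc),
                       std::invoke(pf, x));
  };
}

// Averages the ages; returns 0 for an empty list to avoid dividing by zero.
inline float average_age(const std::vector<Person>& people) {
  if (people.empty()) return 0.0f;
  int sum = std::accumulate(people.begin(), people.end(), 0,
                            rproj_sketch(std::plus<>{}, &Person::age));
  return float(sum) / people.size();
}
```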

While proj and rproj will be sufficient for most <algorithm> functions, we do have one special case: std::transform or std::equal. Here, the right-hand and left-hand arguments of our function may be different types, but we still might want to use something like std::plus or std::equal_to to combine them.

For lack of a better example, consider that my struct, Person, contains a member, year_of_death. I want a sanity check that for any person, year_of_birth < year_of_death.

bool ok = std::equal(std::begin(people), std::end(people),
                     std::begin(people),
                     [](const Person& a, const Person& b) {
                       return a.year_of_birth < b.year_of_death;
                     });

With range-v3:

bool ok = ranges::v3::equal(people, people, std::less<>{},
                            &Person::year_of_birth, &Person::year_of_death);

For FU, I decided to name the function that applies different projections to the left- and right-hand arguments join, rather than binary_proj or bproj.

auto pred = fu::join(std::less<>{}, &Person::year_of_birth,
                                    &Person::year_of_death);
bool ok = std::equal(std::begin(people), std::end(people),
                     std::begin(people), pred);

Although, a more obvious solution might be to use std::all_of, using fu::split(f, left, right), which produces a function, s, such that s(x) == f(left(x), right(x)).

bool ok = std::all_of(std::begin(people), std::end(people),
                      fu::split(std::less<>{}, &Person::year_of_birth,
                                               &Person::year_of_death));
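The difference between the two helpers is easiest to see side by side. The following is a sketch under the semantics described above, not FU's implementation: join_sketch projects two different arguments, split_sketch projects one argument twice. Both names, and all_born_before_death, are hypothetical; C++17 is assumed.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

struct Person {
  int year_of_birth;
  int year_of_death;
};

// join: j(a, b) == f(left(a), right(b))
template<typename F, typename L, typename R>
auto join_sketch(F f, L left, R right) {
  return [=](const auto& a, const auto& b) {
    return std::invoke(f, std::invoke(left, a), std::invoke(right, b));
  };
}

// split: s(x) == f(left(x), right(x))
template<typename F, typename L, typename R>
auto split_sketch(F f, L left, R right) {
  return [=](const auto& x) {
    return std::invoke(f, std::invoke(left, x), std::invoke(right, x));
  };
}

// The sanity check from the text: everyone is born before they die.
inline bool all_born_before_death(const std::vector<Person>& people) {
  return std::all_of(people.begin(), people.end(),
                     split_sketch(std::less<>{}, &Person::year_of_birth,
                                  &Person::year_of_death));
}
```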

So, in this instance, the lambda might actually win out on terseness, but projection does one nice thing for us: it asserts mathematical soundness. It might be very tempting to write an equality operator between a Person and an std::string to compare the person's name, especially for use with <algorithm>s, but it makes no real sense. I also store the person's birthplace as a string, so should person == "place" be valid, too? No, it makes much more sense to define a predicate based on a function with well-known mathematical properties, like std::equal_to, and use projection to obtain the items for which equality has already been defined.

Haskell programmers like to say "if it compiles, it works!" Some of that has to do with the idea of "referential transparency", but another large part of it relates to using mathematically sound operations to assert correctness. If each component, or function, is obviously correct, and the algorithm is made of obviously correct operations, then most likely the algorithm is obviously correct.

Let's use this concept to prove the correctness of our first example, sorting by the projection of std::less on the projection of &Person::year_of_birth. It may seem unnecessary, but the rigors of mathematics require that we can formally prove these things, not just intuit or know them.

We can assume the correctness of std::sort by definition, if the comparison function implements a strict weak order. Strict weak order for any comparison, comp, means that it must be irreflexive (!comp(x,x)), transitive (comp(a,b) && comp(b,c) => comp(a,c)), and that incomparability (!comp(a,b) && !comp(b,a)) must itself be transitive, so that it treats a and b as equivalent.
We know by definition that std::less meets this requirement, thus proj_less(f) meets this requirement if f(a) < f(b) preserves this property (also by definition).
Finally, we know that f(a) == a.year_of_birth, stored as an integer, and that < on integers is a strict weak order.

Thus, we can call that example "obviously correct." In fact, we can further state that any ordering over proj_less(f) will be correct with very few, if any, exceptions.
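For readers who prefer to see the three properties spelled out, the sketch below checks them by brute force over a small sample of values. It cannot prove anything about values outside the sample, so treat it as a testing aid rather than a proof; is_strict_weak_order is a hypothetical helper written for this article.

```cpp
#include <functional>
#include <vector>

// Checks irreflexivity, transitivity, and transitivity of equivalence
// for `comp` over every combination of values in `xs`.
template<typename T, typename Comp>
bool is_strict_weak_order(const std::vector<T>& xs, Comp comp) {
  // Two values are equivalent when neither orders before the other.
  auto equiv = [&](const T& a, const T& b) {
    return !comp(a, b) && !comp(b, a);
  };
  for (const T& a : xs) {
    if (comp(a, a)) return false;  // irreflexive
    for (const T& b : xs) {
      for (const T& c : xs) {
        // transitive
        if (comp(a, b) && comp(b, c) && !comp(a, c)) return false;
        // equivalence is transitive
        if (equiv(a, b) && equiv(b, c) && !equiv(a, c)) return false;
      }
    }
  }
  return true;
}
```

With a sample of integers, std::less passes while std::less_equal fails immediately on irreflexivity, which is why passing <= to std::sort is a classic bug.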

Mathematical soundness and obvious correctness make it easy to reason about code, but projection alone does not give us the required tools to accomplish it. We need other fundamental tools, as I will show in the next section.

The building blocks.

Say I want to construct a list of people born in Germany. Like always, we can start with the generic lambda.

std::vector<Person> germans;
std::copy_if(std::begin(people), std::end(people),
             std::back_inserter(germans),
             [](const Person& p) { return p.birth_place == "Germany"; });

Range-v3 does allow a projection for copy_if, but unfortunately, that won't help us much.

ranges::v3::copy_if(people, std::back_inserter(germans),
                    [](const std::string& place) { return place == "Germany"; },
                    &Person::birth_place);

It would have been less typing without the projection, although we at least know that the lambda and projection both are obviously correct and thus so is the expression. To really get the desired level of terseness, we need a partial application of equality.

FU supplies a function, part, so that we could write "auto pred = fu::part(std::equal_to<>{}, "Germany")", and then one could write "pred("Hungary")", which would return false, or "pred("Germany")", true, but it's just not terse. Since one often needs to pass binary operators or partially-applied operators to functions like copy_if, FU also supplies a number of such functions. For this example, we need fu::eq, which acts similarly to std::equal_to except that fu::eq(x) returns the partial application. (But fu::eq(x,y) does compute x == y, like one would expect.) So with that, we can return to our example:

ranges::v3::copy_if(people, std::back_inserter(germans),
                   fu::eq("Germany"), &Person::birth_place);

Fit's placeholders utilities also recognize the importance of partial applications like this.

ranges::v3::copy_if(people, std::back_inserter(germans),
                   _ == "Germany", &Person::birth_place);

And with the STL using FU,

std::copy_if(std::begin(people), std::end(people),
             std::back_inserter(germans),
             fu::proj(fu::eq("Germany"), &Person::birth_place));

(Note: fu::ucompose, short for "unary composition", might be more appropriate here, but proj(f,g)(x) is roughly equivalent to ucompose(f,g)(x).)
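For comparison, a partially-applied equality and a unary composition can be approximated with two short lambdas. These are stand-ins for illustration only, not the definitions of fu::eq or fu::ucompose; eq_sketch, ucompose_sketch, and born_in are hypothetical names, and C++17 is assumed.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <string>
#include <vector>

struct Person {
  std::string name;
  std::string birth_place;
};

// Partial application of equality: eq_sketch(x)(y) computes y == x.
template<typename T>
auto eq_sketch(T expected) {
  return [expected](const auto& x) { return x == expected; };
}

// Unary composition: ucompose_sketch(f, g)(x) computes f(g(x)).
template<typename F, typename P>
auto ucompose_sketch(F f, P pf) {
  return [f, pf](const auto& x) {
    return std::invoke(f, std::invoke(pf, x));
  };
}

// Copies every person whose birth_place equals `place`.
inline std::vector<Person> born_in(const std::vector<Person>& people,
                                   const std::string& place) {
  std::vector<Person> out;
  std::copy_if(people.begin(), people.end(), std::back_inserter(out),
               ucompose_sketch(eq_sketch(place), &Person::birth_place));
  return out;
}
```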

Let's say I need all the names stored in people in a contiguous string, separated by commas. I could just call std::accumulate, starting with an empty std::string and append each one using rproj(std::plus<>{}, &Person::name), but I want to minimize the number of required allocations so I have to sum up their sizes first. It'll look something like this, with range-v3:

auto name_len = [](const Person& p) { return p.name.size(); };
size_t size = ranges::v3::accumulate(people, 0, std::plus<>{}, name_len);
                                                                         
std::string names;
names.reserve(size + people.size() * 2);
                                                                         
auto append = [&](const std::string& str) { names.append(str);
                                          names.append(", "); };
ranges::v3::for_each(people, append, &Person::name);

(For simplicity, I'm ignoring that this algorithm will append an extra ", " at the end.)

Again, we could not represent the mini-functions name_len and append as simple projections and had to resort to writing out the lambdas. However, can we represent them using fu::proj? Well, for name_len, yes.

auto name_len = fu::proj(&std::string::size, &Person::name);

Invoking &Person::name returns an std::string, and &std::string::size gives us the desired result--it's obviously correct. append, however... no, and for two reasons. Firstly, std::string::append is an overloaded function so we can't just take its address; we'd need to wrap it in a lambda, anyway. Secondly, we also need to append ", ", and while one could write a series of generic functional utilities and define append accordingly, we'd most likely lose our "obvious correctness" with no real gains. For completeness, here is the version for the STL:

auto name_len = fu::proj(&std::string::size, &Person::name);
size_t size = std::accumulate(std::begin(people), std::end(people),
                              0, fu::rproj(std::plus<>{}, name_len));
                                                                    
std::string names;
names.reserve(size + people.size() * 2);
                                                                    
auto append = [&](const std::string& str) { names.append(str);
                                          names.append(", "); };
std::for_each(std::begin(people), std::end(people),
              fu::proj(append, &Person::name));
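If the trailing ", " matters, one possible variation is to emit the separator before every name except the first. The sketch below uses only the standard library and plain lambdas; join_names is a hypothetical helper, and it accumulates into a std::size_t so the sum does not pass through an int.

```cpp
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

struct Person {
  std::string name;
};

// Joins all names with ", " and without a trailing separator.
inline std::string join_names(const std::vector<Person>& people) {
  const std::string sep = ", ";

  // Sum the name lengths first so that we allocate only once.
  std::size_t size = std::accumulate(
      people.begin(), people.end(), std::size_t{0},
      [](std::size_t acc, const Person& p) { return acc + p.name.size(); });

  std::string names;
  names.reserve(size + people.size() * sep.size());

  bool first = true;
  for (const Person& p : people) {
    if (!first) names.append(sep);  // separator goes between names only
    names.append(p.name);
    first = false;
  }
  return names;
}
```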

Conclusions

One might wonder why, in the last example, I prefer to make append operate on just an std::string and insist on projecting over &Person::name. I can trivially prove the correctness of append(str); it appends str followed by ", " to names, and it might even be useful in another part of the program. proj(append, &Person::name) is obviously correct for appending a person's name. Each unit, append and &Person::name, operates on one level of abstraction; proj links them together. If the way a Person holds its name changes, I only need to change &Person::name and append remains valid. If I need to change append, the projection still remains valid.

Keeping our functions small and on one layer of abstraction makes the program more robust, aside from maintaining soundness of the algorithm. FU's proj, rproj, part, and split can be used to construct most simple predicates, relational functions, and accumulators, and can also be combined in simple, granular expressions to make more complex ones.

Links

range-v3: https://github.com/ericniebler/range-v3

Fit: https://github.com/pfultz2/Fit/

FU: https://github.com/splinterofchaos/fu

Gist (requires FU): https://gist.github.com/splinterofchaos/e867c65e3ed5fe1b41da

Notes:

fu::proj_less: proj_less has a very simple definition: proj(std::less<>{}), although it actually uses a helper so that it may be constexpr. Most fu functions are objects that can be implicitly partially applied. In fact, the definition of proj is actually closer to [](auto f, auto pf, auto x, auto y) { return f(pf(x), pf(y)); }, but with perfect forwarding. Thus, proj(std::less<>{}, &Person::name) is also a partial application.

fu::rproj(std::plus<>): I considered adding a function to FU called accumulator = rproj(std::plus<>{}), but it didn't seem very intuitive. Please comment below if you have an opinion on that.

ranges::v3::copy_if: It may be preferable to use ranges::v3::view::remove_if(rng, std::not_fn(pred)) for the example given. One might want to use ranges::v3::view::filter, but it has been deprecated.

fit::by: We require std::mem_fn for Fit because it does not use a generalized invoke function.

Rvalue references and function overloading.

2015-01-08T05:33:00.000-08:00

No matter how many articles get written about rvalue references, I still get the impression that many people don't understand them fully. Some believe that rvalues work differently as template parameters than they do in non-template contexts, which requires special rules in overload resolution. I intend to show how rvalues work consistently with the rest of the language. In the course of writing this article, I even straightened out my own misconceptions of rvalues.

I assume that the reader has some level of familiarity with rvalue references; understanding perfect forwarding helps. For either an introduction or refresher, I recommend "C++ Rvalue References Explained", a very thorough introduction and a well indexed reference.

Today, I want to break out of my normal pedagogical style into a Q&A. I will ask that the reader think for a moment about what they expect a code example to do, and why.

Non-template rvalue overloading.

Typically, discussion of rvalues is limited to template functions and move constructors, but consider the following code example:

void f(int) { }
void f(int&) { }
void f(int&&) { }

int main() {
  f(0);  // (1)
  int x;
  f(x);  // (2)
}

What overload(s) of f get chosen at (1) and (2)? Please take a moment to compare (1) and (2) to each overload before continuing.

(1) is ambiguous because it can refer to f(int) and f(int&&), while (2) is ambiguous because it can be f(int) or f(int&).

The C++ standard describes a sort of "best fit" algorithm for overload resolution by first determining the viable overloads, then ranking them based on the conversions required of the arguments being passed. The highest rank, considered an "exact match," includes identity (or: "no conversion required") and considers reference binding a member of that category. Because of that, f(int) has the same rank as f(int&) and f(int&&) in overload resolution.

Now consider a class with copy and move constructors, along with overloads on its reference types:

struct T {
  T() { }
  T(const T&) { }
  T(T&&) { }
};

void f(T&) { }
void f(const T&) { }
void f(T&&) { }

int main() {
  T t;
  T t2(t);    // (1)
  T t3(T{});  // (2)
  f(t);       // (3)
  f(T{});     // (4)
}

Does this example suffer from the same ambiguity? Why or why not?

Being familiar with copy and move constructors, we know that (1) will invoke T(const T&) and that (2) will invoke T(T&&), but shouldn't there be an ambiguity because we can bind temporaries to const references? Every overload has the "exact match" rank, but the standard also defines a sub-ranking for "implicit conversion sequences", which covers reference parameters (including rvalue references) and "qualification adjustment" such as converting an int& to const int&. Given two overloads on reference parameters, the rvalue reference overload is preferred to the lvalue one when the argument is an rvalue, thus T(T&&) is chosen for (2) and f(T&&) for (4). (3) has a split between f(T&) and f(const T&), but the former wins because the argument, t, is not const and would require adjustment.

Note that f would be ambiguous if we had defined an f(T) overload because every other overload would have the same relative ranking. While f(T&) matches more precisely than f(const T&), they both have the same rank compared to f(T).
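One way to check these rules without reading the standard is to have each overload report which one was selected. The sketch below mirrors the three reference overloads from the example; the function name which and its string return values are made up for this purpose.

```cpp
#include <string>
#include <utility>

struct T { };

// Each overload returns a label naming its own parameter type.
inline std::string which(T&)       { return "T&"; }
inline std::string which(const T&) { return "const T&"; }
inline std::string which(T&&)      { return "T&&"; }
```

Calling which with a non-const lvalue, a const lvalue, a temporary, and the result of std::move selects T&, const T&, T&&, and T&& respectively, matching the sub-ranking described above.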

Before considering the impact of rvalues on template functions, let's ponder what we already know about perfect forwarding.

template<typename T> void f(T&&) { }

We know that this function can be called on any value and might refer to it as a "universal reference", although it has more recently been redubbed "forwarding reference". Let's try to reproduce this behaviour without templates.

void f(int&&) { }
void f(int& &&) { }
void f(const int& &&) { }

This code is not valid. I usually like clang's diagnostics the best, but here it complains, rather mundanely, "type name declared as a reference to a reference". Gcc gives us a very nice hint about this code: "cannot declare reference to ‘int&’, which is not a typedef or a template type argument".

using ref = int&;
using cref = const int&;

void f(int&&) { }
void f(ref&&) { }
void f(cref&&) { }

int main() {
  int x = 0;
  const int y = 0;
  f(0);  // calls f(int&&)
  f(x);  // calls f(ref&&)
  f(y);  // calls f(cref&&)
}

Neither clang nor gcc has any difficulty with this version. The above set of overloads looks fairly similar to the familiar perfect-forwarding pattern.

using rval = int&&;
using rrval = rval&&;    // (1)
using rrrval = rrval&&;  // (2)

void f(rval) { }
void f(rrval) { }   // (3)
void f(rrrval) { }  // (4)

For (1) and (2), are they valid? For (3) and (4), what values would these overloads accept?

You may be surprised, but (1) and (2) are completely valid, however (3) and (4) are not. Again, the diagnostics of gcc and clang give us some good insight. They complain that f(rrval) redefines f(rval), and ditto for f(rrrval). That is to say: rval, rrval, and rrrval all refer to the same type! The same works for normal references--(int&) & is the same as int&.

While int and int&& are distinct types, the standard states that there "shall be no references to references," which includes rvalue references to references. Instead, the references collapse into one, and this is one context where lvalue references always take precedence.

using ref = int&;
using rval = int&&;
using T1 = ref&&;  // T1 = int&
using T2 = rval&;  // T2 = int&

This explains why perfect forwarding works the way it does. You see T&&, but after reference collapsing it becomes the exact reference type of the argument: an lvalue reference for lvalues and an rvalue reference for rvalues.
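The collapsing rules can be verified at compile time. The sketch below states each combination as a static_assert and adds a small helper, deduced_lvalue_ref (a hypothetical name), that reports whether T was deduced as an lvalue reference for a given argument.

```cpp
#include <type_traits>

using ref = int&;
using rval = int&&;

// Only "&& applied to &&" yields an rvalue reference.
static_assert(std::is_same<ref&,   int&>::value,  "&  on &  gives &");
static_assert(std::is_same<ref&&,  int&>::value,  "&& on &  gives &");
static_assert(std::is_same<rval&,  int&>::value,  "&  on && gives &");
static_assert(std::is_same<rval&&, int&&>::value, "&& on && gives &&");

// For an lvalue argument, T is deduced as an lvalue reference, so T&&
// collapses to an lvalue reference. For an rvalue, T is a non-reference.
template<typename T>
constexpr bool deduced_lvalue_ref(T&&) {
  return std::is_lvalue_reference<T>::value;
}
```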

Rvalues in template functions:

One might consider rvalue references in the context of template functions as forwarding references, previously called universal references, because they seem to apply to every situation.

template<typename T> void f(T) { }
template<typename T> void f(T&) { }
template<typename T> void f(T&&) { }

int main() {
  int x = 0;
  f(0);  // (1)
  f(x);  // (2)
}

Which overload of f gets called at (1) and (2)?

As a rule of thumb, one might assume that f(T&&) gets chosen for both because it can be (with T = int and T = int&, respectively), but actually the very same rules apply as in the non-template version. (1) is ambiguous between f(T) and f(T&&), and (2) is ambiguous between all three overloads (again, because they all fit into the "exact match" rank).

template<typename T> void f(T&&) { }
void f(const int&) { }

int main() {
  int x = 0;
  f(x);
}

Which overload of f gets called?

This time, f(T&&) gets instantiated with T = int& and f(const int&) requires a qualification adjustment from int& to const int&, so f(T&&) gets chosen.
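The same labeling trick shows how this pair of overloads behaves for other kinds of arguments. The function name pick and its return strings are made up for illustration.

```cpp
#include <string>

// Forwarding-reference overload.
template<typename T>
std::string pick(T&&) { return "T&&"; }

// Non-template overload taking a const lvalue reference.
inline std::string pick(const int&) { return "const int&"; }
```

A non-const int lvalue selects the template, since T&& becomes int& with no adjustment. A const int lvalue matches both equally well, and the tie goes to the non-template function. A literal such as 0 selects the template again, because binding an rvalue reference to an rvalue ranks above binding a const lvalue reference to it.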

template<typename T>
struct X { };

template<typename T> void f(T&&) { }
template<typename T> void f(X<T&&>&) { }

int main() {
  X<int> x;
  X<int&> xref;
  f(x);     // (1)
  f(xref);  // (2)
}

Which overload of f gets chosen at (1) and (2)?

It's f(T&&) for both. Sure, T&& can bind to anything, but the T in X<T&&> can only bind to actual rvalue references. The compiler does not try to fit T&& with int or int&, but looks at the whole type, X<T&&> vs X<int> and X<int&>, and cannot deduce T. Ironically, if we wanted a signature that would bind to X of anything, it would have to be X<T>.
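To see the second overload actually get selected, we need an X instantiated with a genuine rvalue reference type. The sketch below labels each overload so the choice can be observed; pick and its return strings are hypothetical names for this illustration.

```cpp
#include <string>

template<typename T>
struct X { };

// Matches any argument.
template<typename T>
std::string pick(T&&) { return "T&&"; }

// Matches only an lvalue X whose parameter is an rvalue reference type.
template<typename T>
std::string pick(X<T&&>&) { return "X<T&&>&"; }
```

With X<int> and X<int&>, T cannot be deduced for the second overload, so only the first is viable. With X<int&&>, both are viable with identical conversions, and the second is chosen because it is the more specialized template.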

Conclusions

Many seem to believe that template rvalue references are inconsistent with normal overload resolution in C++, and find it helpful to call them "universal" or "forwarding" references in order to reason about why they work differently. I hope that through the above examples, the reader has learned something about rvalue references and the consistency of the rules surrounding them.

The Python API and C++

2014-11-26T13:19:00.001-08:00

Recently, for a job interview task, I was asked to write a Python module with a C or C++ implementation to solve an otherwise simple task. Obviously, I chose C++. While I had never used the Python API before, I found the existing information on extending Python with C quite sufficient. What surprised me, however, is how little information existed for using C++. A few libraries exist, like Boost.Python, PyCXX, and some utilities that parse C++ to create Python bindings, but I didn't find much in the way of actual information without examining the sources of these libraries.

I will not discuss much why someone would want to implement a Python module in another language (efficiency? better library support for certain tasks? preference?), but why C++? The Python API has basically no type safety--everything is a PyObject *, whether it represents a string, number, or tuple. It requires a considerable amount of boilerplate--something we can reduce by using the C++ type system. It presents some interesting technical challenges, which are what I will focus on. I will assume some knowledge of the Python API.

Note: I will be basing this off Python 2.7. Yes, Python 3 is newer, but due to incompatibilities, not a replacement, and also not my system default. Also, I have little experience with the Python API, so do not take this article as authoritative. It represents a journal of my experiments.

I have started working on a little utility library (https://github.com/splinterofchaos/py-cxx) for personal use, but for a listing of the code for this article, see the gist: https://gist.github.com/splinterofchaos/b099149a701edfa5948f

Writing a Python Module: The Basics

First, we will want to create a Python module, which alone is rather uninteresting. For a more in-depth study, one should refer to https://docs.python.org/2/extending/extending.html.

Every module requires an init function which communicates to the interpreter what functions, types, and objects this module offers. For now, let's consider a module that counts how many times a certain function gets called.

#include <Python.h>

PyObject *count(PyObject *self, PyObject *args)
{
  static int i = 0;
  PySys_WriteStdout("%i\n", ++i);  // Just like printf.
  return PyInt_FromLong(i);
}

static PyMethodDef countMethods[] = {
  {"count", count, METH_VARARGS, "Returns the number of times called."},
  {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC initcount()
{
  PyObject *m = Py_InitModule("count", countMethods);
}

See setup.py for building this example.

Here, countMethods contains the defined functions in a {name, c-function, function-type, __doc__-string} structure. count must be a PyCFunction, a function taking self (probably null) and args (argument tuple) parameters and returning an object. METH_VARARGS lets the interpreter know this is a regular function--other types of functions do exist, but more on that later.

The PyMODINIT_FUNC macro tells Python that, obviously, this function initializes the module. Note that even Py_InitModule() returns a regular Python object!

There are several improvements we can make. First, we could write an overloaded function, to_py_int(), that could dispatch between PyInt_FromLong(), PyInt_FromSsize_t(), and friends, but that's rather mundane so I'll be skipping it. More interesting: we can write a function to define methods.

Aside from METH_VARARGS, a function can be METH_KEYWORDS, which takes an additional parameter, a dictionary, and thus is not a PyCFunction; METH_NOARGS, which must still be a PyCFunction and may receive a self argument, but always receives NULL for args; or METH_O, which receives a single object in place of the argument tuple. It may be convenient to write a function that takes a pointer to a specific type instead of the generic PyObject, but by casting we lose certain safety guarantees and it can be easy to do something stupid, like writing a function with the wrong number of arguments or the wrong METH_* variant.

#include <type_traits>
template<typename R, typename...X>
constexpr int arity(R(*)(X...)) {
  return sizeof...(X);
}

template<typename R, typename...X>
constexpr bool returns_PyObject(R(*)(X...)) {
  // Result is either a PyObject, or a subclass of one.
  return std::is_convertible<R, PyObject *>::value;
}

template<typename R, typename...X>
constexpr bool is_PyCFunction(R(*)(X...)) {
  return false;
}

template<>
constexpr bool is_PyCFunction(PyCFunction) {
  return true;
}

template<typename F>
constexpr int method_type(F f) {
  return arity(f) == 3     ? METH_KEYWORDS
       : is_PyCFunction(f) ? METH_VARARGS
                           : METH_O;
}

template<typename F>
constexpr PyMethodDef method_def(const char *name, const char *doc,
                                 int type, F f)
{
  static_assert(arity(F()) == 2 || arity(F()) == 3,
                "Methods must have an arity of 2 or 3");
  static_assert(returns_PyObject(F()), "Methods must return a PyObject *.");
  return {name, (PyCFunction)f, type, doc};
}

template<typename F>
constexpr PyMethodDef method_def(const char *name, const char *doc, F f)
{
  return method_def(name, doc, method_type(f), f);
}

static PyMethodDef countMethods[] = {
  method_def("count", "Returns the number of times called.", count),
  {NULL, NULL, 0, NULL}
};

Note that in order to use static_asserts, we construct an F instead of passing f because f, as a parameter, may not be a constexpr.

Now, we can declare methods in a type-safe manner without having to specify METH_* or lose any safety. While it may be a little limiting (for example, we can't use a lambda to define the method), one can always revert to not using method_def as well.
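The compile-time checks can be tried out without Python installed. The sketch below replaces PyObject with a stand-in type, FakeObject, so it shows only how the arity and return-type traits behave; FakeObject, FakeList, returns_object, and the three sample functions are all hypothetical and exist only for this illustration.

```cpp
#include <type_traits>

struct FakeObject { };             // stand-in for PyObject
struct FakeList : FakeObject { };  // stand-in for a more specific object type

// Number of parameters of a plain function pointer.
template<typename R, typename...X>
constexpr int arity(R(*)(X...)) {
  return sizeof...(X);
}

// True when the return type converts to a pointer to the base object.
template<typename R, typename...X>
constexpr bool returns_object(R(*)(X...)) {
  return std::is_convertible<R, FakeObject *>::value;
}

inline FakeObject *varargs_like(FakeObject *, FakeObject *) { return nullptr; }
inline FakeList *keywords_like(FakeObject *, FakeObject *, FakeObject *) {
  return nullptr;
}
inline int wrong_return(FakeObject *, FakeObject *) { return 0; }

// Mistakes such as a wrong return type are caught before the program runs.
static_assert(arity(varargs_like) == 2, "two parameters expected");
static_assert(arity(keywords_like) == 3, "three parameters expected");
static_assert(returns_object(keywords_like), "derived pointers convert");
static_assert(!returns_object(wrong_return), "int is not an object pointer");
```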

Note: It may be safe to define a function that takes no arguments and cast it to a PyCFunction, however I don't know that this would be true across all architectures and ABI calling conventions.

One thing lacking from this example is actually using the args parameter. For that, we will need to use PyArg_ParseTuple().

A Type-Safe PyArg_ParseTuple().

Let's use the example of finding the cross product of two vectors.

#include <Python.h>

#include "Py.h"  // includes MethodDef()

PyObject *cross(PyObject *self, PyObject *args)
{
  float a, b, c;
  float x, y, z;

  if (!PyArg_ParseTuple(args, "(fff)(fff)", &a, &b, &c, &x, &y, &z))
    return nullptr;

  float i = b*z - c*y;
  float j = c*x - a*z;
  float k = a*y - b*x;

  return Py_BuildValue("fff", i, j, k);
}

static PyMethodDef vecMethods[] = {
  MethodDef("cross", "Returns the cross product of two 3D vectors.", cross),
  {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC initvec()
{
  PyObject *m = Py_InitModule("vec", vecMethods);
}

This lets us write, in Python, cross((a,b,c), (x,y,z)). Even simple functions like this benefit from being written in statically typed languages since, in Python, when one wants to do many operations on some variables, their types must be checked every time, lest you try to add a string to an integer. Here, we do nine operations, but only check the types of the initial six arguments.

PyArg_ParseTuple() is really quite simple; you pass in args and a format string (in this case, using f for float), and pointers to the variables you want to fill. If the tuple doesn't fit the expected format, it sets an error so we can just return NULL. We do our calculation and call Py_BuildValue(), which creates a tuple when given more than one value. Unfortunately, it's very verbose and not type-safe. We can fix that, but first, we must build a format string, preferably at compile time, to pass in.

First, we can use, for convenience, a typedef of std::integer_sequence to build a list of chars.

template<char...cs>
using CharList = std::integer_sequence<char, cs...>;

Then, define mappings for PyArg_ParseTuple.

template<typename...T>
struct CharListConcat;

template<typename T>
struct CharListConcat<T> {
  using type = T;
};

template<typename...U, char...cs, char...cs2>
struct CharListConcat<CharList<cs...>, CharList<cs2...>, U...> {
  using type = typename CharListConcat<CharList<cs..., cs2...>, U...>::type;
};

template<typename...T>
using CharListConcat_t = typename CharListConcat<T...>::type;

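// Primary template; specialized below for each supported type.
template<typename T>
struct PTCharListOf;

template<> struct PTCharListOf<float> {
  using type = CharList<'f'>;
};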

template<typename...Ts>
struct PTCharListOf<std::tuple<Ts...>> {
  using type = CharListConcat_t<CharList<'('>,
                                typename PTCharListOf<std::decay_t<Ts>>::type...,
                                CharList<')'>>;
};

template<typename T>
using PTCharListOf_t = typename PTCharListOf<T>::type;

Unfortunately, this strategy is a bit limited--we couldn't pass in an std::vector to get the desired effect because we wouldn't know how many elements go into it. A better option would be to add a PyObject * specialization for PTCharListOf and manually check that the result is a list.

template<> struct PTCharListOf<PyObject *> {
  using type = CharList<'O'>;
};

Next, we define a type to build the format:

template<typename...Ts>
struct ParseTupleBuilder { };

template<typename CL, typename T, typename...Ts>
struct ParseTupleBuilder<CL, T, Ts...> {
  using type = ParseTupleBuilder<CharListConcat_t<CL, PTCharListOf_t<T>>,
                                 Ts...>;
  constexpr static const char *fmt = type::fmt;
};

template<char...cs>
struct ParseTupleBuilder<CharList<cs...>> {
  using type = CharList<cs...>;

  static const char fmt[sizeof...(cs) + 1];
};

template<char...cs>
const char ParseTupleBuilder<CharList<cs...>>::fmt[] = { cs..., '\0' };

template<typename...Ts>
constexpr const char *ParseTupleFormat(Ts...) {
  return ParseTupleBuilder<CharList<>, std::decay_t<Ts>...>::fmt;
}

One interesting thing: When I defined fmt inside ParseTupleBuilder, I got an error from inside Python on typing "import vec" claiming that fmt's constructor had not been defined. The Python docs warn that static global variables with constructors may not be used if Python was built with a C compiler, but defining fmt outside the struct seems to fix this.

Finally, we can start defining ParseTuple(). The strategy I chose was to build an std::tuple of arguments to send to PyArg_ParseTuple() and examine each argument in a helper function. This will require two helpers, defined below, apply_tuple() and map_tuple().

template<typename F, typename T, size_t...Is>
decltype(auto) apply_tuple(F&& f, T&& t, std::index_sequence<Is...>) {
  return std::forward<F>(f)(std::get<Is>(std::forward<T>(t))...);
}

template<typename F, typename T, size_t...Is>
decltype(auto) map_tuple(F&& f, T&& t, std::index_sequence<Is...>) {
  return std::make_tuple(std::forward<F>(f)(std::get<Is>(std::forward<T>(t)))...);
}

template<typename F, typename...Ts,
         typename Is = std::make_index_sequence<sizeof...(Ts)>>
decltype(auto) map_tuple(F&& f, std::tuple<Ts...> &t) {
  return map_tuple(std::forward<F>(f), t, Is());
}

template<typename...Bound,
         typename Indicies = std::make_index_sequence<sizeof...(Bound)>>
bool ParseTuple_impl(std::tuple<Bound...> &&bound) {
  return apply_tuple(PyArg_ParseTuple, bound, Indicies());
}

template<typename...Bound, typename Arg, typename...Args>
bool ParseTuple_impl(std::tuple<Bound...> &&bound, Arg &a, Args &...as) {
  return ParseTuple_impl(std::tuple_cat(std::move(bound), std::make_tuple(&a)),
                          as...);
}

template<typename...Bound, typename...Args>
bool ParseTuple_impl(std::tuple<Bound...> &&bound, Optional, Args &...as) {
  return ParseTuple_impl(std::move(bound), as...);
}

template<typename...Bound, typename...Ts, typename...Args>
bool ParseTuple_impl(std::tuple<Bound...> &&bound, std::tuple<Ts &...> &t,
                     Args &...as) {
  auto &&mapped = map_tuple([](auto &x) { return &x; }, t);
  return ParseTuple_impl(std::tuple_cat(bound, std::move(mapped)),
                         as...);
}

template<typename...Args>
bool ParseTuple(PyObject *args, Args &&...as) {
  return ParseTuple_impl(std::make_tuple(args, ParseTupleFormat(as...)),
                          as...);
}
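The two tuple helpers can be exercised on their own, without any Python types. The sketch below unpacks a tuple into a call to an ordinary function and turns a tuple of references into a tuple of pointers, which is the same transformation ParseTuple_impl performs on the result of std::tie. The names sum3, addresses_of, and addresses_of_impl are made up for this example, and the two-argument apply_tuple overload is an added convenience.

```cpp
#include <cstddef>
#include <tuple>
#include <utility>

// Calls f with the elements of t as separate arguments.
template<typename F, typename T, std::size_t...Is>
decltype(auto) apply_tuple(F&& f, T&& t, std::index_sequence<Is...>) {
  return std::forward<F>(f)(std::get<Is>(std::forward<T>(t))...);
}

// Convenience overload that generates the index sequence itself.
template<typename F, typename...Ts>
decltype(auto) apply_tuple(F&& f, std::tuple<Ts...>& t) {
  return apply_tuple(std::forward<F>(f), t, std::index_sequence_for<Ts...>{});
}

// Maps a tuple of references to a tuple of pointers to the same objects.
template<typename...Ts, std::size_t...Is>
auto addresses_of_impl(std::tuple<Ts&...>& t, std::index_sequence<Is...>) {
  return std::make_tuple(&std::get<Is>(t)...);
}

template<typename...Ts>
auto addresses_of(std::tuple<Ts&...>& t) {
  return addresses_of_impl(t, std::index_sequence_for<Ts...>{});
}

inline int sum3(int a, int b, int c) { return a + b + c; }
```

Writing through the resulting pointers modifies the original variables, which is how a C-style API such as PyArg_ParseTuple() fills in its outputs.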

Before getting back to our cross product function, we will also want a BuildValue() function. Please excuse the repetitive nature of this code.

template<typename...Bound,
         typename Indicies = std::make_index_sequence<sizeof...(Bound)>>
PyObject *BuildValue_impl(std::tuple<Bound...> &&bound) {
  return apply_tuple(Py_BuildValue, bound, Indicies());
}

template<typename...Bound, typename Arg, typename...Args>
PyObject *BuildValue_impl(std::tuple<Bound...> &&bound, Arg a, Args ...as) {
  return BuildValue_impl(std::tuple_cat(std::move(bound), std::make_tuple(a)),
                         as...);
}

template<typename...Bound, typename...Args>
PyObject *BuildValue_impl(std::tuple<Bound...> &&bound, Optional, Args &...as) {
  return BuildValue_impl(std::move(bound), as...);
}

template<typename...Bound, typename...Ts, typename...Args>
PyObject *BuildValue_impl(std::tuple<Bound...> &&bound, std::tuple<Ts...> &t,
                          Args &...as) {
  return BuildValue_impl(std::tuple_cat(bound, std::move(t)), as...);
}

template<typename...Args>
PyObject *BuildValue(Args &...as) {
  return BuildValue_impl(std::make_tuple(ParseTupleFormat(as...)),
                          as...);
}

And finally, getting back to our cross product...

PyObject *cross(PyObject *self, PyObject *args)
{
  float a, b, c;
  float x, y, z;

  if (!ParseTuple(args, std::tie(a,b,c), std::tie(x,y,z)))
    return nullptr;

  float i = b*z - c*y;
  float j = c*x - a*z;
  float k = a*y - b*x;

  return BuildValue(i, j, k);
}

That sure was a lot of work, but it created a simple interface that's hard to use improperly.

Extending Python Types

Probably the cornerstone of extending Python itself would be to define new types that interact well with the existing Python infrastructure. For efficiency's sake, the more variables we can statically type and hold in a struct, the better. The Python docs suggest extending a type this way:

typedef struct {
    PyObject_HEAD
    ...
} MyType;

The macro, PyObject_HEAD, contains fields common to any Python object to ensure that casting a MyType pointer to a PyObject pointer is valid. This is a common technique for representing inheritance in C; however, we can get the same effect in C++ by using inheritance.

Also, every Python type requires an accompanying PyTypeObject, which is also a PyObject. The PyTypeObject stores lots of runtime information about a type including what function to use for allocation, to convert to a string, its methods, its base class, how to deallocate it, and more.

We can use a constructor for our extension type, but it may be wisest not to. One of the fields of the type object, tp_alloc, defines how to allocate memory for this type, including setting the reference count to one, specifying the ob_type field (a member of PyObject), and a few other things. For example, it must work, even for classes that inherit from our custom type. It relates enough to Python's internals that I think it best be left alone, and can be left as NULL in the PyTypeObject without trouble.

More interesting would be tp_new, which must point to a function that calls tp_alloc and initializes the memory, and must be defined in order to create instances of our new type. We can define tp_new to use a placement new for objects in our type that require construction.
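For readers who have not used placement new, here is a minimal sketch of the idea outside of Python: memory is obtained from one place, and the object is constructed and destroyed in it by hand. std::malloc and std::free only stand in for tp_alloc and tp_free, and Tracked, construct_at_raw, destroy_in_place, and placement_demo are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical payload that counts live instances so the demo can check
// that construction and destruction each happen exactly once.
struct Tracked {
  static int live;
  Tracked()  { ++live; }
  ~Tracked() { --live; }
};
int Tracked::live = 0;

// Construct a T in memory that someone else allocated.
template<typename T>
T *construct_at_raw(void *raw) {
  return new (raw) T();
}

// Run the destructor without freeing the memory.
template<typename T>
void destroy_in_place(T *p) {
  p->~T();
}

void placement_demo() {
  void *raw = std::malloc(sizeof(Tracked));  // stands in for tp_alloc
  assert(raw);
  Tracked *t = construct_at_raw<Tracked>(raw);
  assert(Tracked::live == 1);
  destroy_in_place(t);                       // the job of tp_dealloc
  assert(Tracked::live == 0);
  std::free(raw);                            // stands in for tp_free
}
```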

We can generalize that an extension of PyObject will look like this:

template<typename T>
struct Extention : PyObject
{
  static PyTypeObject type;

  T ext;

  T       &get()       & { return this->ext; }
  const T &get() const & { return this->ext; }

  T       *ptr()       & { return &this->ext; }
  const T *ptr() const & { return &this->ext; }
};

We can define a default tp_new and tp_dealloc and initialize type like so:

template<typename T,
         typename = std::enable_if_t<std::is_default_constructible<T>::value>>
newfunc default_new()
{
  return [](PyTypeObject *type, PyObject *args, PyObject *kwds)
  {
    using Self = Extention<T>;
    Self *self = (Self *) type->tp_alloc(type, 0);
    if (self)
      new (self->ptr()) T();
    return (PyObject *) self;
  };
}

template<typename T,
         typename = std::enable_if_t<!std::is_default_constructible<T>::value>>
auto default_new() {
  return [](PyTypeObject *type, PyObject *args, PyObject *kwds)
  {
    return type->tp_alloc(type, 0);
  };
}

template<typename T>
PyTypeObject Extention<T>::type = {
  PyObject_HEAD_INIT(NULL)
  0,                         // ob_size
  0,                         // tp_name
  sizeof(Extention<T>),      // tp_basicsize
  0,                         // tp_itemsize
  destructor([](PyObject *self) { // tp_dealloc
    ((Extention *) self)->get().T::~T();
    self->ob_type->tp_free(self);
  }),
  0,                         // tp_print
  0,                         // tp_getattr
  0,                         // tp_setattr
  0,                         // tp_compare
  0,                         // tp_repr
  0,                         // tp_as_number
  0,                         // tp_as_sequence
  0,                         // tp_as_mapping
  0,                         // tp_hash 
  0,                         // tp_call
  0,                         // tp_str
  0,                         // tp_getattro
  0,                         // tp_setattro
  0,                         // tp_as_buffer
  Py_TPFLAGS_DEFAULT,        // tp_flags
  0,                         // tp_doc 
  0,                         // tp_traverse 
  0,                         // tp_clear 
  0,                         // tp_richcompare 
  0,                         // tp_weaklistoffset 
  0,                         // tp_iter 
  0,                         // tp_iternext 
  0,                         // tp_methods 
  0,                         // tp_members 
  0,                         // tp_getset 
  0,                         // tp_base 
  0,                         // tp_dict 
  0,                         // tp_descr_get 
  0,                         // tp_descr_set 
  0,                         // tp_dictoffset 
  0,                         // tp_init 
  0,                         // tp_alloc 
  default_new<T>(),          // tp_new
};

PyTypeObject does have a few more fields, but the compiler sets them to 0 for us. We do, however, have to set tp_basicsize in order for the right amount of memory to be allocated. Since a type in C++ may not be default-constructible, default_new() may return a function that does not construct the object; this must be done in tp_init.

Now, returning to the cross product example, consider this:

struct Vec {
  float x, y, z;
};

using PyVec = Extention<Vec>;

int init_vec(PyVec *self, PyObject *args, PyObject *)
{
  Vec &v = self->get();
  if (!ParseTuple(args, v.x, v.y, v.z))
    return -1;
  return 0;
}

PyObject *vec_str(PyVec *self)
{
  return PyString_FromString(("<"  + std::to_string(self->get().x) +
                              ", " + std::to_string(self->get().y) +
                              ", " + std::to_string(self->get().z) +
                              ">").c_str());
}

PyMODINIT_FUNC initvec()
{
  PyVec::type.tp_name = "vec.Vec";
  PyVec::type.tp_init = (initproc) init_vec;
  PyVec::type.tp_repr = PyVec::type.tp_str = (reprfunc) vec_str;
  if (PyType_Ready(&PyVec::type) < 0)
    return;

  PyObject *m = Py_InitModule("vec", vecMethods);
  if (!m)
    return;

  Py_INCREF(&PyVec::type);
  PyModule_AddObject(m, "Vec", (PyObject *) &PyVec::type);
}

Note that tp_repr is used to display the result of evaluating an expression, and tp_str is used for printing. tp_init is used to construct our value and relates to Vec.__init__() in Python. PyType_Ready() finalizes the type and fills in some of the missing tp_* fields. We add the type to the module as a global object and increment its reference count so Python doesn't try to destruct it. For brevity, I decided not to include functions to check the type safety of the initproc and reprfunc casts.

Since Vec is default constructible, we only need to worry about assigning the members in the init function.

And now, cross looks like this:

PyObject *cross(PyObject *self, PyObject *args)
{
  PyObject *o1, *o2;
  if (!ParseTuple(args, o1, o2))
    return nullptr;

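  // Ensure o1 and o2 are the right types.
  if (!PyType_IsSubtype(o1->ob_type, &PyVec::type) ||
      !PyType_IsSubtype(o2->ob_type, &PyVec::type)) {
    // Returning NULL without setting an exception raises a SystemError.
    PyErr_SetString(PyExc_TypeError, "cross() expects two Vec objects");
    return nullptr;
  }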
  
  Vec &v = ((PyVec *) o1)->get(), &w = ((PyVec *) o2)->get();
  float i = v.y*w.z - v.z*w.y;
  float j = v.z*w.x - v.x*w.z;
  float k = v.x*w.y - v.y*w.x;

  PyObject *ret = PyVec::type.tp_new(&PyVec::type, nullptr, nullptr);

  PyObject *val = BuildValue(i, j, k);
  init_vec((PyVec *) ret, val, nullptr);
  Py_DECREF(val);

  return ret;
}

Conclusions

Despite this being quite a long article, it has only touched the surface of how the Python API can be extended. There are many restrictions and it certainly puts a cramp on C++'s style, but the moral of this story is that just because you need to work with a C API doesn't mean you can't use modern C++ techniques.

The Python docs: https://docs.python.org/2/c-api/

The gist: https://gist.github.com/splinterofchaos/b099149a701edfa5948f

My library: https://github.com/splinterofchaos/py-cxx

Clang 3.4 and C++14

2014-02-04T16:52:00.001-08:00

With each new release, gcc and clang add on more C++11 and C++14 features. While clang has been behind on some features, though ahead on others, they now claim to have C++1y all worked out.

This article is not comprehensive.
Clang's 3.4 C++ release notes: http://llvm.org/releases/3.4/tools/clang/docs/ReleaseNotes.html#id1
libc++'s C++1y status: http://libcxx.llvm.org/cxx1y_status.html

Note: Compiling these examples requires the flags "-stdlib=libc++" and "-std=c++1y".

Variable templates.

This feature, from N3651, took me most by surprise, but it also seems quite obvious. In the simplest example, let def<T> be a variable that represents the default-constructed value of any type, T.

template<typename T>
constexpr T def = T();
 
auto x = def<int>; // x = int()
auto y = def<char>; // y = char()

The proposal uses the example of pi, where it may be more useful to store it as a float or double, or even long double. By defining it as a template, one can have precision when needed and faster, but less precise, operations otherwise.
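A minimal sketch of that idea follows. The digits and the helper circle_area are my own choices for illustration and are not copied from the proposal.

```cpp
#include <cassert>

// One definition, instantiated at whatever precision the caller asks for.
template<typename T>
constexpr T pi = T(3.1415926535897932385L);

// Hypothetical helper: the area of a circle of radius r, computed in T.
template<typename T>
constexpr T circle_area(T r) {
  return pi<T> * r * r;
}

static_assert(pi<int> == 3, "converting to int truncates");

void pi_demo() {
  float  a = circle_area(2.0f);  // single precision throughout
  double b = circle_area(2.0);   // double precision throughout
  assert(a > 12.56f && a < 12.57f);
  assert(b > 12.566 && b < 12.567);
}
```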

For another example, consider storing a few prime numbers, but not specifying the type of their container.

template<template<typename...> class Seq>
Seq<int> primes = { 2, 3, 5, 7, 11, 13, 17, 19 };

auto vec = primes<std::vector>;
auto list = primes<std::list>;

(gist)

Also, the standard library contains many template meta-functions, some with a static value member. Variable templates help there, too.

template<typename T, typename U>
constexpr bool is_same = std::is_same<T,U>::value;

bool t = is_same<int,int>;   // true
bool f = is_same<int,float>; // false

(std::is_same)

But since variable templates can be specialized, even partially like class templates, it makes as much sense to define it this way:

template<typename T, typename U>
constexpr bool is_same = false;

template<typename T>
constexpr bool is_same<T,T> = true;

(gist)

Except for when one requires that is_same refers to an integral_constant.

One thing worries me about this feature: How do we tell the difference between template meta-functions, template functions, template function objects, and variable templates? What naming conventions will be invented? Consider the above definition of is_same and the following:

// A template lambda that looks like a template function.
template<typename T>
auto f = [](T t){ ... };

// A template meta-function that might be better as a variable template.
template<typename T>
struct Func { using value = ...; };

They each have subtly different syntaxes. For example, N3545 adds an operator() overload to std::integral_constant which enables syntax like this: bool b = std::is_same<T,U>(), while N3655 adds std::is_same_t<T,U> as a synonym for std::is_same<T,U>::value. (Note: libc++ is missing std::is_same_t.) Even without variable templates, we now have three ways to refer to the same thing.

Finally, I did run into one problem when I wrote a function like so:

template<typename T>
auto area( T r ) {
    return pi<T> * r * r;
}

and found that clang thought pi<T> was undefined at link time and clang's diagnostics did little to point that out.

/tmp/main-3487e1.o: In function `auto $_1::operator()<Circle<double> >(Circle<double>) const':
main.cpp:(.text+0x5e3d): undefined reference to `_ZL2piIdE'
clang: error: linker command failed with exit code 1 (use -v to see invocation)

I solved this by explicitly instantiating pi for the types I needed by adding this to main:

pi<float>;
pi<double>;

Why in main and not in global scope? When I tried it right below the definition of pi, clang thought I wanted to specialize the type. Finally, attempting template<> pi<float>; left the value uninitialized. This is a bug in clang, and has been fixed. Until the next release, variable templates work as long as only non-template functions refer to them.

Generic lambdas and generalized capture.

Hey, didn't I already do an article about this? Well, that one covers Faisal Vali's fork of clang based off of the N3418, which has many more features than this iteration based off the more conservative N3559. Unfortunately it lacks the terseness and explicit template syntax (i.e. []<class T>(T t) f(t)), but it maintains automatic types for parameters ([](auto t){return f(t);}).

Defining lambdas as variable templates helps, but variable templates lack the abilities of functions, like implicit template parameters. For the situations where that may be helpful, it's there.

template<typename T>
auto convert = [](const auto& x) {
    return T(x);
};

(gist)

Also, previously, clang couldn't capture values by move or forward into lambdas, which prohibited capturing move-only types by anything other than a reference. Transitively, that meant many perfect forwarding functions couldn't return lambdas.

Now, initialization is "general", to some degree.

std::unique_ptr<int> p = std::make_unique<int>(5);
auto add_p = [p=std::move(p)](int x){ return x + *p; };
std::cout << "5 + 5 = " << add_p(5) << std::endl;

(See also: std::make_unique)

Values can also be copied into a lambda using this syntax, but check out Scott Meyers's article for why [x] or [=] does not mean the same thing as [x=x] for mutable lambdas. (http://scottmeyers.blogspot.de/2014/02/capture-quirk-in-c14.html)

Values can also be defined and initialized in the capture expression.

std::vector<int> nums{ 5, 6, 7, 2, 9, 1 };
 
auto count = [i=0]( auto seq ) mutable {
    for( const auto& e : seq )
        i++; // Would error without "mutable".
    return i;
};

gcc has had at least partial support for this since 4.5, but should fully support it in 4.9.

Auto function return types.

This is also a feature gcc has had since 4.8 (and I wrote about, as well), but that was based off of N3386, whereas gcc 4.9 and clang 3.4 base off of N3638. I will not say much here because this is not an entirely new feature, not much has changed, and it's easy to grok.

Most notably, the syntax, decltype(auto), has been added to overcome some of the shortcomings of auto. For example, if we try to write a function that returns a reference with auto, a value is returned. But if we write it...

decltype(auto) ref(int& x) {
    return x;
}

decltype(auto) copy(int x) {
    return x;
}

(gist)

Then a reference is returned when a reference is given, and a copy when a value is given. (Alternatively, the return type of ref could be auto&.)
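The difference can be checked at compile time. This is a small sketch with hypothetical names (by_auto, by_decltype, by_auto_ref, probe) that compares the three spellings on the same function body.

```cpp
#include <cassert>
#include <type_traits>

auto           by_auto(int &x)     { return x; }  // deduces int
decltype(auto) by_decltype(int &x) { return x; }  // deduces int&
auto          &by_auto_ref(int &x) { return x; }  // also int&

int probe = 0;  // only used inside decltype below
static_assert(std::is_same<decltype(by_auto(probe)), int>::value, "");
static_assert(std::is_same<decltype(by_decltype(probe)), int &>::value, "");
static_assert(std::is_same<decltype(by_auto_ref(probe)), int &>::value, "");

void return_type_demo() {
  int n = 1;
  by_decltype(n) = 5;  // writes through the returned reference
  assert(n == 5);
}
```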

More generalized constexprs.

The requirement that constexprs be single return statements worked well enough, but simple functions that required more than one line could not be constexpr. It sometimes forced inefficient implementations in order to have at least some of its results generated at compile-time, but not always all. The factorial function serves as a good example.

constexpr unsigned long long fact( unsigned long long x ) {
    return x <= 1 ? 1ull : x * fact(x-1);
}

but now we can write...

constexpr auto fact2( unsigned long long x ) {
    unsigned long long product = 1;
    while( x > 1 ) // Branching.
        product *= x--; // Variable mutation.
    return product;
}

(gist)

This version may be more efficient, both at compile time and run time.
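To confirm that a loop-based function really is evaluated at compile time, one can use its result where a constant expression is required. A short sketch, with fact_loop, table, and constexpr_demo as hypothetical names:

```cpp
#include <cassert>

// Hypothetical name; a loop-based factorial that is still constexpr.
constexpr unsigned long long fact_loop(unsigned long long x) {
  unsigned long long product = 1;
  for (; x > 1; --x)
    product *= x;
  return product;
}

// Evaluated by the compiler; a wrong value would fail the build.
static_assert(fact_loop(0) == 1, "");
static_assert(fact_loop(5) == 120, "");

// A constant expression can also size an array.
int table[fact_loop(3)];

void constexpr_demo() {
  assert(sizeof(table) / sizeof(table[0]) == 6);
}
```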

The accompanying release of libc++ now labels many standard functions as constexpr thanks to N3469 (chrono), 3470 (containers), 3471 (utility), 3302 (std::complex), and 3789 (functional).

Note: gcc 4.9 does not yet implement branching and mutation in constexprs, but does include some of the library enhancements.

std::integer_sequence for working with tuples.

Although this library addition may not be of use to everyone, anyone who has attempted to unpack a tuple into a function (like this guy or that guy or this one or ...) will appreciate N3658 for "compile-time integer sequences". Thus far, no standard solution has existed. N3658 adds one template class, std::integer_sequence<T,t0,t1,...,tn>, and std::index_sequence<t0,...,tn>, which is an integer_sequence with T=size_t. This lets us write an apply_tuple function like so:

template<typename F, typename Tup, size_t ...I>
auto apply_tuple( F&& f, Tup&& t, std::index_sequence<I...> ) {
    return std::forward<F>(f) (
         std::get<I>( std::forward<Tup>(t) )... 
    );
}

(See also: std::get)

For those who have not seen a function like this, the point of this function is just to capture the indexes from the index_sequence and call std::get variadically. It requires another function to create the index_sequence.

N3658 also supplies std::make_integer_sequence<T,N>, which expands to std::integer_sequence<T,0,1,...,N-1>, std::make_index_sequence<N>, and std::index_sequence_for<T...>, which expands to std::make_index_sequence<sizeof...(T)>.

// The auto return type especially helps here.
template<typename F, typename Tup >
auto apply_tuple( F&& f, Tup&& t ) {
    using T = std::decay_t<Tup>; // Thanks, N3655, for decay_t.

    constexpr auto size = std::tuple_size<T>()(); // N3545's operator() yields the value.
    using indicies = std::make_index_sequence<size>; 

    return apply_tuple( std::forward<F>(f), std::forward<Tup>(t), indicies() ); 
}

(See also: std::decay, std::tuple_size, gist)
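As a quick check of what these aliases expand to, and of how std::index_sequence_for can drive such a function, here is a small sketch. The helper call_with and the function add3 are hypothetical and only loosely mirror apply_tuple above.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

static_assert(std::is_same<std::make_index_sequence<3>,
                           std::index_sequence<0, 1, 2>>::value, "");
static_assert(std::is_same<std::index_sequence_for<int, char>,
                           std::index_sequence<0, 1>>::value, "");

// Hypothetical helper: call f with the elements of t.
template<typename F, typename...Ts, std::size_t...I>
decltype(auto) call_with(F &&f, const std::tuple<Ts...> &t,
                         std::index_sequence<I...>) {
  return std::forward<F>(f)(std::get<I>(t)...);
}

// Front end: build the index sequence from the tuple's element types.
template<typename F, typename...Ts>
decltype(auto) call_with(F &&f, const std::tuple<Ts...> &t) {
  return call_with(std::forward<F>(f), t, std::index_sequence_for<Ts...>());
}

int add3(int a, int b, int c) { return a + b + c; }

void call_with_demo() {
  assert(call_with(add3, std::make_tuple(1, 2, 3)) == 6);
}
```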

Unfortunately, even though the proposal uses a similar function as an example, there still exists no standard apply_tuple function, nor a standard way to extract an index_sequence from a tuple. Still, there may exist several conventions for applying tuples. For example, the function may be the first element or an outside component. The tuple may have an incomplete argument set and require additional arguments for apply_tuple to work.

Update: Two library proposals in the works address this issue: N3802 (apply), and N3416 (language extension: parameter packs).

experimental::optional.

While not accepted into C++14, libc++ has an implementation of N3672's optional hidden away in the experimental folder. Boost fans may think of it as the standard's answer to boost::optional, while functional programmers may think of it as something like Haskell's Maybe.

Basically, some operations may not have a value to return. For example, a square root cannot be taken from a negative number, so one might want to write a "safe" square root function that returned a value only when x>=0.

#include <cmath>
#include <experimental/optional>

template<typename T>
using optional = std::experimental::optional<T>;

optional<float> square_root( float x ) {
    return x >= 0 ? std::sqrt(x) : optional<float>();
}

(gist)

Using an optional is simple because they implicitly convert to bools and act like a pointer, but with value semantics (which is incidentally how libc++ implements it). Without optional, one might use a unique_ptr, but value semantics on initialization and assignment make optional more convenient.

auto qroot( float a, float b, float c ) 
    -> optional< std::tuple<float,float> >
{
    // Optionals implicitly convert to bools.
    if( auto root = square_root(b*b - 4*a*c) ) {
        float x1 = (-b + *root) / (2*a);
        float x2 = (-b - *root) / (2*a);
        return {{ x1, x2 }}; // Like optional{tuple{}}.
    }
    return {}; // An empty optional.
}

(gist)

Misc. improvements.

This version of libc++ allows one to retrieve a tuple's elements by type using std::get<T>.

std::tuple<int,char> t1{1,'a'};
std::tuple<int,int>  t2{1,2}; 
int x = std::get<int>(t1); // Fine.
int y = std::get<int>(t2); // Error, t2 contains two ints.

Clang now allows the use of single-quotes (') to separate digits. 1'000'000 becomes 1000000, and 1'0'0 becomes 100. (example) (It doesn't require that the separations make sense, but one cannot write 1''0 or '10.)

libc++ implements N3655, which adds several template aliases in the form of std::*_t = std::*::type to <type_traits>, such as std::result_of_t, std::decay_t, and many more. Unfortunately, while N3655 also adds std::is_same_t (see the top of the 7th page), libc++ does not define it. I do not know, but I believe this may be an oversight that will be fixed soon as it requires only one line.

N3421 adds specializations to the members of <functional>. If one wanted to send an addition function into another function, one might write f(std::plus<int>(),args...), but we no longer need to specify the type and can instead write std::plus<>(). This instantiates a function object that can accept two values of any type to add them. The same goes for std::greater<>, std::less<>, std::equal_to<>, etc.
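A brief sketch of why this matters: one generic helper can fold ints or strings without naming the element type. The helper sum and the function plus_demo are hypothetical names for illustration.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: fold a sequence with the transparent std::plus<>.
template<typename Seq, typename T>
T sum(const Seq &seq, T init) {
  return std::accumulate(seq.begin(), seq.end(), init, std::plus<>());
}

void plus_demo() {
  assert(sum(std::vector<int>{1, 2, 3}, 0) == 6);
  assert(sum(std::vector<std::string>{"a", "b"}, std::string()) == "ab");
  assert(std::greater<>()(2.5, 1));  // mixed double and int arguments
}
```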

Conclusions.

This may not be the most ground-breaking release, but C++14 expands on the concepts from C++11, improves the library, and adds a few missing features, and I find it impressive that the clang team has achieved this so preemptively. I chose to talk about the features I thought were most interesting, but I did not talk about, for example, sized deallocation, std::dynarray (<experimental/dynarray>), some additional overloads in <algorithm>, or Null Forward Iterators, to name a few. See the bottom for links to the full lists.

The GNU team still needs to do more work to catch up to clang. If one wanted to write code for both gcc 4.9 and clang 3.4, they could use generic lambdas and auto for return types, but not variable templates or generalized constexprs. For the library, gcc 4.9 includes std::make_unique (as did 4.8), the N3421 specializations in <functional>, integer sequences, constexpr library improvements, even experimental::optional (though I'm not sure where), and much more. It may be worth noting it does not seem to include the <type_traits> template aliases, like result_of_t.

See clang's full release notes related to C++14 here: http://llvm.org/releases/3.4/tools/clang/docs/ReleaseNotes.html#id1
For libc++'s improvements, see: http://libcxx.llvm.org/cxx1y_status.html
gcc 4.9's C++14 features: http://gcc.gnu.org/projects/cxx1y.html
And gcc's libstdc++ improvements: http://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2014

The code I wrote to test these features: https://gist.github.com/splinterofchaos/8810949

Clang and Generic (Polymorphic) Lambdas.

2012-12-28T11:11:00.000-08:00

Recently, Faisal Vali put forth an implementation of n3418, which he co-authored with Herb Sutter and Dave Abrahams, allowing generic lambdas using a fork of clang. It also includes auto type deduction, which I wrote about being implemented in gcc 4.8. There are a few caveats before continuing: This has not been merged into the mainline. It has a few bugs, but Vali is quick to fix them if you point one out. The implementation itself is a proof of concept (similar to automatic type deduction) and so it isn't unreasonable to assume some things might change. Section 2.4 of the proposal (named lambdas) has not yet been implemented. And while this doesn't allow us to do many things that were previously impossible, the possible used to be so verbose that no one would want to do it!

Generic lambdas are profound and may have a great impact on the style of code written in C++. Consider this a light (and lacking) overview of what is possible.

Before continuing, I want to note that I found evidence that some GCC developers had begun working on generic lambdas (from the mailing list: Latest experimental polymorphic lambda patches), however, I cannot find anything more recent than 2009 discussing this, and code using auto and lambdas does not compile.

Being terse.

This patch allows the writing of incredibly terse polymorphic functions, such as these:

auto add = [](auto x, auto y) x + y;
auto times = [](auto x, auto y) x * y; 
auto print_int = [](int x) void(std::cout << x);

No {}, no return, auto-deduced types, and void can even be used to throw away the value of stateful operations. x and y can be anything and the return type is entirely dependent on them. Why is this interesting? Say you want to find the product of a vector.

auto prod = std::accumulate ( 
    v.begin(), v.end(), 1,
    []( int x, int y ){ return x * y; }
);

Bit of a mouthful, right? v might store ints today, but tomorrow, maybe it will store long long ints to avoid overflow or just unsigned ints to avoid negatives. When the vector's declaration changes, the lambda's argument types need to change, which is a maintenance problem. I currently solve this by writing polymorphic function objects.

constexpr struct {
    template< class X, class Y >
    auto operator () ( X x, Y y )
        -> decltype(x*y)
    {
        return x * y;
    }
} timesObj{};

But the above and times are mostly equivalent. (A lambda can be thought of as an automatic function object. It even has a unique type. (see below: Overloading))

auto prod = std::accumulate ( 
    v.begin(), v.end(), 1,
    times
);

This never needs to change unless the underlying operation (multiplication) changes.

Sometimes, an underlying operation is consistent across types. Using zip_tuple as an example from my article "Zipping and Mapping Tuples", one could write:

std::tuple<int,std::string> a{1,"hello "}, 
                            b{2,"world!"};

// r = {3,"hello world!"}
auto r = zipTuple( ([](auto x, auto y) x + y), a, b );

Because of the comma operator, we must put the lambda in parentheses to make clear where it ends.

Up until now, things like composition could not be lambdas.

template< class F, class G > struct Composition {
    F f;
    G g;

    template< class ...X >
    auto operator () ( X&& ...x ) 
        -> decltype( f(g(std::forward<X>(x)...)) )
    {
        return f( g( std::forward<X>(x)... ) );
    }
};

template< class F, class G, class C = Composition<F,G> >
C compose( const F& f, const G& g ) {
    return C{f,g};
}

int main () {
    auto add1 = []( auto x ) x + 1;
    auto Char = []( auto c ) char(c);
    // Prints "a + 1 = b"
    std::cout << "a + 1 = " << compose(Char,add1)('a') << std::endl;
}

compose is generic and it returns a generic function object. Generic lambdas make the same possible by returning another generic lambda that captures f and g.

auto compose = [](auto f, auto g) 
    [=](auto x) f(g(x));

However, this version of compose only allows one argument. Luckily, generic lambdas can be fully templated, variadic, and perfect forwarding.

auto compose = 
    []( auto f, auto g )
        [=]<class ...X>( X&& ...x ) 
            f( g(std::forward<X>(x)...) );
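For comparison, here is a sketch of the same compose in the more conservative generic-lambda syntax that clang later shipped for C++14, where braces and return are required and the pack is spelled auto&&... rather than with an explicit template parameter list. The names compose14, to_char, and compose_demo are hypothetical.

```cpp
#include <cassert>
#include <utility>

// C++14 spelling: braces and return are required, and the parameter
// pack is written auto&&... instead of <class ...X>.
auto compose14 = [](auto f, auto g) {
  return [=](auto &&...x) {
    return f(g(std::forward<decltype(x)>(x)...));
  };
};

void compose_demo() {
  auto add1    = [](auto x) { return x + 1; };
  auto to_char = [](auto c) { return char(c); };
  assert(compose14(to_char, add1)('a') == 'b');
}
```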

However, the syntax for these lambdas is so convenient, one might as well drop the functional programming theory and write

auto f = [](char c) char(c+1);

For an example of the power of nested lambdas, consider currying:

auto curry3 = 
    []( auto f )
        [=](auto x) [=](auto y) [=](auto z) 
            f(x,y,z);

auto sum3 = [](auto x, auto y, auto z) x + y + z;
auto ten = curry3(sum3)(2)(3)(5);

Nested lambdas especially aid in the use of monads, as I have written about previously ("Monads in C++").

auto justThree = Just(1) >>= [](auto x)
                 Just(2) >>= [](auto y)
                 mreturn<std::unique_ptr>( x + y ); // Or Just(x+y).

This also takes care of the return mreturn problem I discussed in that article.

Overloading

Overloading functions is obviously useful, but impossible with lambdas alone. To fully take advantage of their brevity, we must have a way to overload them. In the proposal, Mathias Gaunard is attributed with the following:

template<class F1, class F2> struct overloaded : F1, F2
{
    overloaded(F1 x1, F2 x2) : F1(x1), F2(x2) {}
    using F1::operator();
    using F2::operator();
};

template<class F1, class F2>
overloaded<F1, F2> overload(F1 f1, F2 f2)
{ return overloaded<F1, F2>(f1, f2); }

(See also: "Unifying Generic Functions and Function Objects")

This works because lambdas are function objects with a unique type, and can therefore act as the base class for overloaded. This is an unlikely solution because this fact is so rarely taken advantage of, however there is much advantage to take!
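A usage sketch may make this concrete. The overloaded and overload definitions are repeated from above so that the example compiles on its own, and it uses only non-generic lambdas so no forked compiler is needed. The names describe and overload_demo are hypothetical.

```cpp
#include <cassert>
#include <string>

template<class F1, class F2> struct overloaded : F1, F2 {
  overloaded(F1 x1, F2 x2) : F1(x1), F2(x2) {}
  using F1::operator();  // Pull both call operators into one scope
  using F2::operator();  // so that overload resolution sees both.
};

template<class F1, class F2>
overloaded<F1, F2> overload(F1 f1, F2 f2) {
  return overloaded<F1, F2>(f1, f2);
}

void overload_demo() {
  auto describe = overload(
      [](int)                 { return std::string("int"); },
      [](const std::string &) { return std::string("string"); });
  assert(describe(42) == "int");
  assert(describe(std::string("hi")) == "string");
}
```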

Unfortunately, one cannot inherit from function pointers, so overloading lambdas and regular functions together requires a little more work. First, we must define a base type that can handle both function pointers and function objects. Its job is to just forward the arguments to its function.

template< class F > struct Forwarder : F {
    constexpr Forwarder( const F& f ) : F(f) { }
};

template< class R, class ...X > struct Forwarder<R(*)(X...)> {
    using type = R(*)(X...);
    type f;

    constexpr Forwarder( type f ) : f(f) { }

    constexpr R operator () ( X... x ) const {
        return f(x...);
    }
};

template< class R, class ...X > 
struct Forwarder<R(X...)> : Forwarder<R(*)(X...)>
{
    using type = R(*)(X...);
    constexpr Forwarder( type f ) 
        : Forwarder<R(*)(X...)>(f)
    {
    }
};

Function pointers get two specializations because decltype(f)=R(X) and decltype(&f)=R(*)(X). It makes the most sense to specialize for pointers, but only doing so would require we take the address of our functions when we call overload.

Next, Overloaded inherits from two Forwarders.

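template< class F, class G >
struct Overloaded : Forwarder<F>, Forwarder<G> {
    constexpr Overloaded( const F& f, const G& g )
        : Forwarder<F>(f), Forwarder<G>(g)
    {
    }

    // Without these, a call would be ambiguous between the two bases.
    using Forwarder<F>::operator();
    using Forwarder<G>::operator();
};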

template< class F > F overload( F&& f ) {
    return std::forward<F>(f);
}

template< class F, class G, class ...H,
          class O1 = Overloaded<F,G> > 
auto overload( const F& f, const G& g, const H& ...h ) 
    -> decltype( overload(O1(f,g),h...) )
{
    return overload( O1(f,g), h... );
}

Overloads can differ by arity and domain (or argument type). The simplest example, for demonstration purposes, is a printing function.

auto prnt = overload (
    // Anything cout is already defined for.
    []( const auto& x ) 
        -> decltype( void(std::cout << x) )
    { std::cout << x; },

    // Any STL sequence.
    []<class Sequence>( const Sequence& s )
        -> decltype( void(std::begin(s)) )
    {
        std::cout << "[ ";
        for( const auto& x : s ) 
            std::cout << x << ' ';
        std::cout << ']';
    },

    // These are both sequences for which cout is defined.
    // Specializing disambiguates this.
    []( const char* const s ) { std::cout << s; },
    []( const std::string& s ) { std::cout << s; }
);

// ...
prnt("abc"); // Prints abc.
prnt(std::vector<int>{1,2,3}); // Prints [ 1 2 3 ].

Although defining all overloads in a single statement is an annoyance, they are grouped together, they don't require a template<...> line, and the visual clutter is overall less than if prnt were defined as the equivalent (explicit) function object.

Perhaps a function must be overloaded, but decltype or std::enable_if is too accepting and specializing for each type is redundant. For example, one might be annoyed by the last two string specializations of prnt. One solution is to define yet another overload type.

template< class X, class F >
struct UnaryOverload {
    F f;
    UnaryOverload( const F& f ) : f(f) { }

    using result_type = typename std::result_of< F(X) >::type;

    result_type operator () ( X x ) const {
        return f(x);
    }
};

template< class ...X, class F >
auto unary_overload_set( const F& f ) 
    -> decltype( overload(UnaryOverload<X,F>(f)...) )
{
    return overload( UnaryOverload<X,F>(f)... );
}

auto prnt = overload (
    // ...

    unary_overload_set<const char* const,
                       const std::string&>(
        []( auto&& s ) { std::cout << s; }
    )
);

One might write an overloading class to specialize for specific types, or a category of types, or more generally, a class might be written to do type conversion before calling the inner function, to prepare the output, or whatever one's needs may be. An overloading type might even select one of two functions based on an enable_if.

// Where pred is a templated type defining pred<X>::value.
auto h = enable_when<pred>( f, g );
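
enable_when is not defined in this article, so here is one way it might look. This is a minimal sketch in standard C++14, assuming a unary call and a predicate of the form pred<X>::value; EnableWhen and show_number_or_text_demo are hypothetical names.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical sketch of enable_when: calls f when Pred<X>::value holds
// for the (decayed) argument type X, and g otherwise.
template< template<class> class Pred, class F, class G >
struct EnableWhen {
    F f;
    G g;

    // Chosen when the predicate holds.
    template< class X, class D = typename std::decay<X>::type,
              typename std::enable_if< Pred<D>::value, int >::type = 0 >
    decltype(auto) operator () ( X&& x ) const {
        return f( std::forward<X>(x) );
    }

    // Chosen when it does not.
    template< class X, class D = typename std::decay<X>::type,
              typename std::enable_if< !Pred<D>::value, int >::type = 0 >
    decltype(auto) operator () ( X&& x ) const {
        return g( std::forward<X>(x) );
    }
};

template< template<class> class Pred, class F, class G >
EnableWhen<Pred,F,G> enable_when( F f, G g ) {
    return { std::move(f), std::move(g) };
}

// Usage: arithmetic values go through std::to_string, everything else
// is converted to std::string directly.
inline std::string show_number_or_text_demo() {
    auto to_text = enable_when<std::is_arithmetic>(
        []( auto x ) { return std::to_string(x); },
        []( const auto& s ) { return std::string(s); }
    );
    return to_text(42) + to_text(" apples");
}
```

Putting the enable_if in a defaulted template parameter, rather than in the return type, means the body of the unselected lambda is never instantiated.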

The downsides of overloading function objects include that each overload must be defined all at once and none can be added. That isn't too bad since the point of lambdas is to be brief, but one should be mindful of extensibility when writing generic functions. (In other words, if an overload must be added, is it OK to modify the function object, or must the user be able to add overloads later?)

Recursion.

Without generic lambdas, recursion is possible.

std::function<int(int)> fact = [&]( int x ) not x ? 1 : x * fact(x-1);

Or, with function pointers, which a lambda may implicitly convert to.

// In global scope:
using IntToInt = int(*)(int);
IntToInt fact = []( auto x ) not x ? 1 : x * fact(x-1);

With generic lambdas, we could write it like this:

auto fact1 = []( auto f, int n ) -> int 
    not n ? 1 : f(f,n-1) * n;
auto fact = []( int x ) -> int 
    fact1( fact1, x );

One might notice that the Fibonacci sequence could be implemented in a similar fashion. In researching recursive lambdas, I came across the fixed-point combinator. Haskell has fix, which can be implemented like this:

auto fix = []( auto f )
    [=]( auto x ) -> decltype(x) f( f, x );

auto fact = fix (
    []( auto f, int n ) -> int
    not n ? 1 : f(f,n-1) * n
);

auto fib = fix (
    []( auto f, int n ) -> int
    n == 0 ? 0 :
    n == 1 ? 1 :
    f(f,n-1) + f(f,n-2)
);

fix is a generalization of a certain type of recursion. For an idea of how one would implement fix without lambdas, see this Stack Overflow post.
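
The snippets above rely on the brace-less lambda syntax of the experimental branch. For comparison, here is a sketch of the same idea in standard C++14. It is a variation, not the fix defined above: instead of calling f(f,x), the wrapper passes itself, so the recursive call is written self(n-1). Fix, factorial, and fibonacci are illustrative names.

```cpp
#include <utility>

// Wraps f so that it receives a callable handle to itself as its
// first argument.
template< class F >
struct Fix {
    F f;

    template< class ...X >
    decltype(auto) operator () ( X&& ...x ) const {
        return f( *this, std::forward<X>(x)... );
    }
};

template< class F >
Fix<F> fix( F f ) {
    return { std::move(f) };
}

inline int factorial( int n ) {
    // The explicit "-> int" lets the compiler type the recursive call.
    auto fact = fix( []( const auto& self, int k ) -> int {
        return k == 0 ? 1 : k * self(k-1);
    } );
    return fact(n);
}

inline int fibonacci( int n ) {
    auto fib = fix( []( const auto& self, int k ) -> int {
        return k < 2 ? k : self(k-1) + self(k-2);
    } );
    return fib(n);
}
```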

Making prnt above variadic requires a different kind of recursion.

// Variadic void unary.
auto vvu_impl = overload (
    [] (auto,auto) {},
    []<class X, class ...Y>( const auto& self, const auto& u, 
                             X&& x, Y&& ...y ) 
    {
        u( std::forward<X>(x) );
        self( self, u, std::forward<Y>(y)... );
    }
);

auto vvu = []( const auto& u )
    [&]<class ...X>( const X& ...x )
        vvu_impl( vvu_impl, u, x... );

// Variadic print.
// vprint(x,y...) = prnt(x); prnt(y)...
auto vprint = vvu( prnt );

auto print_line = []<class ...X>( const X& ...x ) 
    vprint( x..., '\n' );
 
print_line( "abc ", 123, std::vector<int>{1} ); // prints "abc 123[ 1 ]\n"

We can generalize left-associativity as well.

auto chainl_impl = overload (
    []( auto self, auto b, auto r ) { return r; },
    []<class ...Z>( auto self, auto b, auto x, auto y, Z ...z ) 
        self( self, b, b(x,y), z... )
);

auto chainl = []( auto b )
    [=]<class ...X>( const X& ...x )
        chainl_impl( chainl_impl, b, x... );

auto sum = chainl(add);
auto three = sum(1,1,1);

// Variadic compose.
auto vcompose = chainl( compose );

auto inc = []( auto x ) ++x;
auto addThree = vcompose( inc, inc, inc );

A good exercise might be to (a) write a variadic version of fix and (b) use that version to reimplement chainl and vprint.
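
As a point of comparison for the exercise, left-associative chaining can also be sketched in standard C++14 with an ordinary function object instead of a self-passing lambda. ChainL is a hypothetical name, and this version takes its arguments by value for brevity.

```cpp
#include <functional>

// chainl(b)(x, y, z) = b( b(x, y), z ), and so on for longer lists.
template< class B >
struct ChainL {
    B b;

    // One value left: the chain is fully reduced.
    template< class X >
    constexpr X operator () ( X x ) const {
        return x;
    }

    // Combine the two leftmost values, then recurse on the rest.
    template< class X, class Y, class ...Z >
    constexpr auto operator () ( X x, Y y, Z ...z ) const {
        return (*this)( b(x,y), z... );
    }
};

template< class B >
constexpr ChainL<B> chainl( B b ) {
    return { b };
}
```

With std::minus<int>, chainl(...)(10,5,3,2) evaluates ((10 - 5) - 3) - 2, which makes the left-to-right grouping visible.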

There are, of course, many types of recursion. Implementing recursive lambdas is more complicated than for regular functions, but not by too much.

Conclusions.

Polymorphic (generic) lambdas are very powerful indeed, but it may take a while before Faisal Vali's branch is merged back into Clang, and before GCC, MSVC, and others catch up. Still, they may have a strong impact on code written in C++ in the future. Some thought that templates revealed a sort of functional language in C++, and others thought the same of constexpr. Generic lambdas reveal another, more flexible one.

Lambdas cannot be marked constexpr. In terms of efficiency, I do not think this matters. They are implicitly inlined, so the compiler may still take advantage of any compile-time information it can gather. However, the result of a lambda expression could never be used in a template parameter, for example, which means they don't replace generic constexpr function objects.

Also, explicitly specifying a type is more verbose because the rules are the same as for template member functions, so lambdas can't replace template functions that require explicit parameters.

auto f = []<class X>() { return X(); };
f.operator()<int>(); // bad

The proposal to add polymorphic lambdas to C++ is not finalized and a few things are up in the air. For example, can we elide auto and just write [](x) f(x)? Should we be allowed to elide the enclosing braces and return? Are the implemented parts of this proposal useful? Remember that the standardization process is open to the public and that we can make our voices heard about the features that impact us.

Personally, I like the proposal as implemented currently. (Requiring auto, but eliding { return ... }.) I would go a step further and say that auto should be extended to allow variadic parameters. (i.e. [](auto ...x) f(x...)) And named lambdas (section 2.4) will be a very nice addition.

What are your thoughts?

Source for this article: https://gist.github.com/4347130 and https://gist.github.com/4381376
Another (google translated) article on generic lambdas: http://translate.google.com/translate?hl=en&sl=ja&u=http://d.hatena.ne.jp/osyo-manga/20121225/1356368196&prev=/search%3Fq%3Dgeneric%2Blambda%2Bc%252B%252B%2Bclang%26start%3D10%26hl%3Den%26safe%3Doff%26client%3Dubuntu%26sa%3DN%26tbo%3Dd%26channel%3Dfs%26biw%3D1535%26bih%3D870&sa=X&ei=v-LbUOG5BOmQ2gXw7ICABg&ved=0CFsQ7gEwBjgK
A long and thorough article on the fixed-point combinator: http://mvanier.livejournal.com/2897.html

Quick and Easy -- Manipulating C++ Containers Functionally.

2012-12-14T17:12:00.000-08:00

Update: Added examples for dup and foldMap.

Probably the most useful parts of the standard C++ library are its containers and algorithms. Who has worked in C++ for any non-trivial amount of time without using std::vector, list, map, or any of the others? <algorithm>, on the other hand, is more something everyone should know. It solves many of the problems that C++ developers encounter on a daily basis.

"How do I test if there exists an element, x, where p(x) is true?" : std::any_of
"How do I copy each element, x, where p(x)?" : std::copy_if
"How do I remove each element, x, where p(x)?" : std::remove_if
"How do I move elements from one container to another?" : std::move, <algorithm> version.
"How do I find a subsequence?" : std::search
"How do I sort an array?" : std::sort
"How do I find the sum of an array?" : std::accumulate

Any programmer worth half their salt could write any of these functions in their sleep--they're basic--and the thing is that these algorithms do get written, over and over and over again. Either because one does not realize a specific <algorithm> function exists, or because one is thinking on a low level and unable to see the higher level abstractions.

What I like most about the STL is that the only requirements for adapting any data type to a sequence are (1) define an iterator, and (2) define begin() and end(). After that, most (if not all) of <algorithm> becomes instantly usable with that type. (As well as the range-based for loop.) This makes it incredibly generic and useful.
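
To make those two requirements concrete, here is about the smallest adaptation I can think of. Samples is a made-up type; it leans on the fact that a raw pointer already qualifies as an iterator, so only begin() and end() need writing.

```cpp
#include <algorithm>
#include <numeric>

// A hypothetical fixed-size buffer. Raw pointers already satisfy the
// iterator requirements, so begin() and end() are all that is needed.
struct Samples {
    int data[4];

    const int* begin() const { return data; }
    const int* end()   const { return data + 4; }
};

inline int total( const Samples& s ) {
    return std::accumulate( s.begin(), s.end(), 0 );
}

inline bool has_negative( const Samples& s ) {
    return std::any_of( s.begin(), s.end(),
                        []( int x ) { return x < 0; } );
}

inline int largest( const Samples& s ) {
    int best = *s.begin();
    for( int x : s )    // The range-based for loop works too.
        best = std::max( best, x );
    return best;
}
```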

What I dislike is its verbosity. For example:

 std::transform( xs.begin(), xs.end(), xs.begin(), f );

Wouldn't this be more clear if written...

 xs = std::transform(xs,f);

And this allows us to compose functions.

 std::transform( xs.begin(), xs.end(), xs.begin(), f );
 std::transform( xs.begin(), xs.end(), xs.begin(), g );

// vs
xs = std::transform( std::transform(xs,f), g );

// Or, using actual composition:
xs = std::transform( xs, compose(g,f) );

That's what this article will be about. An abstraction over the STL that lends itself to writing more terse, concise code without losing any clarity. This abstraction is less general, by design, because it works on entire containers, not iterators. I am not writing about a replacement for any <algorithm> functions, but an alternative inspired by functional programming.

However, I do go over many <algorithm> functions, so this can also be thought of as a review.

Filtering, Taking, and Dropping: Collecting data.

I've always found the erase-remove idiom an unintuitive solution to such a common problem. I certainly would not have figured it out on my own without the help of the C++ community to point it out. Requiring containers to define a predicated erase wouldn't be generic, and <algorithm> knows only of iterators, not containers, so the standard library can't offer anything simpler. filter fills this gap by combining its knowledge of containers and iterators.

template< class P, class S >
S filter( const P& p, S s ) {
    using F = std::function< bool(typename S::value_type) >;

    s.erase (
        std::remove_if( std::begin(s), std::end(s), std::not1(F(p)) ),
        std::end(s)
    );

    return s;
}

// ...

std::vector<int> naturals = {1,2,3,4,5,6,7,8,9/*,...*/};
auto evens = filter( [](int x){ return x%2==0; }, naturals );

See also: std::not1.

It does two things: First, it inverts the predicate, meaning we can use positive logic (defining what we want to keep, rather than throw away), and second, it abstracts the erase-remove idiom.
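
One caveat: std::not1 was deprecated in C++17 and removed in C++20. On a C++14 or later compiler, the same filter can be sketched with a generic lambda doing the negation. This is a substitution on my part rather than the code from the gist, and evens_of is only a small usage wrapper.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Same behaviour as filter above, but the predicate is negated with a
// lambda rather than std::not1.
template< class P, class S >
S filter( const P& p, S s ) {
    s.erase (
        std::remove_if( std::begin(s), std::end(s),
                        [&]( const auto& x ) { return !p(x); } ),
        std::end(s)
    );
    return s;
}

// Usage: keep only the even numbers.
inline std::vector<int> evens_of( const std::vector<int>& xs ) {
    return filter( []( int x ) { return x % 2 == 0; }, xs );
}
```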

Using filter, we can write a rather quick-and-dirty qsort.

// For each x in s, returns pair( p(x), not p(x) ).
template< class P, class S >
std::pair<S,S> partition( const P& p, S s ) {
    using F = std::function< bool(typename S::value_type) >;

    // There does exist std::partition_copy, 
    // however this function demonstrates a use of filter.
    return std::make_pair ( 
        filter( p,    s ),
        filter( std::not1(F(p)), s )
    );
}

// Fake Quick-Sort: A non-in-place, qsort-inspired function.
template< class S >
S fake_qsort( S s ) {
    using X = typename S::value_type;

    if( s.size() < 2 )
        return s;

    X pivot = s.back();
    s.pop_back();

    S low, high; 
    std::tie(low,high) = partition (
        [&]( const X& x ) { return x <= pivot; },
        std::move(s)
    );

    low = fake_qsort( std::move(low) );
    low.push_back( pivot );
    
    // Append the sorted high to the sorted low.
    high = fake_qsort( std::move(high) );
    std::move( std::begin(high), std::end(high), 
               std::back_inserter(low) );

    return low;
}

See also: std::partition, std::partition_copy, and std::sort.
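
For completeness, here is roughly what the std::partition_copy alternative mentioned in the comment could look like. It is a sketch with a hypothetical name, partition_once, and it assumes S supports push_back.

```cpp
#include <algorithm>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical alternative to partition above: one pass over s, copying
// each element into either the "true" or the "false" container.
template< class P, class S >
std::pair<S,S> partition_once( const P& p, const S& s ) {
    std::pair<S,S> r;
    std::partition_copy( std::begin(s), std::end(s),
                         std::back_inserter(r.first),
                         std::back_inserter(r.second),
                         p );
    return r;
}
```

Unlike the filter-based version, this evaluates the predicate once per element and preserves the relative order within each half.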

take is a function that may seem entirely trivial, at least at first.

template< class S, class _X = decltype( *std::begin(std::declval<S>()) ),
          class X = typename std::decay<_X>::type >
std::vector<X> take( size_t n, const S& s ) {
    std::vector<X> r;
    std::copy_n( std::begin(s), n, std::back_inserter(r) );
    return r;
}

template< class P, class S, 
          class _X = decltype( *std::begin(std::declval<S>()) ), 
          class X  = typename std::decay<_X>::type >
std::vector<X> takeWhile( const P& p, const S& s ) {
    std::vector<X> r;
    std::copy( std::begin(s), 
               std::find_if_not( std::begin(s), std::end(s), p ),
               std::back_inserter(r) );
    return r;
}

It also breaks the convention of returning s's type. There's a reason for that. Infinite lists. Consider this Haskell code:

take 10 [1..] == [1,2,3,4,5,6,7,8,9,10]

[1..] is an infinite list, starting at one. Obviously, it doesn't actually exist in memory. take returns a finite list that does.

The concept of iterators that represent infinite ranges in C++ isn't new, but neither is it common. std::insert_iterator could insert a theoretically infinite number of elements into a container. std::istream_ and ostream_iterator may read from or write to a file infinitely.

We can create pseudo-containers to represent infinite ranges and plug them into take.

template< class X > struct Reader {
    using iterator = std::istream_iterator<X>;

    iterator b;
    iterator e;

    Reader( iterator b = iterator( std::cin ), 
            iterator e = iterator() )
        : b(b), e(e)
    {
    }

    iterator begin() const { return b; }
    iterator end()   const { return e; }
};

// Probably ill-advised, 
// but this is /one/ way of doing IO before main().
std::vector<int> three = take( 3, Reader<int>() );
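
A pseudo-container closer to Haskell's [1..] can be sketched the same way. From is a hypothetical type, and take is restated here (simplified to int) only so the snippet compiles on its own.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical pseudo-container: the integers start, start+1, start+2, ...
struct From {
    struct iterator {
        using iterator_category = std::input_iterator_tag;
        using value_type        = int;
        using difference_type   = std::ptrdiff_t;
        using pointer           = const int*;
        using reference         = int;

        int i;

        int operator * () const { return i; }
        iterator& operator ++ () { ++i; return *this; }
        iterator operator ++ (int) { iterator old = *this; ++i; return old; }
        bool operator == ( const iterator& o ) const { return i == o.i; }
        bool operator != ( const iterator& o ) const { return i != o.i; }
    };

    int start;

    iterator begin() const { return { start }; }
    // No end(): the range never finishes, and take never asks for one.
};

// take as defined above, restated and simplified to int.
template< class S >
std::vector<int> take( std::size_t n, const S& s ) {
    std::vector<int> r;
    std::copy_n( std::begin(s), n, std::back_inserter(r) );
    return r;
}
```

With this, take( 10, From{1} ) plays the role of take 10 [1..].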

Sometimes we want to take the contents of an entire container, so dup may be helpful.

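template< class S, class _X = decltype( *std::begin(std::declval<S>()) ),
          class X = typename std::decay<_X>::type >
std::vector<X> dup( const S& s ) {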
    std::vector<X> r;
    std::copy( std::begin(s), 
               std::end(s),
               std::back_inserter(r) );
    return r;
}

std::ifstream in( "in.txt" );
// Reader's constructor implicitly converts in to an iterator.
// Some may consider this bad style and require the constructor be "explicit".
std::vector<int> contents = dup( Reader<int>(in) );

The counterpart to take is drop, but it does not have take's quirks.

template< class S >
S drop( size_t n, const S& s ) {
    return S (
        std::next( std::begin(s), n ),
        std::end(s)
    );
}

// Predicate version
template< class P, class S >
S dropWhile( const P& p, const S& s ) {
    return S (
        std::find_if_not( std::begin(s), std::end(s), p ),
        std::end(s)
    );
}

Reader<int> r = drop( 2, Reader<int>() );

drop makes no promises about infinite lists, but unlike most container- or range-based algorithms, it can work on them. In the above example, two integers are read from std::cin, and their values lost.

For another example of the use of pseudo-containers, consider this solution to the first Project Euler problem using boost::irange.

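#include <algorithm>  // std::set_union
#include <iostream>
#include <iterator>   // std::back_inserter
#include <numeric>    // std::accumulate
#include <vector>
#include <boost/range/irange.hpp>
void euler1() {
    // Multiples of three and of five below 1000.
    // irange's upper bound is exclusive, so 1000 itself is left out.
    auto three = boost::irange( 3, 1000, 3 );
    auto five  = boost::irange( 5, 1000, 5 );

    // Ensure that the final sum has no duplicates.
    std::vector<int> all;
    std::set_union( std::begin(three), std::end(three),
                    std::begin(five),  std::end(five),
                    std::back_inserter(all) );

    std::cout << "The sum of every multiple of 3 or 5 below 1000: " 
        << std::accumulate( std::begin(all), std::end(all), 0 ) 
        << std::endl;
}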
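
For readers without Boost, the same approach can be sketched with the standard library alone. multiples is a hypothetical stand-in for boost::irange that materializes the values in a vector, and euler1_sum returns the sum instead of printing it.

```cpp
#include <algorithm>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical stand-in for boost::irange:
// first, first+step, ... for every value below last.
inline std::vector<int> multiples( int first, int last, int step ) {
    std::vector<int> r;
    for( int x = first; x < last; x += step )
        r.push_back( x );
    return r;
}

// Sum of every multiple of 3 or 5 below limit.
inline int euler1_sum( int limit ) {
    const auto three = multiples( 3, limit, 3 );
    const auto five  = multiples( 5, limit, 5 );

    // Both inputs are sorted, so set_union merges them and keeps
    // shared values (15, 30, ...) only once.
    std::vector<int> all;
    std::set_union( std::begin(three), std::end(three),
                    std::begin(five),  std::end(five),
                    std::back_inserter(all) );

    return std::accumulate( std::begin(all), std::end(all), 0 );
}
```

The problem statement's own small case, the multiples below 10 (3, 5, 6, and 9), sums to 23 and makes a handy check.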

Folding: Reducing a list from many to one. (std::accumulate)

Accumulating is the "imperative" description of folding. Historically, you'd call the variable you update with the results of each calculation the accumulator. To accumulate, then, is to iterate through a sequence, updating the accumulator with each iteration.

Folding is another way to think of it. A fold is a transformation from a list of values to just one value. Haskell defines foldl and foldr, meaning "fold left" and "right".

template< class F, class X, class S >
constexpr X foldl( F&& f, X x, const S& s ) {
    return std::accumulate (
        std::begin(s), std::end(s),
        std::move(x), std::forward<F>(f) 
    );
}

int main() {
    std::vector<int> v = { 5, 3, 2 };
    std::cout << "(((10 - 5) - 3) - 2) = " << foldl( std::minus<int>(), 10, v ) << std::endl;
}

foldl is really just another name for accumulate. The accumulation function (here, std::minus) expects the accumulator as the left argument and value to accumulate as its right. foldr is reversed: Not only does it iterate in reverse, but expects the accumulator in the right-hand argument.

// A function wrapper that flips the argument order.
template< class F > struct Flip {
    F f = F();

    constexpr Flip( F f ) : f(std::move(f)) { }

    template< class X, class Y >
    constexpr auto operator () ( X&& x, Y&& y )
        -> typename std::result_of< F(Y,X) >::type
    {
        return f( std::forward<Y>(y), std::forward<X>(x) );
    }
};

template< class F, class X, class S >
constexpr X foldr( F&& f, X x, const S& s ) {
    using It = decltype(std::begin(s));
    using RIt = std::reverse_iterator<It>;
    return std::accumulate (
        // Just foldl in reverse.
        RIt(std::end(s)), RIt(std::begin(s)),
        std::move(x), 
        Flip<F>(std::forward<F>(f))
    );
}

int main() {
    std::vector<int> v = { 5, 3, 2 };
    std::cout << "(5 - (3 - (2 - 10))) = " << foldr( std::minus<int>(), 10, v ) << std::endl;
}
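
std::result_of, which Flip relies on, was deprecated in C++17 and removed in C++20. As a hedged alternative, both folds can be sketched in C++14 with a lambda doing the flip and std::rbegin doing the reversal. The names match the functions above, but this is a restatement rather than the gist's code.

```cpp
#include <functional>
#include <iterator>
#include <numeric>
#include <utility>
#include <vector>

// foldl f x {a,b,c} = f( f( f(x,a), b ), c )
template< class F, class X, class S >
X foldl( F f, X x, const S& s ) {
    return std::accumulate( std::begin(s), std::end(s),
                            std::move(x), std::move(f) );
}

// foldr f x {a,b,c} = f( a, f( b, f(c,x) ) )
template< class F, class X, class S >
X foldr( F f, X x, const S& s ) {
    return std::accumulate (
        // Walk backwards and hand f the accumulator on the right.
        std::rbegin(s), std::rend(s),
        std::move(x),
        [&f]( X acc, const auto& elem ) {
            return f( elem, std::move(acc) );
        }
    );
}
```

For v = { 5, 3, 2 } and std::minus<int>, foldl computes ((10 - 5) - 3) - 2 = 0, while foldr computes 5 - (3 - (2 - 10)) = -6.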

Folding is great for monoids: types that have an associative binary operation with an identity element, often with the signature "X(const X&, const X&)".

std::vector<std::string> strs = { "st", "ri", "ng" };
// std::string associated with (+) is a monoid.
std::cout << "'st' + 'ri' + 'ng' = " << 
    foldl( std::plus<std::string>(), std::string(), strs ) << std::endl;

using Func = std::function< int(int) >;

auto comp = []( Func f, Func g ) {
    return [f,g]( int x ){ return f(g(x)); };
};

auto inc = []( int x ) { return x+1; };
auto id  = []( int x ) { return x;   };

std::vector<Func> incs = { inc, inc, inc };
// Functions can be monoids under composition.
std::cout << "(inc . inc . inc)(1) = " << 
    foldl( comp, Func(id), incs )(1) << std::endl;

Functional programmers also like to build lists using fold. They build lists starting at the tail, so they typically prefer foldr to foldl. std::forward_list works like [] in Haskell and linked lists in other functional languages. This snippet simply copies the values from the std::vector, v.

using List = std::forward_list<int>;
// foldr passes the element first and the accumulator second.
auto cons = []( int x, List l ) {
    l.push_front( x );
    return l;
};

List l = foldr( cons, List(), v );

Note: This one is not an example of a monoid.

Zip and Map: many to many. (std::transform)

To zip two sequences together by some function is the same as calling std::transform. Transform implies modifying each member by some function. Zip implies the same, but with the visual metaphor of combining two lists into one, starting at one end and working up.

template< class F, template<class...>class S, class X, class Y,
          class Res = typename std::result_of< F(X,Y) >::type >
S<Res> zip( F&& f, const S<X>& v, const S<Y>& w ) {
    S<Res> r;
    std::transform( std::begin(v), std::end(v),
                    std::begin(w), 
                    std::back_inserter(r),
                    std::forward<F>(f) );
    return r;
}

int main() {
    std::vector<int> v = { 5, 3, 2 };
    auto doubleV = zip( std::plus<int>(), v, v );
}

Note: The only way I have discovered to write zip variadically is with tuples. Since this article is not on tuples, refer to the definition of transform in "Zipping and Mapping Tuples".

Note2: An in-place version of this function is possible, but showing both general and optimized versions of each function would be redundant, and the topic of optimization is worth discussing on its own.

Mapping is similar to zipping--in fact the two-argument forms of zip(f,xs) and map(f,xs) should be equivalent. The three argument form, like map(f,xs,ys), applies f to every combination of x and y.

map(f,{x,y},{a,b}) == { f(x,a), f(x,b), f(y,a), f(y,b) }

If xs is size N and ys is of size M, then map(f,xs,ys) returns a sequence of size N x M.

template< class F, template<class...>class S, class X,
          class Res = typename std::result_of< F(X) >::type >
S<Res> map( const F& f, const S<X>& s ) {
    S<Res> r;
    std::transform( std::begin(s), std::end(s),
                    std::back_inserter(r),
                    f );
    return r;
}

template< class F, template<class...>class S, class X, class Y,
          class Res = typename std::result_of< F(X,Y) >::type >
S<Res> map( const F& f, const S<X>& xs, const S<Y>& ys ) {
    S<Res> r;

    for( const X& x : xs ) 
        for( const Y& y : ys )
            r.emplace_back( f(x,y) );

    return r;
}

int main() {
    std::vector<int> v = { 5, 3, 2 };
    std::vector<int> w = { 9, 8, 7 };
    auto sums = map( std::plus<int>(), v, w );
}
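
To see the difference between the two side by side, here is a small self-contained sketch restricted to std::vector. zip_with and cross_map are hypothetical names chosen to avoid clashing with the functions above, and unlike zip, zip_with stops at the shorter input.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <type_traits>
#include <vector>

// Pairs elements position by position: { f(x0,y0), f(x1,y1), ... }.
template< class F, class X, class Y >
auto zip_with( F f, const std::vector<X>& xs, const std::vector<Y>& ys ) {
    using R = std::decay_t< decltype( f(xs[0], ys[0]) ) >;
    std::vector<R> r;
    const std::size_t n = std::min( xs.size(), ys.size() );
    for( std::size_t i = 0; i < n; ++i )
        r.push_back( f(xs[i], ys[i]) );
    return r;
}

// Applies f to every combination: N x M results.
template< class F, class X, class Y >
auto cross_map( F f, const std::vector<X>& xs, const std::vector<Y>& ys ) {
    using R = std::decay_t< decltype( f(xs[0], ys[0]) ) >;
    std::vector<R> r;
    r.reserve( xs.size() * ys.size() );
    for( const X& x : xs )
        for( const Y& y : ys )
            r.push_back( f(x,y) );
    return r;
}
```

With v = { 5, 3, 2 } and w = { 9, 8, 7 }, zipping with std::plus gives { 14, 11, 9 }, while mapping gives the nine sums { 14, 13, 12, 12, 11, 10, 11, 10, 9 }.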

map is a bread and butter function in functional programming.

// Convert a sequence from one type to another:
auto ints = map( toInt, floats );

// In a game loop:
actors = map( update, actors );

// A deck of cards (four suits with thirteen values).
auto deck = map( make_card, suits, values );

// Making variants of the same thing from simpler data.
std::vector<int> inits = { 1, 2, 3, 4 };
auto cs = map (
    []( int i ) { return std::complex<float>(i,0.1); },
    inits
);

// Checking for collisions:
auto collisions = map( make_collision, actors, actors );

// AI:
states = map( successor, actions, states );

One downside of map is that it may create redundancies, which makes filter useful in conjunction.

states = filter (
    state_is_valid,
    map( successor, actions, states )
);

While this may turn an algorithm from one-pass (update and add if valid) to two-pass (update all states, then filter), it also makes simpler algorithms that can be optimized more easily by the compiler at times. For example,

for( auto x : xs ) {
    for( auto y : ys ) {
        auto z = x * y;
        if( pred(z) ) r.push_back(z);
    }
}

// or:
auto r = filter( pred, map(std::multiplies<int>(),xs,ys) );

While only profiling can tell in any given instance, the second example may be faster under some circumstances. The compiler may be able to vectorize the call to map, but have difficulties applying the same optimization to the first because it cannot evaluate both the multiplication and predicate in one vectorized step.

Sometimes, the goal is to calculate something given the data, rather than map it. Naively, one might write something like

auto r = fold( f, map(g,xs) );

But isn't creating the new container inefficient? What if an in-place version of map were implemented, wouldn't transforming xs before folding still be inefficient? Thus, foldMap is useful.

template< class Fold, class Map, class X, class S >
X foldMap( const Fold& f, const Map& m, X x, const S& s ) {
    for( auto& y : s )
        x = f( std::move(x), m(y) );
    return x;
}

#include <cctype>
#include <iostream>
#include <string>
int main() {
    const char* names[] = { "john", "mary", "cary" };
    auto appendNames = []( std::string x, std::string y ) {
        return x + " " + y; 
    };
    auto capitalizeName = []( std::string name ) {
        // std::toupper expects a value representable as unsigned char.
        name[0] = std::toupper( static_cast<unsigned char>(name[0]) );
        return name;
    };
    std::cout << "Names : " 
        << foldMap (
            appendNames,
            capitalizeName,
            std::string(),
            names
        ) << std::endl;
}

Conclusions.

Haskell's Data.List is actually a lot like <algorithm>, though on a higher level of abstraction. There are some things that can only be done with iterators, but many that can also only be done with whole containers. Data.List gives some good inspiration for helpful algorithms, even in C++.

But unlike in C++, Haskell uses simple linked lists by default and all of Data.List's functions work only on linked lists. This gives both Haskell and functional programming a bad name when people compare Haskell code using linked lists to C++ code using std::vector. (See "C++ Benchmark -- std::vector vs. std::list vs. std::deque") When libraries are written to optimize inefficiencies in the linked list, like Data.Text, they re-implement Data.List's interface and often achieve equivalent efficiency to well-optimized C, but not without plenty of redundancy.

In C++, we can write one static interface that is both generic and efficient. Writing functional code does not mean writing slow code. The mathematical nature of these operations can even help the compiler optimize. The high-level interface of Data.List fits snugly atop the low-level interface of iterators.

Source for this article: https://gist.github.com/4290166

Zipping and Mapping tuples.

2012-12-12T08:00:00.000-08:00

Previously, I discussed some basic things that can be done with tuples. I showed how a tuple can be applied to a function, however I did not show how member-wise transformations could be done.

The code of this article builds on the code of the prior.

Zipping.

If we have several tuples, what if we want to apply a function to the nth element of each one?

template< template<size_t> class Fi = Get, size_t i,
          class F, class ...T >
constexpr auto zipRow( const F& f, T&& ...t )
    -> decltype( f(Fi<i>()(std::forward<T>(t))...) )
{
    return f( Fi<i>()( std::forward<T>(t) )... );
}

Not hard at all! It basically squishes that row (thinking of t as a column and ti... as a row), using f, into one value. Now, let's say we want to zip together t... into one tuple.

template< template<size_t> class Fi = Get, size_t ...i,
          class Ret, class F, class ...T >
constexpr auto zipIndexList( IndexList<i...>, 
                             const Ret& r, const F& f, T&& ...t )
    -> decltype( r(zipRow<Fi,i>(f,std::forward<T>(t)...)...) )
{
    return r( zipRow< Fi, i >( f, std::forward<T>(t)... )... );
}

template< template<size_t> class Fi = Get,
          class Ret, class F, class T, class ...U,
          class _T = typename std::decay<T>::type,
          class IL = typename IListFrom<_T>::type >
constexpr auto zipTupleTo( const Ret& r, const F& f, T&& t, U&& ...u )
    -> decltype( zipIndexList<Fi>(IL(),r,f,std::forward<T>(t),std::forward<U>(u)...) )
{
    return zipIndexList<Fi>( IL(), r, f, std::forward<T>(t), std::forward<U>(u)... );
}

int main() {
    auto zipped = zipTupleTo( tuple, std::plus<int>(), tuple(1,10), 
                                                       tuple(2,20) );
    std::cout << " 1 +  2 = " << std::get<0>(zipped) << std::endl;
    std::cout << "10 + 20 = " << std::get<1>(zipped) << std::endl;
}

In zipIndexList, r represents the function defining how the output is returned. tuple (gist), from the previous article, is just a function object form of std::make_tuple that can be passed to higher order functions. By supplying it as our r, we're saying "just make it a tuple again."

Since most often, we want to zip back into a tuple, it makes sense to define zipTuple like so:

template< template<size_t> class Fi = Get,
          class F, class ...T >
constexpr auto zipTuple( const F& f, T&& ...t )
    -> decltype( zipTupleTo<Fi>(tuple,f,std::forward<T>(t)...) )
{
    return zipTupleTo<Fi>( tuple, f, std::forward<T>(t)... );
}
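
The code above depends on Get, IndexList, and IListFrom from the previous article. For readers who want something that compiles on its own, the same member-wise zip can be sketched with C++14's std::index_sequence. The snake_case names are hypothetical, and this version always rebuilds a std::tuple rather than taking a Ret function.

```cpp
#include <cstddef>
#include <functional>
#include <tuple>
#include <utility>

// Applies f to the I-th element of every tuple.
template< std::size_t I, class F, class ...T >
constexpr auto zip_row( const F& f, const T& ...t ) {
    return f( std::get<I>(t)... );
}

// Builds one result element per index in I...
template< class F, std::size_t ...I, class ...T >
constexpr auto zip_tuple_impl( std::index_sequence<I...>,
                               const F& f, const T& ...t ) {
    return std::make_tuple( zip_row<I>( f, t... )... );
}

// The first tuple decides how many rows there are.
template< class F, class T, class ...U >
constexpr auto zip_tuple( const F& f, const T& t, const U& ...u ) {
    using Rows = std::make_index_sequence< std::tuple_size<T>::value >;
    return zip_tuple_impl( Rows(), f, t, u... );
}
```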

zipTuple is to tuples what std::transform is to sequences. The drawback of std::transform is that it only allows unary or binary transformations. Let's write a version that accepts any number of arguments.

// We require these polymorphic function objects.
constexpr struct Inc {
    template< class X >
    constexpr X operator () ( X x ) { return ++x; }
} inc{};

constexpr struct Eq {
    template< class X >
    constexpr bool operator () ( const X& a, const X& b ) 
    { return a == b; }
} eq{};

struct Or {
    template< class X >
    constexpr bool operator () ( const X& a, const X& b ) 
    { return a || b; }
};

// Wrapper to dereference arguments before applying.
// indirect : (a -> b) -> (a* -> b)
template< class F > struct Indirect {
    F f = F();

    constexpr Indirect( F f ) : f(std::move(f)) { }

    template< class ...It >
    constexpr auto operator () ( It ...it )
        -> decltype( f(*it...) )
    {
        return f( *it... );
    }
};

template< class F, class I = Indirect<F> > 
constexpr I indirect( F f ) {
    return I( std::move(f) );
}

#include <vector>
#include <algorithm>
template< class F, class ...X,
          class Result = typename std::result_of<F(X...)>::type,
          class Ret = std::vector<Result> >
Ret transform( const F& f, const std::vector<X>& ...vs )
{
    Ret r;

    const auto ends = tuple( vs.end()... );

    // Iterate through each vector in parallel.
    for( auto its  = tuple( vs.begin()... ); 
         // This unrolls to: not (it0==end0 || it1==end1 || ...)
         not foldl( Or(), zipTuple(eq,its,ends) );
         // Increment each iterator.
         its = zipTuple( inc, its ) )
    {
        r.emplace_back (
            applyTuple( indirect(f), its )
        );
    }

    return r;
}

int main() {
    std::vector<int> v = {1,10,100},
                     w = {2,20,200},
                     x = {3,30,300};

    auto vw = transform (
        [](int x, int y, int z){ return x+y+z; }, 
        v, w, x 
    );
    std::cout << "  1 +   2 +   3 = " << vw[0] << std::endl;
    std::cout << " 10 +  20 +  30 = " << vw[1] << std::endl;
    std::cout << "100 + 200 + 300 = " << vw[2] << std::endl;
}

Note: foldl (gist).

Mapping.

Suppose we want to know the results of adding every combination of {1,2,3} with {7,8,9}. We could write a function that cross-applied every variable from each tuple, but slightly more generally, we can start by taking the Cartesian product.

// product : {x,y} x {a,b} -> {{x,a},{x,b},{y,a},{y,b}}
constexpr struct Product {
    // {...xi...} x {...aj...} -> {xi,aj}
    template< size_t i, size_t j, class T, class U >
    static constexpr auto zip( const T& t, const U& u )
        -> decltype( tuple(std::get<i>(t),std::get<j>(u)) )
    {
        return tuple( std::get<i>(t), std::get<j>(u) );
    }

    // {...xi...} x {a0,a1,a2...} -> { {xi,a0}, {xi,a1}, ... }
    template< size_t i, size_t ...j, class T, class U >
    static constexpr auto withI( IndexList<j...>, const T& t, const U& u )
        -> decltype( tuple(zip<i,j>(t,u)...) )
    {
        return tuple( zip<i,j>(t,u)... );
    }
        
    // {x...} x {a...} -> { {x,a}... }
    template< size_t ...i, size_t ...j, class T, class U >
    static constexpr auto withIndexes( IndexList<i...>, IndexList<j...> js,
                                       const T& t, const U& u )
        -> decltype( std::tuple_cat(withI<i>(js,t,u)...) )
    {
        return std::tuple_cat( withI<i>(js,t,u)... );
    }

    template< class T, class U,
              class IL  = typename IListFrom<T>::type,
              class IL2 = typename IListFrom<U>::type >
    constexpr auto operator () ( const T& t, const U& u )
        -> decltype( withIndexes(IL(),IL2(),t,u) )
    {
        return withIndexes( IL(), IL2(), t, u );
    }
} product{};

We can now define a map operation to apply the product.

template< class F > struct ApplyF {
    F f = F();

    constexpr ApplyF( F f ) : f(std::move(f)) { }

    template< class T >
    constexpr auto operator () ( T&& t ) 
        -> decltype( applyTuple(f,std::forward<T>(t)) )
    {
        return applyTuple( f, std::forward<T>(t) );
    }
};

template< class F > 
constexpr ApplyF<F> applyF( F f ) {
    return ApplyF<F>(std::move(f));
}

constexpr struct MapTuple {
    template< class F, class T, class U >
    constexpr auto operator () ( const F& f, const T& t, const U& u )
        -> decltype( zipTuple(applyF(f),product(t,u)) )
    {
        return zipTuple( applyF(f), product(t,u) );
    }
} mapTuple{};

int main() {
    auto sums = mapTuple( std::plus<int>(), tuple(1,2,3), tuple(7,8,9) );
    std::cout << "map (+) (1,2,3) (7,8,9) = ";
    forEach( printItem, sums );
    std::cout << std::endl;
}

This prints out:

map (+) (1,2,3) (7,8,9) = 8 9 10 9 10 11 10 11 12
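
If the intermediate product is not needed, the same result can be sketched directly with C++14's std::index_sequence: flatten the N x M combinations into one index k, then recover the row as k / M and the column as k % M. map_tuple is a hypothetical name, and this is a shortcut, not the Product-based definition above.

```cpp
#include <cstddef>
#include <functional>
#include <tuple>
#include <utility>

template< class F, class T, class U, std::size_t ...K >
constexpr auto map_tuple_impl( const F& f, const T& t, const U& u,
                               std::index_sequence<K...> ) {
    // M is the length of the second tuple. Each K in 0 .. N*M-1
    // selects element K / M of t and element K % M of u.
    constexpr std::size_t M = std::tuple_size<U>::value;
    return std::make_tuple( f( std::get<K / M>(t), std::get<K % M>(u) )... );
}

template< class F, class T, class U >
constexpr auto map_tuple( const F& f, const T& t, const U& u ) {
    using Indices = std::make_index_sequence<
        std::tuple_size<T>::value * std::tuple_size<U>::value >;
    return map_tuple_impl( f, t, u, Indices() );
}
```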

Zipping applies elements across from each other. Mapping applies everything to everything. (Note: a unary definition of map would be equivalent to a unary definition of zip.)

Tuples as function environments.

This might seem a little off topic, but Haskell has this neat function, id. It works like this:

id x = x

Simple, right?

(id f) x y = f x y = id f x y

id has this neat property that if applied to multiple arguments, it applies the tail arguments to the first. This is an artifact of Haskell's curried notation, but we can emulate this behaviour:

constexpr struct Id {
    template< class X >
    constexpr X operator () ( X&& x ) {
        return std::forward<X>(x);
    }

    template< class F, class X, class ...Y >
    constexpr auto operator () ( const F& f, X&& x, Y&& ...y )
        -> typename std::result_of< F(X,Y...) >::type
    {
        return f( std::forward<X>(x), std::forward<Y>(y)... );
    }
} id{};

And now tuples take on a new role: Contained function environments. Consider:

 applyTuple( id, tuple(std::plus<int>(),1,2) );

What does this output? How about

 mapTuple( id, tuple(inc,dec), tuple(5,9) );

auto pm = tuple(std::plus<int>(),std::minus<int>());
 zipTuple( id, pm, tuple(10,5), tuple(10,5) );

Or:

 mapTuple( id, pm, tuple(1,2), tuple(3,4) );

I leave implementing the three-tuple version of mapTuple as an exercise, but here's a hint: cross( cross({f},{x}), {y}) = {{{f,x},{y}}}, but you need to take it from that to {{f,x,y}}. (Another good exercise might be to write zipTuple in terms of transposition (wiki).)

Conclusions.

This fleshes out some basic applications of tuples to functions. applyTuple unpacks a tuple and applies it to a function. foldl and foldr let one apply binary functions to n-ary tuples, or even singletons (maths concept, not design pattern). zipTuple transforms multiple tuples by a function, member-wise. mapTuple performs a function for every combination of arguments.

Tuples have unusual mathematical properties compared to other data structures due to the profundity of what they generalize. They can help us shorthand functions to operate in parallel (zip), be passed around as partial or complete function environments, control variadic template parameters, and much, much more that I have either not discussed or not yet realized.

One use I haven't discussed, for example, is as a relation, but for an example of this use of tuples, look no further than std::map.

I hope this post has been interesting. Happy coding!

Source for this article: https://gist.github.com/4268029

Fun with tuples.

2012-12-10T21:37:00.000-08:00

std::tuple is an odd-but-fun part of the new standard. A lot can be done with them. The members of a tuple can be applied to functions, both in groups and individually. The tuple itself can be treated as a function environment. They can be reorganized, appended, and truncated. The list goes on.

The thing is, what can tuples be used for? Any POD struct or class can be a tuple instead, although this may or may not be desirable. Still, we can write generic algorithms with tuples, whereas we cannot with structs. If we wrote a function that printed any tuple, then any POD we turned into a tuple would suddenly become printable.
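As a rough sketch of that last point, assuming C++14: the adapter asTuple and the function show below are hypothetical names, not part of this article's source. Once a struct can be viewed as a tuple, one generic function formats it.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <tuple>
#include <utility>

struct Vec { int x, y; };

// Hypothetical adapter: view a Vec as a tuple of references.
inline std::tuple<const int&, const int&> asTuple( const Vec& v ) {
    return std::tie( v.x, v.y );
}

template< class T, std::size_t ...i >
std::string showImpl( const T& t, std::index_sequence<i...> ) {
    std::ostringstream os;
    // One stream insertion per index; a braced list evaluates left to right.
    int expand[] = { 0, ( (os << std::get<i>(t) << ' '), 0 )... };
    (void)expand;
    return os.str();
}

// Formats the elements of any tuple, each followed by a space.
template< class ...X >
std::string show( const std::tuple<X...>& t ) {
    return showImpl( t, std::index_sequence_for<X...>() );
}
```

Given Vec v{1,2}, show( asTuple(v) ) yields "1 2 " without Vec needing any printing code of its own.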

Indexing.

Tuples are indexed, which makes accessing them odd.

// A normal struct
struct Vec { int x, y; };
Vec a = { 1, 2 };
a.x = 2; // Normal access.

std::tuple<int,int> b(1,2);
std::get<0>(b) = 2; // Weird access.

One might think "I'll just write an accessor function!", and that certainly would work. It becomes easier if std::get is made into a function object.

// std::tuple_element<i,T> does not perfect forward.
template< size_t i, class T >
using Elem = decltype( std::get<i>(std::declval<T>()) );

template< size_t i > struct Get {
    template< class T >
    constexpr auto operator () ( T&& t ) 
        -> Elem< i, T >
    {
        return std::get<i>( std::forward<T>(t) );
    }
};

constexpr auto getx = Get<0>();
constexpr auto gety = Get<1>();

getx(b) = 2; // Less weird.

I define Elem because using std::tuple_element returns what the tuple actually holds. std::get<0>(b) would return an int&, but tuple_element<0,decltype(b)>::type would be int.
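The difference can be checked at compile time. This sketch repeats the Elem alias so that it stands alone; Pair and setFirst are names introduced only for the example.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

template< std::size_t i, class T >
using Elem = decltype( std::get<i>(std::declval<T>()) );

using Pair = std::tuple<int,int>;

// tuple_element reports the stored type...
static_assert( std::is_same< std::tuple_element<0,Pair>::type, int >::value,
               "stored type is int" );
// ...while Elem reports what std::get returns for each value category.
static_assert( std::is_same< Elem<0,Pair&>, int& >::value,
               "lvalue tuple gives int&" );
static_assert( std::is_same< Elem<0,const Pair&>, const int& >::value,
               "const lvalue tuple gives const int&" );
static_assert( std::is_same< Elem<0,Pair>, int&& >::value,
               "rvalue tuple gives int&&" );

// Assigns through the reference type that Elem computes for an lvalue tuple.
inline void setFirst( Pair& p, int v ) {
    Elem<0,Pair&> ref = std::get<0>( p );
    ref = v;
}
```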

One might find it useful to index the tuple backwards, so let's define a function, rget.

 template< size_t i, class T, 
          class _T = typename std::decay<T>::type,
          size_t N = std::tuple_size<_T>::value - 1, // Highest index
          size_t j = N - i >
constexpr auto rget( T&& t ) 
    -> Elem< j, T >
{
    return std::get<j>( std::forward<T>(t) );
}

Now we can also define functions to get the first and last elements of any tuple.

constexpr auto head = Get<0>();
constexpr auto last = RGet<0>(); // RGet is defined similarly to Get.

int main() {
    constexpr auto t = std::make_tuple( 1, 'a', "str" );
    std::cout << "head = " << head(t) << std::endl; // prints 1
    std::cout << "last = " << last(t) << std::endl; // prints str
}
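RGet itself is not shown here. One way it might look, mirroring rget, is sketched below; this is my guess at a definition rather than the one from the gist, and lastSketch is named so as not to clash with last above.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

template< std::size_t i > struct RGet {
    template< class T,
              class Tup = typename std::decay<T>::type,
              // Count back from the highest index.
              std::size_t j = std::tuple_size<Tup>::value - 1 - i >
    constexpr auto operator () ( T&& t ) const
        -> decltype( std::get<j>(std::forward<T>(t)) )
    {
        return std::get<j>( std::forward<T>(t) );
    }
};

constexpr RGet<0> lastSketch{};
```

RGet<0> picks the last element, RGet<1> the one before it, and so on.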

Just for fun, let's also write a function that, if the index is too high, wraps around to the beginning.

template< size_t i, class T, 
          class _T = typename std::decay<T>::type,
          size_t N = std::tuple_size<_T>::value,
          size_t j = i % N >
constexpr auto mod_get( T&& t ) 
    -> Elem< j, T >
{
    return std::get<j>( std::forward<T>(t) );
}

Now, let's say we want to call a function for every member of a tuple. We need a way of indexing it and applying some function for each index. It starts with the definition of a type to "hold" a list of indexes.

template< size_t ...i > struct IndexList {};

Now, if given a tuple of size 3, we want an IndexList<0,1,2> to represent each index. There are many solutions for how to do this, but what they all have in common is being obscure or difficult to understand. The following solution is designed first and foremost to be obvious and intuitive.

template< size_t ... > struct EnumBuilder;

// Increment cur until cur == end.
template< size_t end, size_t cur, size_t ...i >
struct EnumBuilder< end, cur, i... > 
    // Recurse, adding cur to i...
    : EnumBuilder< end, cur+1, i..., cur >
{
};

// cur == end; the list has been built.
template< size_t end, size_t ...i >
struct EnumBuilder< end, end, i... > {
    using type = IndexList< i... >;
};

template< size_t b, size_t e >
struct Enumerate {
    using type = typename EnumBuilder< e, b >::type;
};

template< class > struct IListFrom;

template< class ...X > 
struct IListFrom< std::tuple<X...> > {
    static constexpr size_t N = sizeof ...(X);
    using type = typename Enumerate< 0, N >::type;
};
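For comparison, C++14 later standardized this idea as std::index_sequence. Assuming a C++14 library is available, the same half-open range can be sketched by offsetting a standard sequence; offsetList and Range are hypothetical names.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

template< std::size_t ...i > struct IndexList {};

// Shift every index of a standard sequence up by b.
template< std::size_t b, std::size_t ...i >
constexpr IndexList< (b + i)... > offsetList( std::index_sequence<i...> ) {
    return {};
}

// Range<b,e> names the same type as Enumerate<b,e>::type: IndexList<b,...,e-1>.
template< std::size_t b, std::size_t e >
using Range = decltype( offsetList<b>( std::make_index_sequence<e - b>() ) );

static_assert( std::is_same< Range<0,3>, IndexList<0,1,2> >::value,
               "whole range" );
static_assert( std::is_same< Range<1,3>, IndexList<1,2> >::value,
               "range that skips the first index" );
static_assert( std::is_same< Range<2,2>, IndexList<> >::value,
               "empty range" );
```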

Now, a function that applies each index, and one that prints a tuple's elements for testing.

template< size_t i, size_t ...j, class F, class T >
void forEachIndex( IndexList<i,j...>, const F& f, const T& t ) {
    f( std::get<i>(t) );
    
    // Recurs, removing the first index.
    forEachIndex( IndexList<j...>(), f, t ); 
}

template< class F, class T >
void forEachIndex( IndexList<>, const F& f, const T& t ) {
    // No more indexes.
}

template< class F, class T >
void forEach( const F& f, const T& t ) {
    constexpr size_t N = std::tuple_size<T>::value;
    using IL = typename Enumerate<0,N>::type;
    forEachIndex( IL(), f, t );
}

constexpr struct PrintItem {
    template< class X >
    void operator () ( const X& x ) const {
        std::cout << x << ' ';
    }
} printItem{};

int main() {
    constexpr auto t = std::make_tuple( 1, "and", 2 );
    std::cout << "t = "; 
    forEach( printItem, t ); // Prints "1 and 2"
    std::cout << std::endl;
}

Applying a tuple to a function.

This is probably one of the most pondered questions about tuples. "How do I apply one to a function?" This is easy since we already have IndexList defined.

template< size_t ...i, class F, class T >
constexpr auto applyIndexList( IndexList<i...>, F f, T&& t )
    -> typename std::result_of< F( Elem<i,T>... ) >::type
{
    return f( std::get<i>(std::forward<T>(t))... );
}

template< class F, class T,
          class _T = typename std::decay<T>::type,
          class IL = typename IListFrom<_T>::type >
constexpr auto applyTuple( const F& f, T&& t )
    -> decltype( applyIndexList(IL(),f,std::forward<T>(t)) )
{
    return applyIndexList( IL(), f, std::forward<T>(t) );
}

// Functional programmers may recognize this as cons.
constexpr struct PushFront {
    template< class ...X, class Y >
    constexpr auto operator () ( std::tuple<X...> t, Y y )
        -> std::tuple< Y, X... >
    {
        return std::tuple_cat( tuple(std::move(y)), std::move(t) );
    }
} pushFront{};

// Chain Left.
constexpr struct ChainL {
    template< class F, class X >
    constexpr X operator () ( const F&, X x ) {
        return x;
    }

    template< class F, class X, class Y, class ...Z >
    constexpr auto operator () ( const F& b, const X& x, const Y& y, const Z& ...z) 
        -> decltype( (*this)(b, b(x,y), z... ) )
    {
        return (*this)(b, b(x,y), z... );
    }
} chainl{};

// Fold Left.
constexpr struct FoldL {
    // Given f and {x,y,z}, returns f( f(x,y), z ).
    template< class F, class T >
    constexpr auto operator () ( const F& f, const T& t ) 
        -> decltype( applyTuple(chainl,pushFront(t,f)) )
    {
        return applyTuple( chainl, pushFront(t,f) );
    }
} foldl{};

auto ten = foldl( std::plus<int>(), std::make_tuple(1,2,3,4) );
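As a point of comparison, the core of applyTuple can be sketched in C++14 with std::index_sequence and decltype(auto). The name applySketch is hypothetical, and this version omits the IndexList parameter that the rest of the article builds on.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

template< class F, class T, std::size_t ...i >
constexpr decltype(auto) applyImpl( F&& f, T&& t, std::index_sequence<i...> ) {
    // Expand one std::get per index into the argument list of f.
    return std::forward<F>(f)( std::get<i>(std::forward<T>(t))... );
}

template< class F, class T >
constexpr decltype(auto) applySketch( F&& f, T&& t ) {
    using Tup = typename std::decay<T>::type;
    return applyImpl( std::forward<F>(f), std::forward<T>(t),
                      std::make_index_sequence< std::tuple_size<Tup>::value >() );
}
```

Calling applySketch with a three-argument function and a tuple of three values is the same as calling the function with those values directly.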

We can call applyIndexList with different index lists to get interesting results.

// Because std::make_tuple can't be passed 
// to higher order functions.
constexpr struct MakeTuple {
    template< class ...X >
    constexpr std::tuple<X...> operator () ( X ...x ) {
        return std::tuple<X...>( std::move(x)... );
    }
} tuple{}; // function tuple that constructs std::tuples.

// Returns the initial elements. (All but the last.)
// init( {1,2,3} ) = {1,2}
template< class T,
          size_t N = std::tuple_size<T>::value, 
          class IL = typename Enumerate< 0, N-1 >::type >
constexpr auto init( const T& t )
    -> decltype( applyIndexList(IL(),tuple,t) )
{
     // Construct a new tuple from the initial indexes.
     return applyIndexList( IL(), tuple, t );
}

// Returns a new tuple with every value from t except the first.
// tail( {1,2,3} ) = {2,3}
template< class T,
          size_t N = std::tuple_size<T>::value, 
          class IL = typename Enumerate< 1, N >::type >
constexpr auto tail( const T& t )
    -> decltype( applyIndexList(IL(),tuple,t) )
{
    return applyIndexList( IL(), tuple, t );
}

Remember Get and RGet from above? They're templated function objects based on an index. We can write a more generic applyIndexList that allows specifying such a function and without losing the default behaviour.

template< template<size_t> class Fi = Get, size_t ...i, class F, class T >
constexpr auto applyIndexList( IndexList<i...>, const F& f, T&& t )
    -> typename std::result_of< F( 
        typename std::result_of< Fi<i>(T) >::type...
    ) >::type
{
    return f( Fi<i>()(std::forward<T>(t))... );
}

template< template<size_t> class Fi = Get, class F, class T,
          class _T = typename std::decay<T>::type,
          class IL = typename IListFrom<_T>::type >
constexpr auto applyTuple( const F& f, T&& t )
    -> decltype( applyIndexList<Fi>(IL(),f,std::forward<T>(t)) )
{
    return applyIndexList<Fi>( IL(), f, std::forward<T>(t) );
}

// Reconstruct t in reverse.
template< class T >
constexpr auto reverse( const T& t ) 
    -> decltype( applyTuple<RGet>(tuple,t) )
{
    return applyTuple< RGet >( tuple, t );
}

// Fold Right.
constexpr struct FoldR {
    // Given f and {x,y,z}, returns f( f(z,y), x ).
    template< class F, class T >
    constexpr auto operator () ( const F& f, const T& t ) 
        -> decltype( foldl(f,reverse(t)) )
    {
        return foldl( f, reverse(t) );
    }
} foldr{};
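A non-commutative operation makes the order of both folds visible. The sketch below assumes C++14 and uses hypothetical names (chainLeft, foldLeft, foldReversed); it follows the same strategy as above, with the second fold being a left fold over the reversed tuple.

```cpp
#include <cstddef>
#include <tuple>
#include <utility>

// chainLeft(f, x, y, z) = f( f(x,y), z )
template< class F, class X >
constexpr X chainLeft( const F&, X x ) {
    return x;
}

template< class F, class X, class Y, class ...Z >
constexpr auto chainLeft( const F& f, const X& x, const Y& y, const Z& ...z ) {
    return chainLeft( f, f(x,y), z... );
}

template< class F, class T, std::size_t ...i >
constexpr auto foldLeftImpl( const F& f, const T& t, std::index_sequence<i...> ) {
    return chainLeft( f, std::get<i>(t)... );
}

template< class F, class ...X >
constexpr auto foldLeft( const F& f, const std::tuple<X...>& t ) {
    return foldLeftImpl( f, t, std::index_sequence_for<X...>() );
}

// Same chain, but reading the tuple from its last index to its first.
template< class F, class T, std::size_t ...i >
constexpr auto foldReversedImpl( const F& f, const T& t, std::index_sequence<i...> ) {
    return chainLeft( f, std::get< sizeof...(i) - 1 - i >(t)... );
}

template< class F, class ...X >
constexpr auto foldReversed( const F& f, const std::tuple<X...>& t ) {
    return foldReversedImpl( f, t, std::index_sequence_for<X...>() );
}
```

With subtraction over (1,2,3), foldLeft computes (1-2)-3 and foldReversed computes (3-2)-1.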

This leaves us with two ways of transforming tuples: modifying the index list and defining an alternative get function. foldr and foldl give us a way to fold a tuple into one value.

Tuples and functions.

Perhaps we have a function that takes a tuple, but the arguments we have are not in a tuple. Haskell defines a function, curry, that transforms a pair-taking function into a two-argument function. Haskell does not have the same expressiveness with variadic types, so it can't express this more general C++ version.

// curry : ( (a,b...) -> c ) x a x b... -> c
template< class F, class ...X >
constexpr auto curry( const F& f, X&& ...x )
    -> decltype( f( std::forward_as_tuple(std::forward<X>(x)...) ) )
{
    return f( std::forward_as_tuple(std::forward<X>(x)...) );
}

// Pair Sum.
// psum : (int,int) -> int
int psum( std::tuple<int,int> p ) {
    return head(p) + last(p);
}

auto five = curry( psum, 3, 2 );

Haskell also defines uncurry, which is the same as applyTuple. The most important distinction between Haskell's curry and uncurry and this is that Haskell's curry is a unary higher order function, whereas this is binary. However, the two are the same if one considers the unary version a partial application of the binary one.

Tuples can be used as a sort of partial application on their own. One might store some values in a tuple, and add more arguments later to be applied to some function. For a somewhat contrived example, consider the following:

constexpr struct PushBack {
    template< class ...X, class Y >
    constexpr auto operator () ( std::tuple<X...> t, Y y )
        -> std::tuple< X..., Y >
    {
        return std::tuple_cat( std::move(t), tuple(std::move(y)) );
    }
} pushBack{};

#include <cmath>

// Quadratic root.
constexpr struct QRoot {
    using result = std::tuple<float,float>;

    result operator () ( float a, float b, float c ) const {
        float root = std::sqrt( b*b - 4*a*c );
        float den  = 2 * a;
        return result( (-b+root)/den, (-b-root)/den );
    }
} qroot{};

std::ostream& operator << ( std::ostream& os, const QRoot::result& r ) {
    return os << std::get<0>(r) << " or " << std::get<1>(r);
}

int main() {
    auto ab = std::make_tuple( 1, 3 );
    auto qroot_ab = [&] ( float c ) {
        return applyTuple( qroot, pushBack(ab,c) );
    };
    std::cout << "qroot(1,3,-4) = " << qroot_ab(-4) << std::endl;
    std::cout << "qroot(1,3,-5) = " << qroot_ab(-5) << std::endl;

    auto bc = std::make_tuple( 3, -4 );
    auto qroot_bc = [&] ( float a ) {
        return applyTuple( qroot, pushFront(bc,a) );
    };
    std::cout << "qroot(1,3,-4) = " << qroot_bc(1) << std::endl;
    std::cout << "qroot(2,3,-4) = " << qroot_bc(2) << std::endl;
}

One persuasive use-case of tuples is being able to freely manipulate variadic parameters.

template< class ...X >
constexpr auto third_arg( X&& ...x )
    -> Elem< 2, std::tuple<X...> >
{
    return std::get<2>( std::forward_as_tuple(std::forward<X>(x)...) );
}

Variadic parameters can be forwarded into a tuple, meaning anything we can do with tuples, we can do with variadic parameters. Arguments can be compacted into tuples, modified, reordered, and re-expanded into functions. One annoyance with the ... is that it eats up everything to the right. The following will not do what one might hope: a parameter pack that is not the last parameter cannot be deduced, so a call like backwards(1,2,f) fails to compile.

template< class ...X, class F >
constexpr auto backwards( X&& ...x, const F& f )
    -> typename std::result_of< F(X...) >::type
{
    return f( std::forward<X>(x)... );
}

I leave the solution as an exercise for the reader.

Conclusions.

This article is introductory, at best. I must admit I have much less practice with tuples than with other C++11 features, but this seems to be true of GCC, too! Attempting to compile the following has caused an internal compiler error since at least 4.7.

#include <tuple>

template< class X > struct Hold {
 X x;
 constexpr Hold( X x ) : x(std::move(x)) { }
};

constexpr auto a = Hold<int>( 1 ); //ok
auto b = Hold<std::tuple<char>>( std::tuple<char>('c') ); // not
constexpr auto c = Hold<std::tuple<char>>( std::tuple<char>('c') ); // not

int main() {}

I believe it's quite possible that whole programs could be written with tuples and basic data types, whether or not it should be preferred. We could look at them as a general utility class, but I think it would be appropriate to see them as lightweight, generalized, composable, structs. I plan on writing a follow-up to cover applying multiple tuples, transforming tuples by functions, and a few other things. If anyone has any interesting use-cases or neat tricks for tuples, I'd love to hear about it in the comments.

The next article covers element-wise application, applying multiple tuples, and more: "Zipping and Mapping Tuples".

Source code: https://gist.github.com/4256092
Tuple reference: http://en.cppreference.com/w/cpp/utility/tuple

GCC 4.8 Has Automatic Return Type Deduction.

2012-12-09T08:55:00.000-08:00

Note: GCC 4.8 is still in development; this article is based on Ubuntu's snapshot package of 4.8. I do not know about availability on other platforms. I say "has" because it does work and code can be written using it right now, even if it's in testing.

Update: It turns out this feature has been implemented to test n3386. You can read the discussion and even see the patch on the mailing list: http://gcc.gnu.org/ml/gcc-patches/2012-03/msg01599.html

Are your two favourite C++11 features decltype and declval? I have mixed feelings. On one hand, they let me write code like this

template< class X, class Y >
constexpr auto add( X&& x, Y&& y )
    -> decltype( std::declval<X>() + std::declval<Y>() )
{
    return std::forward<X>(x) + std::forward<Y>(y);
}

and know that it will work on any type and do the optimal thing if x or y should be moved or copied (like if X=std::string). On the other hand, it's tedious. "forward" and "declval" are both seven letter words that have to be typed out every time for every function, per variable. Then there's the std:: prefix and <X>(x) suffix. The only benefit of using declval over forward is a savings of one letter not typed.

But someone must have realized there's a better way. If the function is only one statement, and the return type is the decltype of that statement, couldn't the compiler just figure out what I mean when I say this?

template< class X, class Y >
constexpr auto add( X&& x, Y&& y ) {
    return std::forward<X>(x) + std::forward<Y>(y);
}

March of this year, one Jason Merrill proposed just this (n3386) and GCC has already implemented it in 4.8 (change log), though it requires compiling with -std=c++1y. One can play with 4.8 on Ubuntu with gcc-snapshot. (Note that it doesn't modify your existing gcc install(s) and puts it in /usr/lib/gcc-snapshot/bin/g++. Also, I have been unable to install any but the third-to-most recent package.) I hope it is not too much trouble to install on other distros/platforms.

So if your favourite C++11 features are decltype and declval, prepare to never use them again. The compiler can deduce the type for you implicitly, and better, and it works even if the function is longer than one line. Take, for example, reasonably complex template functions like the liftM function I wrote for "Arrows and Kleisli":

constexpr struct LiftM {
    template< class F, class M, class R = Return<typename std::decay<M>::type> >
    constexpr auto operator () ( F&& f, M&& m )
        -> decltype( std::declval<M>() >>= compose(R(),std::declval<F>()) )
    {
        return std::forward<M>(m) >>= compose( R(), std::forward<F>(f) );
    }

    template< class F, class A, class B >
    constexpr auto operator () ( F&& f, A&& a, B&& b )
        -> decltype( std::declval<A>() >>= compose (
                rcloset( LiftM(), std::declval<B>() ),
                closet(closet,std::declval<F>())
            ) )
    {
        return std::forward<A>(a) >>= compose (
            rcloset( LiftM(), std::forward<B>(b) ),
            closet(closet,std::forward<F>(f))
        );
    }
} liftM{};

Could be written instead:

constexpr struct LiftM {
    template< class F, class M >
    constexpr auto operator () ( F&& f, M&& m ) {
        using R = Return< typename std::decay<M>::type >;
        return std::forward<M>(m) >>= compose( R(), std::forward<F>(f) );
    }

    template< class F, class A, class B >
    constexpr auto operator () ( F&& f, A&& a, B&& b ) {
        return std::forward<A>(a) >>= compose (
            rcloset( LiftM(), std::forward<B>(b) ),
            closet(closet,std::forward<F>(f))
        );
    }
} liftM{};

Automatic type deduction didn't exactly make this function more obvious or simple, but it did remove the visual cruft and duplication of the definition. Now, if I improve this function to make it more clear, I won't have a decltype expression to have to also edit.

To be fair, this doesn't entirely replace decltype. auto doesn't perfect forward. But it seems to work as expected, most of the time.
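To illustrate the caveat: a plain auto return type deduces the way a by-value template parameter does, so a returned reference becomes a copy. The sketch below contrasts it with decltype(auto), which arrived in C++14 and is not otherwise covered in this article; frontCopy and frontRef are hypothetical names, and both are only meant for lvalue containers.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

// Plain auto: the reference returned by front() is dropped, yielding a copy.
template< class V >
auto frontCopy( V&& v ) {
    return std::forward<V>(v).front();
}

// decltype(auto): keeps exactly the type of the returned expression.
template< class V >
decltype(auto) frontRef( V&& v ) {
    return std::forward<V>(v).front();
}

using Ints = std::vector<int>;

static_assert( std::is_same< decltype(frontCopy(std::declval<Ints&>())),
                             int >::value, "auto returns by value" );
static_assert( std::is_same< decltype(frontRef(std::declval<Ints&>())),
                             int& >::value, "decltype(auto) keeps the reference" );
```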

For another example of the use-case of auto return type deduction, consider this program:

#include <tuple>

int main() {
    auto x = std::get<0>( std::tuple<>() );
}

This small, simple, and obviously wrong program generates an error message 95 lines long. Why? GCC has to check to make sure this isn't valid for the std::pair and std::array versions of get, and when it checks the tuple version, it has to instantiate std::tuple_element recursively to find the type of the element. It actually checks for the pair version first, so one has to search the message for the obviously correct version and figure out why it failed. A simple off-by-one bug in your program could cause a massive and difficult to parse error message. We can improve this simply.

#include <tuple>

template< unsigned int i, class T >
auto get( T&& t ) {
    using Tup = typename std::decay<T>::type;
    static_assert( i < std::tuple_size<Tup>::value,
                   "get: Index too high!" );
    return std::get<i>( std::forward<T>(t) );
}

int main() {
    int x = get<0>( std::tuple<>() );
}

How much does this shrink the error by? Actually, it grew to 112 lines, but right at the top is

auto.cpp: In instantiation of 'auto get(T&&) [with unsigned int i = 0u; T = std::tuple<>]':
auto.cpp:13:36:   required from here
auto.cpp:7:5: error: static assertion failed: get: Index too high!

The error message might be a little bigger, but it tells you right off the bat what the problem is, meaning one has less to parse.

Similar to this static_assert example, typedefs done as default template arguments can be moved to the function body in many cases.

template< class X, class Y = A<X>, class Z = B<Y> >
Z f( X x ) {
    Z z;
    ...
    return z;
}

can now be written

template< class X > // Simpler signature.
auto f( X x ) {
    using Y = A<X>; // Type instantiation logic
    using Z = B<Y>; // done in-function.
    Z z;
    ...
    return z;
}

Looking forward.

This release of GCC also implements inheriting constructors, alignas, and attribute syntax. It also may have introduced a few bugs; for example, my library, which compiles with 4.7, does not with 4.8, producing many undefined references.

The other features of this release might not be quite so compelling, but automatic type deduction alone is one powerful enough to change the way coding is done in C++--again. Heavily templated code will become a breeze to write and maintain as figuring out the return type is often the hardest part. I find it encouraging that this has been implemented so quickly. Of course, it's not standard, at least not yet.

On a final note, I wonder how this will interact with the static if. It would be nice, were the following well-formed:

template< class X >
auto f( X x ) {
    if( std::is_same<int,X>::value )
        return "Hi"; // Int? Return string.
    else
        return x / 2; // Other? halve.
}
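For what it's worth, C++17, which postdates this post, added if constexpr, and with it a function of this shape is well-formed because the discarded branch takes no part in return type deduction. A minimal sketch, assuming a C++17 compiler; halveOrGreet is a made-up name.

```cpp
#include <type_traits>

template< class X >
auto halveOrGreet( X x ) {
    if constexpr ( std::is_same<int,X>::value )
        return "Hi";  // Int? Return a string literal.
    else
        return x / 2; // Other? Halve.
}
```

Here halveOrGreet(1) has return type const char*, while halveOrGreet(3.0) has return type double.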

std::move and lambda? It's just partial application!

2012-12-01T06:26:00.000-08:00

So, "Learn how to capture by move" was posted on isocpp.org and then this morning, "Another Alternative to lambda move capture" was on reddit. These are both impressive solutions and I look forward to seeing what other creative answers the C++ community will come up with. I also want to go over how this problem can be solved with partial application.

First, here's the definition.

template< class F, class X >
struct Part {
    F f;
    X x;

    template< class _F, class _X >
    constexpr Part( _F&& f, _X&& x )
        : f(std::forward<_F>(f)), x(std::forward<_X>(x))
    {
    }

    template< class ... Xs >
    auto operator () ( Xs&& ...xs ) 
        -> decltype( f(x,std::declval<Xs>()...) )
    {
        return f( x, std::forward<Xs>(xs)... );
    }
};

template< class F, class X >
constexpr Part<F,X> closet( F f, X x ) {
    return Part<F,X>( std::move(f), std::move(x) );
}

So, there's this thing called the upward funarg problem. Let's say we wrote a function like this:

std::function<std::vector<int>()> bad_originate() {
    std::vector<int> v = { 1, 2, 3, 4, 5 };
    return [&]{ return v; };
}
 
auto bad = bad_originate();
bad(); // Undefined behaviour: v no longer exists!

This fails because the reference to v becomes invalidated when it goes out of scope. We could copy v into the lambda, but our goal is a move.
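For reference, C++14 (not yet available when this was written) added init-captures, which express the move directly in the lambda. A minimal sketch under that assumption, with originateByMove as a hypothetical name:

```cpp
#include <functional>
#include <utility>
#include <vector>

std::function<std::vector<int>()> originateByMove() {
    std::vector<int> v = { 1, 2, 3, 4, 5 };
    // The closure owns its own vector, move-constructed from the local v.
    return [v = std::move(v)] { return v; };
}
```

The returned closure can be called safely after originateByMove has returned, since it no longer refers to the local variable.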

So can we fix this with partial application? Let's add a bit to the example by including a function to partially apply.

std::vector<int> add_all( std::vector<int>& v, int x ) {
    // Arbitrarily move v.
    auto w = std::move(v);

    std::transform( w.begin(), w.end(), w.begin(),
                    closet(std::plus<int>(),x) );

    return w;
}

using Origin = Part<decltype(add_all)*,std::vector<int>>;
Origin originate() {
    std::vector<int> v = { 1, 2, 3, 4, 5 };
    return closet( add_all, std::move(v) );
}

And it ain't hard at all.

Origin f = originate();
print_vec( "original : ", f.x );
print_vec( "added 10 : ", f(10) );
print_vec( "original : ", f.x );

will output

    original : = 1 2 3 4 5
    added 10 : = 11 12 13 14 15
    original : =

Maybe one would prefer that add_all take its argument by value; now, composition becomes useful.

template< class F, class G > struct Composition {
    F f = F();
    G g = G();

    Composition() { }
    Composition( F f, G g) 
        : f(std::move(f)), g(std::move(g)) { }

    template< class X, class ...Y >
    auto operator () ( X&& x, Y&& ...y ) 
        -> decltype( f(g(std::declval<X>()), std::declval<Y>()...) )
    {
        return f( g( std::forward<X>(x) ), std::forward<Y>(y)... );
    }
};

template< class F, class G > 
Composition<F,G> comp( F f, G g ) {
    return Composition<F,G>( std::move(f), std::move(g) );
}

constexpr struct Mover {
    template< class X >
    X&& operator () ( X& x ) const {
        return std::move(x);
    }
} mover{}; 

std::vector<int> add_all2( std::vector<int> w, int x ) {
    std::transform( w.begin(), w.end(), w.begin(),
                    closet(std::plus<int>(),x) );

    return w;
}
 
using MoveAddAll = decltype( comp(add_all2,mover) );
using Origin2 = Part< MoveAddAll, std::vector<int> >;
Origin2 originate2() {
    std::vector<int> v = { 1, 2, 3, 4, 5 };
    return Origin2( comp(add_all2,mover), std::move(v) );
}

and

Origin2 g = originate2();
print_vec( "original : ", g.x );
print_vec( "added 10 : ", g(10) );
print_vec( "original : ", g.x );

will output the same as above.

The code to this mini-article can be found here: https://gist.github.com/4182570

Arrows and Kleisli in C++

2012-11-30T06:34:00.002-08:00

Control.Arrow shows an odd-but-interesting part of Haskell. Arrows are functions; composable and callable. Arrows and Monads sometimes seem to be referred to as alternatives to each other. Perhaps now would be a good time to relate some category theory.

From A to B.

A category theorist might view functions as a transformation from one type to another. For example,

std::to_string : X -> std::string

would mean that to_string is a function that maps an X to a string.

In "Generic Function Objects", I talked about type constructors.

/* MakeT T : X -> T<X> */
template< template<class...> class T > struct MakeT {
    template< class ...X, class R = T< typename std::decay<X>::type... > >
    constexpr R operator () ( X&& ...x ) {
        return R( std::forward<X>(x)... );
    }
};

/* pair : X x Y -> std::pair<X,Y> */
constexpr auto pair = MakeT<std::pair>();

I use the notation X x Y to mean that MakeT<pair> takes two arguments. Though, in Haskell, it would actually look like this:

make_pair :: X -> Y -> (X,Y)

Haskell uses curried notation.

g : pair<X,A> -> pair<X,B>

Here, g is a function that transforms the second element of the pair to a B, but leaves the first as an X. Although this cannot be inferred from the definition, let's assume that the first value (X) is not transformed in any way. There is a function that represents this non-transformation.

/* id : X -> X */
constexpr struct Id {
    template< class X >
    constexpr X operator () ( X&& x ) {
        return std::forward<X>(x);
    }
} id{};

Arrows provide the tools to take a normal function, f : A -> B, and convert it to a function like g. This is sort of like composition

/* compose : (B -> C) x (A -> B) -> (A -> C) */
template< class F, class G >
struct Composition
{
    F f; G g;

    template< class _F, class _G >
    constexpr Composition( _F&& f, _G&& g ) 
        : f(std::forward<_F>(f)), g(std::forward<_G>(g)) { }

    template< class X >
    constexpr auto operator() ( X&& x) 
        -> decltype( f(g(std::declval<X>())) )
    {
        return f( g( std::forward<X>(x) ) );
    }
};

constexpr auto compose = MakeT<Composition>();

If we look at A -> B as a type itself, say AB, and B -> C as a type, BC, then it is clear that composition is BC x AB -> AC. Functions themselves are values with types and composition creates a new value and type. We can think of them as being on a geometric plane with the points A, B, and C. Functions are connections from point-to-point. Similarly, we can think of AB, BC, and AC as points on a different plane with arrows connecting AB and BC to AC (though AB and BC cannot be connected).

Functions as Arrows.

Arrows have several operations. The first is arr, which transforms a function to an arrow; but since functions are arrows, it isn't useful, right off the bat. first and second take functions, A -> B, and lift them to pair-oriented functions. fan takes two functions, one A -> B, another A -> C, and returns a function to pair<B,C>. split takes two functions as well, A -> B and X -> Y, and returns a function pair<A,X> -> pair<B,Y>.

 arr : (A -> B) -> (A -> B)
 first : (A -> B) -> (pair<A,X> -> pair<B,X>)
 second : (A -> B) -> (pair<X,A> -> pair<X,B>)
 split : (A -> B) x (X -> Y) -> (pair<A,X> -> pair<B,Y>)
 fan : (A -> B) x (A -> C) -> (A -> pair<B,C>)

First, we fill in the declarations.

template< class A, class F, class Arr = Arrow< Cat<A> > >
constexpr auto arr( F&& f ) ->  decltype( Arr::arr( std::declval<F>() ) )
{
    return Arr::arr( std::forward<F>(f) );
}

template< class A > struct Arr {
    template< class F >
    constexpr auto operator () ( F&& f ) -> decltype( arr<A>(std::declval<F>()) )
    {
        return arr<A>( std::forward<F>(f) );
    }
};

constexpr struct Split {
    template< class F, class G, class A = Arrow<Cat<F>> >
    constexpr auto operator () ( F&& f, G&& g )
        -> decltype( A::split(std::declval<F>(), std::declval<G>()) )
    {
        return A::split( std::forward<F>(f), std::forward<G>(g) );
    }
} split{};

constexpr struct Fan {
    template< class F, class G, class A = Arrow<Cat<F>> >
    constexpr auto operator () ( F&& f, G&& g )
        -> decltype( A::fan(std::declval<F>(),std::declval<G>()) )
    {
        return A::fan( std::forward<F>(f), std::forward<G>(g) );
    }
} fan{};

constexpr struct First {
    template< class F, class A = Arrow<Cat<F>> >
    constexpr auto operator () ( F&& f ) 
        -> decltype( A::first(std::declval<F>()) ) 
    {
        return A::first( std::forward<F>(f) );
    }
} first{};

constexpr struct Second {
    template< class F, class A = Arrow<Cat<F>> >
    constexpr auto operator () ( F&& f ) -> decltype( A::second(std::declval<F>()) ) {
        return A::second( std::forward<F>(f) );
    }
} second{};

arr will be trivial to implement, but the others are tricky. Luckily, we can define them mostly in terms of split, which looks like this:

/* pairCompose : (A -> B) x (X -> Y) -> (pair<A,X> -> pair<B,Y>) */
template< class F, class G > struct PairComposition {
    F f; G g;

    template< class _F, class _G >
    constexpr PairComposition( _F&& f, _G&& g )
        : f(std::forward<_F>(f)), g(std::forward<_G>(g))
    {
    }

    template< class P/*air*/ >
    constexpr auto operator() ( const P& p ) 
        -> decltype( std::make_pair( f(std::get<0>(p)), g(std::get<1>(p)) ) )
    {
        return std::make_pair( f(std::get<0>(p)), g(std::get<1>(p)) );
    }
};

constexpr auto pairCompose = MakeT<PairComposition>();

pairCompose returns a function that expects a pair and returns a pair, threading the first value through f and the second through g. We can compose PairCompositions.

namespace std {
std::string to_string( const std::string& s );
template< class X, class Y >
std::string to_string( const std::pair<X,Y>& p );

std::string to_string( const std::string& s ) {
    return "\"" + s + "\"";
}

template< class X, class Y >
std::string to_string( const std::pair<X,Y>& p ) {
    return "(" + to_string(p.first) + "," + to_string(p.second) + ")";
}
}

constexpr struct ToString {
    template< class X >
    std::string operator () ( const X& x ) const {
        return std::to_string(x);
    }
} string{};

int main() {
    using std::cin;
    using std::cout;
    using std::endl;

    auto plus2 = []( int x ){ return x+2; };

    std::pair<int,int> p( 1, 1 );

    cout << "(string . (+2), (+4))( 1, 1 ) = " 
         << compose( pairCompose(string,plus2), 
                     pairCompose(plus2, plus2) )(p) << endl;
}

This will output ("3",5). It's easiest to look at this as p.first and p.second being on two separate paths. I have written it such that the individual paths are vertical. The first path starts at 1 and goes to plus2(1), then to string(plus2(1)). The second path also starts at 1 and ends at plus2(plus2(1)). The odd part is that we're defining both paths at once.

Observe that first(f) = split(f,id). Proof?

first : (A -> B) -> (pair<A,X> -> pair<B,X>)
split : (A -> B) x (X -> Y) -> (pair<A,X> -> pair<B,Y>)
id : X -> X
f : A -> B
split(f,id) : pair<A,X> -> pair<B,X>

Since we know first(f) = split(f,id), we can intuit that second(f) = split(id,f) and also, split(f,g) = compose( first(f), second(g) ).
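
These identities can be checked on a concrete pair. The following is only a minimal sketch with hypothetical stand-ins (sketch::Split, sketch::Identity, Plus2, Times3) rather than the Arrow machinery from this article; it assumes plain function objects and std::pair.

```cpp
#include <cassert>
#include <utility>

namespace sketch {

// Stand-in for the article's id.
struct Identity {
    template< class X >
    X operator() ( X x ) const { return x; }
};

// Stand-in for PairComposition: thread p.first through f, p.second through g.
template< class F, class G > struct Split {
    F f; G g;

    template< class A, class X >
    auto operator() ( const std::pair<A,X>& p ) const
        -> decltype( std::make_pair( f(p.first), g(p.second) ) )
    {
        return std::make_pair( f(p.first), g(p.second) );
    }
};

template< class F, class G >
Split<F,G> split( F f, G g ) { return Split<F,G>{ std::move(f), std::move(g) }; }

// first(f) = split(f,id) and second(g) = split(id,g).
template< class F >
Split<F,Identity> first( F f ) { return split( std::move(f), Identity() ); }

template< class G >
Split<Identity,G> second( G g ) { return split( Identity(), std::move(g) ); }

struct Plus2  { int operator() ( int x ) const { return x + 2; } };
struct Times3 { int operator() ( int x ) const { return x * 3; } };

// Compare split(f,g) with first(f) and second(g) applied in either order.
bool splitMatchesFirstAndSecond() {
    const std::pair<int,int> p( 1, 10 );
    const std::pair<int,int> direct = split( Plus2(), Times3() )( p );
    const std::pair<int,int> a = second( Times3() )( first( Plus2() )( p ) );
    const std::pair<int,int> b = first( Plus2() )( second( Times3() )( p ) );
    assert( direct == std::make_pair( 3, 30 ) );
    return direct == a && direct == b;
}

} // namespace sketch
```

Because the two paths never touch, the order in which first and second are applied does not matter here.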

fan represents a fork in the road. One variable gets fed into two functions, the results of which get zipped into a pair. duplicate will do the splitting, but we'll rely on split to implement fan.

constexpr struct Duplicate {
    template< class X, class P = std::pair<X,X> > 
    constexpr P operator() ( const X& x ) {
        return P( x, x );
    }
} duplicate{};

With this, we can say that fan(f,g)(x) = split(f,g)( duplicate(x) ). Since we know what everything looks like, we can define Arrow<F>.

template< class Func > struct Arrow<Func> {
    template< class F >
    static constexpr F arr( F&& f ) { return std::forward<F>(f); }

    /* split(f,g)(x,y) = { f(x), g(y) } */
    template< class F, class G >
    static constexpr auto split( F f, G g ) 
        -> PairComposition<F,G>
    {
        return pairCompose( std::move(f), std::move(g) );
    }

    /*
     * first(f)(x,y)  = { f(x), y }
     * second(f)(x,y) = { x, f(y) }
     */
     
    template< class F >
    static constexpr auto first( F f ) 
        -> decltype( split(std::move(f),id) )
    {
        return split( std::move(f), id );
    }

    template< class F >
    static constexpr auto second( F f ) 
        -> decltype( split(id,std::move(f)) )
    {
        return split( id, std::move(f) );
    }

    /* fan(f,g)(x) = { f(x), g(x) } */
    template< class F, class G >
    static constexpr auto fan( F f, G g ) 
        -> decltype( compose( split(std::move(f),std::move(g)), duplicate ) )
    {
        return compose( split(std::move(f),std::move(g)), duplicate );
    }
};

Now, we can rewrite the above example:

int main() {
    auto plus2 = []( int x ){ return x+2; };
 
    cout << "(+2) &&& (+2) >>> string *** (+2) $ 1 = " 
         << compose( split(string, plus2), 
                     fan(  plus2,  plus2) )(1) << endl;
}

One way to think of Arrows is as paths. fan represents a fork in the road, split defines two separate, but parallel, paths at once, and first and second allow progress on one of the paths. For an arbitrary example, consider a branching path that hides some treasure. What sequence of moves is required to recover it?

/* 
 * Get 0 : (X,Y) -> X
 * Get 1 : (X,Y) -> Y
 */
template< size_t N > struct Get {
    template< class P >
    constexpr auto operator () ( P&& p )
        -> decltype( std::get<N>(std::declval<P>()) )
    {
        return std::get<N>( std::forward<P>(p) );
    }
};

int main() {
    constexpr auto fst = Get<0>();
    constexpr auto snd = Get<1>();
    constexpr auto oneHundred = []( int x ){ return 100; };
    
    // Hide the hundred.
    // hidden = pair( pair( 0, pair(100,0) ), 0 )
    auto hidden = fan( fan(id,fan(oneHundred,id)), id )( 0 );
    // Function to find it again.
    auto find = compose( fst, compose(snd,fst) );
    
    cout << "I found " << find(hidden) << "!" << endl;
}

Enter Kleisli.

This all works great for free functions, but some functions look like this:

f : X -> M<Y> -- Where M is some Monad

This f is a member of the Kleisli category. It accepts a normal value and returns a Monadic one. Examples of Kleisli functions (pseudocode):

Just : X -> unique_ptr<X>
Just = MakeT<unique_ptr>();

Seq : X -> std::vector<X>
Seq(x) = std::vector<X>{x}

mreturn<M> : X -> M<X>

unique_ptrs and vectors are monads, so any function that produces one from some x can be considered in the Kleisli category. Though, to the compiler, there's no logic to that, so we define a wrapper type around the function. This is a fairly common pattern, so I define a base class for it.

template< class F > struct Forwarder {
    using function = typename std::decay<F>::type;
    function f = F();

    template< class ...G >
    constexpr Forwarder( G&& ...g ) : f( std::forward<G>(g)... ) { }

    template< class ...X >
    constexpr auto operator() ( X&& ...x )
        -> decltype( f(std::declval<X>()...) )
    {
        return f( std::forward<X>(x)... );
    }

    constexpr operator function () { return f; }
};

The Kleisli itself can be implemented two ways. The Haskell way would be something like this:

/* Kleisli M A B : A -> M<B> */
template< template<class...> class M, class A, class B,
          class F = std::function<M<B>(A)> >
struct Kleisli : Forwarder<F> {
    template< class ...G >
    constexpr Kleisli( G&& ...g ) : Forwarder<F>(std::forward<G>(g)...) { }
};

This is faithful to how Haskell defines it.

newtype Kleisli m a b = Kleisli { runKleisli :: a -> m b }

But just think about this for a moment. A Kleisli is an Arrow, right? What would Arrow<Kleisli>::first return?

first : (A -> B) -> (pair<A,X> -> pair<B,X>)
Kleisli f = A -> M B
first(Kleisli f) : (A -> M B) -> (pair<A,X> -> M pair<B,X>)

What's the type of X? It's truly impossible to know because it depends on what gets passed in, which is what the notation above means.

Is it impossible to define Kleisli in this way? I don't know. I attempted to specialize its composition function for when A or B were pair types, but there are four combinations of whether A or B or both is a pair. I tried assigning X the type of std::placeholders::_1, but none of my attempts to make it really work compiled. (It was horrible.)

But we don't have any of that trouble if we define Kleisli differently.

Kleisli<F>.

/* Kleisli M F : F -- where F : A -> M<B> */ 
template< template<class...> class M, class F = Id >
struct Kleisli : Forwarder<F> {
    template< class ...G >
    constexpr Kleisli( G&& ...g ) : Forwarder<F>(std::forward<G>(g)...) { }
};

An implicit requirement of arrows is that they can be composed, but Kleislies?

composeKleisli : Kleisli(B -> M<C>) x Kleisli(A -> M<B>) -> (...?)

We can reasonably assume that it should be Kleisli(A -> M<C>), but our naive definition of composition must be specialized. Literally.

/* Composition : Kleisli(B -> M<C>) x Kleisli(A -> M<B>) -> (A -> M<C>) */
template< template<class...> class M, class F, class G >
struct Composition<Kleisli<M,F>,Kleisli<M,G>>
{
    Kleisli<M,F> f; 
    Kleisli<M,G> g;

    template< class _F, class _G >
    constexpr Composition( _F&& f, _G&& g ) 
        : f(std::forward<_F>(f)), g(std::forward<_G>(g)) { }

    template< class X >
    constexpr auto operator() ( X&& x ) 
     -> decltype( g(std::forward<X>(x)) >>= f )
    {
        return g(std::forward<X>(x)) >>= f;
    }
};

/* kcompose : Kleisli(B -> M<C>) x Kleisli(A -> M<B>) -> Kleisli(A -> M<C>) */
constexpr struct KCompose {
    template< template<class...> class M, class F, class G >
    constexpr auto operator () ( Kleisli<M,F> f, Kleisli<M,G> g )
        -> Kleisli< M, Composition<Kleisli<M,F>,Kleisli<M,G>> >
    {
        return kleisli<M> ( 
            compose( std::move(f), std::move(g) )
        );
    }
} kcompose{};
 
int main() {
    auto stars = kleisli<std::basic_string> (
        [] (char c) -> std::string { 
            return c == '*' ? "++" :
                   c == '+' ? "* *" : std::string{c};
        }
    );
    
    auto starsSqr = kcompose( stars, stars );
    
    auto starsCube = kcompose( starsSqr, stars );
    
    cout << "stars of '*' : " << stars('*') << endl;
    cout << "stars^2 of '*' : " << starsSqr('*') << endl;
    cout << "stars^3 of '*' : " << starsCube('*') << endl;
}

This outputs

 stars of '*' : "++"
 stars^2 of '*' : "* ** *"
 stars^3 of '*' : "++ ++++ ++"
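
For readers without the Monad and Kleisli types from the earlier posts at hand, the same composition can be sketched on its own. This is a minimal sketch under assumptions: vbind, KComposed and kcomposeSketch are hypothetical names, and std::vector<char> stands in for std::string as the monad.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Bind for std::vector: apply f to every element and concatenate the results.
template< class X, class F >
auto vbind( const std::vector<X>& xs, F f ) -> decltype( f(xs.front()) ) {
    decltype( f(xs.front()) ) out;
    for( const X& x : xs ) {
        auto ys = f( x );
        out.insert( out.end(), ys.begin(), ys.end() );
    }
    return out;
}

// Kleisli composition: run g, then feed each of its results to f.
template< class F, class G > struct KComposed {
    F f; G g;

    template< class X >
    auto operator() ( const X& x ) const -> decltype( vbind( g(x), f ) ) {
        return vbind( g(x), f );
    }
};

template< class F, class G >
KComposed<F,G> kcomposeSketch( F f, G g ) {
    return KComposed<F,G>{ std::move(f), std::move(g) };
}

// The stars rule from above, written over vector<char>.
struct Stars {
    std::vector<char> operator() ( char c ) const {
        if( c == '*' ) return { '+', '+' };
        if( c == '+' ) return { '*', ' ', '*' };
        return { c };
    }
};

// stars composed with itself, rendered as a string for easy comparison.
std::string starsSquared( char c ) {
    std::vector<char> v = kcomposeSketch( Stars(), Stars() )( c );
    return std::string( v.begin(), v.end() );
}

} // namespace sketch
```

starsSquared('*') should reproduce the second line of the output above, and nesting kcomposeSketch once more gives the third.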

Like before, it would be most convenient to define Arrow<Kleisli> in terms of split. split(f,g), given the pair {x,y}, will have to pass x into f and y into g, both of which will return Monadic values. Finally, a pair will have to be constructed from the values extracted from f(x) and g(y).

 split: Kleisli (A -> M B) x Kleisli (X -> M Y) -> Kleisli (pair<A,X> -> M pair<B,Y>)

To extract the values from f(x) and g(y), we need to call mbind on each, which in Haskell might look like this:

 f(x) >>= (\x' -> g(y) >>= (\y' -> return (x',y')))

Or, Control.Monad defines liftM2.

 liftM2 (,) f(x) g(y) -- where (,) is equivalent to make_pair

template< class M > struct Return {
    template< class X >
    constexpr auto operator () ( X&& x ) 
        -> decltype( mreturn<M>(std::declval<X>()) )
    {
        return mreturn<M>( std::forward<X>(x) );
    }
}; 
 
/*
 * liftM : (A -> B) x M<A> -> M<B>
 * liftM : (A x B -> C) x M<A> x M<B> -> M<C>
 */
constexpr struct LiftM {
    template< class F, class M, class R = Return<typename std::decay<M>::type> >
    constexpr auto operator () ( F&& f, M&& m )
        -> decltype( std::declval<M>() >>= compose(R(),std::declval<F>()) )
    {
        return std::forward<M>(m) >>= compose( R(), std::forward<F>(f) );
    }

    template< class F, class A, class B >
    constexpr auto operator () ( F&& f, A&& a, B&& b )
        -> decltype( std::declval<A>() >>= compose (
                rcloset( LiftM(), std::declval<B>() ),
                closet(closet,std::declval<F>())
            ) )
    {
        return std::forward<A>(a) >>= compose (
            rcloset( LiftM(), std::forward<B>(b) ),
            closet(closet,std::forward<F>(f))
        );
    }
} liftM{};

It acts basically as an n-ary mbind, though one could also define an n-ary mbind! liftM works even if you don't.
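
To see what the two-argument form computes without closet and rcloset, here is a minimal sketch specialized to std::vector. The names sketch::liftM2 and sketch::MakePair are hypothetical; the loops visit combinations in the same order the nested binds would.

```cpp
#include <cassert>
#include <utility>
#include <vector>

namespace sketch {

// liftM2 for std::vector: apply f to every combination of an element of as
// with an element of bs, outer loop over as, inner loop over bs.
template< class F, class A, class B >
auto liftM2( F f, const std::vector<A>& as, const std::vector<B>& bs )
    -> std::vector< decltype( f(as.front(), bs.front()) ) >
{
    std::vector< decltype( f(as.front(), bs.front()) ) > out;
    for( const A& a : as )
        for( const B& b : bs )
            out.push_back( f(a, b) );
    return out;
}

// Plays the role of Haskell's (,), i.e. make_pair.
struct MakePair {
    template< class A, class B >
    std::pair<A,B> operator() ( const A& a, const B& b ) const {
        return std::pair<A,B>( a, b );
    }
};

} // namespace sketch
```

With two elements on the left and three on the right, the result holds six pairs, all pairs for the first left element coming before those for the second.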

Finally, we have all the pieces in place to implement KleisliSplit.

/* kleisliSplit : Kleisli(A -> M<B>) x Kleisli(X -> M<Y>) -> (pair<A,X> -> M<pair<B,Y>>) */ 
template< template<class...> class M, class F, class G >
struct KleisliSplit {
    F f;
    G g;

    constexpr KleisliSplit( F f, G g ) : f(std::move(f)), g(std::move(g)) { }

    template< class X, class Y >
    constexpr auto operator () ( const std::pair<X,Y>& p )
        -> decltype( liftM(pair,f(std::get<0>(p)),g(std::get<1>(p))) )
    {
        return liftM (
            pair, 
            f( std::get<0>(p) ), 
            g( std::get<1>(p) )
        );
    }
};

The final note before moving on: arr. Kleisli's arr is like a Monad's mreturn.

arr : (A -> B) -> Kleisli(A -> M<B>)
arr(f)(x) = mreturn<M>( f(x) )

or:

arr = liftM

template< template<class...> class M, class F >
struct Arrow< Kleisli<M,F> > {

    template< class G >
    using K = Kleisli<M,G>;

    template< class G >
    static constexpr auto arr( G g ) -> Kleisli< M, Part<LiftM,G> > {
        return kleisli<M>( closet(liftM,std::move(g)) );
    }

    template< class G >
    static constexpr auto first( G g ) 
        -> K<decltype( ::split(std::move(g),arr(id)) )> 
    {
     // id is not a Kleisli. 
     // The call to arr refers to the arr above, not ::arr.
     // arr(id) : Kleisli(X -> M<X>)
        return ::split( std::move(g), arr(id) );
    }

    template< class G >
    static constexpr auto second( G g) 
        -> K<decltype( ::split(arr(id),std::move(g)) )>
    {
        return ::split( arr(id), std::move(g) );
    }

    template< class G >
    static constexpr auto split( Kleisli<M,F> f, Kleisli<M,G> g )
        -> K< KleisliSplit<M,F,G> >
    {
        return KleisliSplit<M,F,G>( std::move(f.f), std::move(g.f) );
    }

    template< class _F, class _G >
    static constexpr auto fan( _F&& f, _G&& g )
        -> decltype( kcompose(split(std::declval<_F>(),std::declval<_G>()),arr(duplicate)) )
    {
        return kcompose( split(std::forward<_F>(f),std::forward<_G>(g)), arr(duplicate) );
    }
};
 
int main() {
    auto stars = kleisli<std::vector> (
        [] (char c) { 
            return c == '*' ? std::vector<char>{'+','+'} :
                   c == '+' ? std::vector<char>{'*',' ','*'} : std::vector<char>{c};
        }
    );
    
    auto hairs = kleisli<std::vector> (
        [] (char c) -> std::vector<char> { 
            return c == '*' ? std::vector<char>{'\'','"','\''} :
                   c == '+' ? std::vector<char>{'"',' ','"'} :
                   c == '"' ? std::vector<char>{'\''} :
                   c == '\'' ? std::vector<char>{} :
                   std::vector<char>{c};
        }
    );

    cout << "hairs of '*' : " << hairs('*') << endl;
    cout << "hairs^2 of '*' : " << compose(hairs,hairs)('*') << endl;
    
    cout << "split(stars,hairs) (*,*) = " << split(stars,hairs)(pair('*','*')) << endl;
    cout << "fan(stars,hairs)    (*)  = " << fan(stars,hairs)('*') << endl;
    cout << "fan(hairs,stars)    (*)  = " << fan(hairs,stars)('*') << endl;
    cout << "split(hairs,stars) . fan(stars,hairs) = " 
      << compose( split(hairs,stars), fan(stars,hairs) )('*') << endl;
}

Finally, this outputs

    hairs of '*' : [',",']
    hairs^2 of '*' : [']
    split(stars,hairs) (*,*) = [(+,'),(+,"),(+,'),(+,'),(+,"),(+,')]
    fan(stars,hairs)    (*) = [(+,'),(+,"),(+,'),(+,'),(+,"),(+,')]
    fan(hairs,stars)    (*) = [(',+),(',+),(",+),(",+),(',+),(',+)]
    split(hairs,stars) . fan(stars,hairs) = [(",'),( ,'),(",'),(","),( ,"),(","),(",'),( ,'),(",'),(",'),( ,'),(",'),(","),( ,"),(","),(",'),( ,'),(",')]

Extras -- Convenience and Pitfalls.

Perhaps the most important parts of Control.Arrow are here, but there are still a few things that can be added. For example, compose is backwards! compose(f,g) means "do g, then do f." Because of this we have to read it backwards. fcomp is useful, if only because we read left-to-right, not the other way around.

/* fcomp : (A -> B) x (B -> C) -> (A -> C) */
constexpr struct FComp {
    template< class G, class F, class C = Composition<F,G> >
    constexpr C operator () ( G g, F f ) {
        return C(std::move(f),std::move(g));
    }
} fcomp{};
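
The difference in reading order is easy to see side by side. This is a small sketch, assuming a simplified one-argument Composed type and hypothetical Plus2 and Show function objects rather than the Composition type from the earlier post.

```cpp
#include <cassert>
#include <string>
#include <utility>

namespace sketch {

// Composed(f,g)(x) = f(g(x)): g runs first.
template< class F, class G > struct Composed {
    F f; G g;

    template< class X >
    auto operator() ( X&& x ) const -> decltype( f( g(std::forward<X>(x)) ) ) {
        return f( g(std::forward<X>(x)) );
    }
};

// compose lists the functions in the reverse of the order they run.
template< class F, class G >
Composed<F,G> compose( F f, G g ) { return Composed<F,G>{ std::move(f), std::move(g) }; }

// fcomp lists the functions in the order they run.
template< class G, class F >
Composed<F,G> fcomp( G g, F f ) { return Composed<F,G>{ std::move(f), std::move(g) }; }

struct Plus2 { int operator() ( int x ) const { return x + 2; } };
struct Show  { std::string operator() ( int x ) const { return std::to_string(x); } };

} // namespace sketch
```

Both compose(Show(), Plus2()) and fcomp(Plus2(), Show()) add two and then convert to a string; only the second reads in that order.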

Other tools are prefcomp and postfcomp. Perhaps one wants to compose something that is not a Kleisli with something that is.

/* prefcomp : (A -> B) x (Arrow B C) -> Arrow A C */ 
constexpr struct PreFComp {
    template< class F, class A >
    constexpr auto operator () ( F&& f, A&& a )
        -> decltype( arr<A>(declval<F>()) > declval<A>() )
    {
        return arr<A>(forward<F>(f)) > forward<A>(a);
    }
} prefcomp{};

/* postfcomp : Arrow A B x (B -> C) -> Arrow A C */ 
constexpr struct PostFComp {
    template< class A, class F >
    constexpr auto operator () ( A&& a, F&& f )
        -> decltype( declval<A>() > arr<A>(declval<F>()) )
    {
        return forward<A>(a) > arr<A>(forward<F>(f));
    }
} postfcomp{};

Beyond that, there's ArrowZero and ArrowPlus to work with Monoids, ArrowChoice based on Either, ArrowApply and ArrowLoop.

ArrowZero, defining zeroArrow, shows the downside to implementing Kleisli as K<M,F> instead of K<M,A,B>. It is defined like so:

zeroArrow = Kleisli (\_ -> mzero)

The type mzero needs to return is M<B>. Its argument, the _, will be of type A. So we have this problem where zeroArrow(k) (for some Kleisli, k) doesn't know what type to return and can't deduce it!

If we had implemented Kleisli with A and B, this would be no problem, but then it would have been impossible(?) to implement Arrow<Kleisli>::first. In Haskell, all functions carry around information about their domain and codomain, so there is no difference between passing around A and B or just F.

The easiest solution is to diverge from Haskell's definition and require zeroArrow to take an explicit result-type parameter.
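
One possible shape for that, as a sketch only: the caller names the full monadic result type, and the arrow ignores its input. ZeroArrow and zeroArrow<R> are hypothetical here, and the sketch assumes that a default-constructed R (an empty vector, say) is the monad's mzero.

```cpp
#include <cassert>
#include <vector>

namespace sketch {

// A Kleisli-style function that ignores its argument and returns the
// empty (mzero) value of R. R is spelled out by the caller because it
// cannot be deduced from the argument.
template< class R > struct ZeroArrow {
    template< class X >
    R operator() ( const X& ) const { return R(); }
};

// Hypothetical helper: zeroArrow< std::vector<int> >() fixes M<B> explicitly.
template< class R >
ZeroArrow<R> zeroArrow() { return ZeroArrow<R>(); }

} // namespace sketch
```

The argument type A stays generic, which matches the K<M,F> style of Kleisli used above; only the result type has to be given by hand.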

Conclusions.

Arrows let us do interesting types of composition that one would not normally think of. They fall somewhere between Functors and Monads (this article has a graph). My Monadic parser could have also been written with Arrows.

I realize this has been a very long post. Hopefully all the code examples work, there are no inaccuracies, and it is understandable, but with an article this large, something has probably been fudged. Due to the large amount of material, it may be a good idea to split this article up and go more in depth. Please speak up if something seems off.

Source code: https://gist.github.com/4176121
Haskell reference: Control.Arrow

The Importance of Function Objects

2012-11-26T17:12:00.001-08:00

Function objects allow for freedoms that free functions do not. They are more compatible with higher order functions, composition, and generic programming. They are easier to use; extensible and convenient. They are often even more efficient.

Higher Order Functions.

Write a function that adds two arguments of any given type.

template< class X, class Y >
constexpr auto add( X&& x, Y&& y ) 
    -> decltype( std::declval<X>() + std::declval<Y>() )
{
    return std::forward<X>(x) + std::forward<Y>(y);
}

Easy, right? Now pass that in to std::accumulate.

int main() {
    std::vector<int> v = { 0, 1, 2, 3, 4, 5 };
    int sum = std::accumulate( v.begin(), v.end(), 0, add );
    std::cout << "sum = " << sum << std::endl;
}

And compile!

fo.cpp:20:68: error: no matching function for call to 'accumulate(std::vector<int>::iterator, std::vector<int>::iterator, int, <unresolved overloaded function type>)'

Huh, GCC can't deduce the type of add. Really, the lexical name, add, refers to a whole set of possible template overloads. It could pick one from the set if we supplied some arguments, but not by the name alone. We can fix this by specifying the type arguments.

int sum = std::accumulate( v.begin(), v.end(), 0, add<int&,int&> );

Why do we specify int& instead of int? If we specify int, then because of the quirks of perfect forwarding, the signature is

int add( int&&, int&& )

Since its inputs will be lvalues, this will fail to compile. With int&, reference collapsing instead produces this:

int add( int& &&, int& && )

and the arguments will be forwarded to the addition as int&'s.
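
The two instantiations can be pinned down by assigning them to explicitly typed function pointers. This is a minimal sketch, assuming a non-constexpr copy of add placed in a hypothetical sketch namespace; addLvalues and addRvalues are illustrative names.

```cpp
#include <cassert>
#include <utility>

namespace sketch {

template< class X, class Y >
auto add( X&& x, Y&& y )
    -> decltype( std::declval<X>() + std::declval<Y>() )
{
    return std::forward<X>(x) + std::forward<Y>(y);
}

// With X = Y = int&, the parameters collapse from int& && to int&,
// so the instantiation accepts lvalues.
int addLvalues( int a, int b ) {
    int (*f)( int&, int& ) = add<int&,int&>;
    return f( a, b );
}

// With X = Y = int, the parameters stay int&&, so only rvalues bind.
int addRvalues() {
    int (*f)( int&&, int&& ) = add<int,int>;
    return f( 1, 2 );   // f( a, b ) with lvalues a and b would not compile
}

} // namespace sketch
```

One caveat worth knowing: since C++20, std::accumulate passes the accumulator as an rvalue, so an int& first parameter no longer binds there, which is one more reason to prefer the function object shown later.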

At this point, falling back on std::plus<int>() would make the most sense, but what if we don't want to have to specify the type? The proposal n3421 expresses this concern, and would allow for the syntax, std::plus<>(), which would be a function object version of add. Finally, that line would compile, but n3421 isn't standard yet. Why not do this?

struct Add {
    template< class X, class Y >
    constexpr auto operator () ( X&& x, Y&& y ) 
        -> decltype( std::declval<X>() + std::declval<Y>() )
    {
        return std::forward<X>(x) + std::forward<Y>(y);
    }
};

constexpr auto add = Add();
 
...
 
int sum = std::accumulate( v.begin(), v.end(), 0, add );

Here, add would be essentially no different from an instance of std::plus<>, but because it is already instantiated, we can avoid much of the syntactic cruft.
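
The n3421 proposal was later adopted into C++14, so with a C++14 or newer compiler the standard spelling is available. A minimal sketch, assuming C++14; sumInts and sumDoubles are hypothetical helper names.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

namespace sketch {

// std::plus<> (that is, std::plus<void>) deduces its operand types on
// each call, so the element type is never spelled out.
int sumInts( const std::vector<int>& v ) {
    return std::accumulate( v.begin(), v.end(), 0, std::plus<>() );
}

double sumDoubles( const std::vector<double>& v ) {
    return std::accumulate( v.begin(), v.end(), 0.0, std::plus<>() );
}

} // namespace sketch
```

The same std::plus<>() expression serves both element types, which is exactly what the hand-written Add object does.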

Programming Compositionally.

Let's say we have a two-dimensional array

std::vector<int> v = { 1, 2, 3, 4, 5 };
std::vector<std::vector<int>> vv = { v, v, v };

and we want to find the sum of all its elements. We might first write a small function to find the sum of a one-dimensional array.

template< class S >
int sum( const S& s ) {
    return std::accumulate( s.begin(), s.end(), 0, add );
}

With this, we could write a simple for loop to iterate over each element, adding the sums of each vector, but that's the whole point of accumulate! We could use std::transform to build a list of sums and accumulate that, but what's the point? The easiest thing to do is use function composition.

Previously, I have talked about function composition. I defined a type, Composition<F,G>, which passed its first argument to G and the result of that along with any following arguments to F.

std::accumulate expects a function where the first value is the accumulator (the sum) and the second value comes from the sequence (in this case: another std::vector). We want the sum of the vector, and then to add the results together. One could fairly trivially write a function that does this explicitly, but that will have to be done for every similar use-case. Patterns of composition can be abstracted away.

template< class F, class G > struct AccumRight {
    F f = F();
    G g = G();

    constexpr AccumRight() { }


    constexpr AccumRight( F f, G g) 
        : f(f), g(g) { }

    template< class X, class Y >
    constexpr auto operator () ( X&& x, Y&& y ) 
        -> decltype( f(std::declval<X>(), g(std::declval<Y>()) ) )
    {
        return f( std::forward<X>(x), g(std::forward<Y>(y)) );
    }
};

template< class F, class G, class A = AccumRight<F,G> >
constexpr A rAccumulation( F f, G g ) {
    return A( f, g );
}

Finally, we can write that function. How does it look if we use free functions?

sum = std::accumulate( vv.begin(), vv.end(), 0, 
                       rAccumulation(add<int&,int>,sum<std::vector<int>>) );

Notice how, since the right-hand argument of add will be an rvalue (the result of sum), we can just tell it int. Now, how about function objects?

sum = std::accumulate( vv.begin(), vv.end(), 0, 
                       rAccumulation(add,sum) );

Much nicer, no? Imagine more complex examples involving passing higher-order functions to higher-order functions. Without using such techniques, higher order functional programming in C++ would be daunting. Manually deducing types is difficult and error prone--sometimes impossible due to implementation details, but the compiler can do it instead, if given the chance. Because of this, techniques in C++ that would otherwise be thought of as impossible become easy.
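
For the function-object version to work, sum itself has to be an object rather than a function template. Here is a self-contained sketch of the whole pipeline under that assumption; Sum, AccumRightSketch and sum2d are hypothetical names, and Add mirrors the struct defined earlier.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

namespace sketch {

struct Add {
    template< class X, class Y >
    auto operator() ( X&& x, Y&& y ) const
        -> decltype( std::declval<X>() + std::declval<Y>() )
    {
        return std::forward<X>(x) + std::forward<Y>(y);
    }
};

// Function-object counterpart of the sum function template.
struct Sum {
    template< class S >
    int operator() ( const S& s ) const {
        return std::accumulate( s.begin(), s.end(), 0, Add() );
    }
};

// Same idea as AccumRight: call f( accumulator, g(element) ).
template< class F, class G > struct AccumRightSketch {
    F f; G g;

    template< class X, class Y >
    auto operator() ( X&& x, Y&& y ) const
        -> decltype( f( std::declval<X>(), g(std::declval<Y>()) ) )
    {
        return f( std::forward<X>(x), g(std::forward<Y>(y)) );
    }
};

// Sum every element of a vector of vectors in one accumulate call.
int sum2d( const std::vector< std::vector<int> >& vv ) {
    return std::accumulate( vv.begin(), vv.end(), 0,
                            AccumRightSketch<Add,Sum>{ Add(), Sum() } );
}

} // namespace sketch
```

No template arguments appear at the call site; the compiler works out every intermediate type from the objects themselves.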

Finer Control.

std::get<N> is a nice function for working with pairs and tuples, but doesn't work on normal containers. Let's define a function object for that!

struct Selector {
    int i;
    
    Selector( int j=0 ) : i(j) { }
 
    template< class S >
    const typename S::value_type& operator () ( const S& s ) const {
        return *std::next( s.begin(), i );
    }
};
 
Selector first(0), third(2); 

int main() {
    std::vector<int> v = { 1, 2, 3, 4, 5 };
    std::cout << "first = " << first(v) << std::endl;
    std::cout << "third = " << third(v) << std::endl;
    std::cout << "fourth = " << Selector(third(v))(v) << std::endl;
}

This will output

first = 1
third = 3
fourth = 4

We typically think of functions as static, unchanging things, but Selector is dynamic. It could be used, for example, as a function returning the current player in a game by keeping its member, i, always up-to-date.
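
The current-player idea might look something like the following. This is only a sketch: CurrentPlayer and endTurn are hypothetical names, and it assumes turns simply rotate through the container.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

namespace sketch {

// A Selector-like object whose index changes over time, so the same
// call can return a different element on a later turn.
struct CurrentPlayer {
    std::size_t i;

    CurrentPlayer() : i(0) { }

    // Move on to the next player, wrapping around after the last one.
    void endTurn( std::size_t playerCount ) { i = (i + 1) % playerCount; }

    template< class S >
    const typename S::value_type& operator () ( const S& s ) const {
        typedef typename S::difference_type Diff;
        return *std::next( s.begin(), static_cast<Diff>(i) );
    }
};

} // namespace sketch
```

Code that only needs "whoever's turn it is" can hold a CurrentPlayer and call it, without knowing how turns advance.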

For another example, consider a function object, Printer, that prints its arguments to some file. We might write that like this:

namespace std {

template< class X >
string to_string( const std::vector<X>& v ) {
    string str = "[ ";
    for( const X& x : v ) str += to_string(x) + " ";
    str += "]";
    return str;
}

// Because the standard doesn't define this.
string to_string( const std::string& s ) {
    return s;
}

template< class T >
string to_string( const T& s ) {
    ostringstream ss;
    ss << s;
    return ss.str();
}

} // namespace std

struct Printer {
    std::string destination;
    
    Printer( std::string filename ) 
        : destination( std::move(filename) )
     {
     }
     
     void redirect( std::string filename ) {
         destination = std::move( filename );
     }
     
     template< class X >
     void operator () ( const X& x ) {
         std::ofstream s( destination, std::ios_base::app );
         s << std::to_string(x);
     }
};

int main() {
    std::vector<int> v = { 1, 2, 3, 4, 5 };
    std::vector<std::vector<int>> vv = { v, v, v };
    
    Printer prnt( "file1" );
    prnt( v );
    prnt.redirect( "file2" );
    prnt( vv );
}

Here, the dynamics come from being able to change the destination of our output at runtime. It outputs:

file1: [ 1 2 3 4 5 ]
file2: [ [ 1 2 3 4 5 ] [ 1 2 3 4 5 ] [ 1 2 3 4 5 ] ]

Extensibility.

I have talked before about generalizing associativity, transitivity, and other such things in function objects. Inheritance adds a new dimension to functions. Derived classes can benefit from functionality offered in base classes, and base classes can benefit from their derived classes via the curiously recurring template pattern.
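
As a rough illustration of the curiously recurring template pattern with function objects, here is a sketch in which a base class gains a fold from whatever binary call its derived class defines. Foldable, Add and Max are hypothetical names and are not the classes from the earlier posts.

```cpp
#include <cassert>
#include <vector>

namespace sketch {

// CRTP base: any derived binary function object gets a fold over a
// sequence for free. The base calls back into the derived operator().
template< class D > struct Foldable {
    template< class S, class X >
    X fold( const S& s, X init ) const {
        const D& self = static_cast<const D&>( *this );
        for( const auto& x : s ) init = self( init, x );
        return init;
    }
};

struct Add : Foldable<Add> {
    template< class X, class Y >
    auto operator() ( const X& x, const Y& y ) const -> decltype( x + y ) {
        return x + y;
    }
};

struct Max : Foldable<Max> {
    int operator() ( int x, int y ) const { return x > y ? x : y; }
};

} // namespace sketch
```

Add and Max each supply only the two-argument case; fold lives once in the base and still dispatches statically.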

For a different example, let's have the Printer class time-stamp its output.

#include <ctime>
struct StampedPrinter : Printer {
    StampedPrinter( std::string filename )
        : Printer( std::move(filename) )
    {
    }
    
    template< class X >
    void operator () ( const X& x ) {
        std::time_t t = std::time( nullptr );
        std::tm tm = *std::localtime( &t );
        
        std::string time = std::to_string(tm.tm_min) + ":" + 
                           std::to_string(tm.tm_sec) + " -- ";
                           
        Printer::operator()( time + std::to_string(x) + "\n" );
    }
};

By simply changing the type of prnt in the previous example to StampedPrinter, it will output:

    file1:
    6:46 -- [ 1 2 3 4 5 ]

    file2:
    6:46 -- [ [ 1 2 3 4 5 ] [ 1 2 3 4 5 ] [ 1 2 3 4 5 ] ]

And if run again,

    file1:
    6:46 -- [ 1 2 3 4 5 ]
    13:13 -- [ 1 2 3 4 5 ]

    file2:
    6:46 -- [ [ 1 2 3 4 5 ] [ 1 2 3 4 5 ] [ 1 2 3 4 5 ] ]
    13:13 -- [ [ 1 2 3 4 5 ] [ 1 2 3 4 5 ] [ 1 2 3 4 5 ] ]

In conclusion,

Function objects can do anything that free functions can do, but they cooperate better. They are more cohesive and orthogonal. I hope that these simple demonstrations have been thought provoking.

The code for this post: https://gist.github.com/4151880
See also: An interesting article that talks about a trampoline function that runs a recursive function object for tail-call optimization: http://fendrich.se/blog/2012/11/22/compile-time-loops-in-c-plus-plus-11-with-trampolines-and-exponential-recursion/

Some Observations on Human Behavior

2012-11-21T01:17:00.000-08:00

Are not the ones who praise a drug-free space also the most tempted to bring drugs to it? Are the ones who cry in empathy for the poor not actually rich? Is it not the case that those who despise intolerance are the most intolerant?

Don't those who scream "Revolution!" fear change? Are the people who demonize sex not the ones who crave it deeply, and they that oppose pornography not the same as those who suffer from addiction?

Are the polluters not the biggest deniers of global warming? Are the people opposing gay rights not in the closet? Are those who commit voter fraud not condemning it? Are those who speak of freedom not dreaming of fascism? Don't those with the most to give also complain the most about "hand-outs"?

Are the knowledgeable not the quietest and the ignorant not the loudest or do those who speak know, and those who don't not? Do those who boast not actually suck?

Are not those who cast suspicion the least trustworthy? Are those who push blame not riddled with guilt? Does a thief not secure her own goods?

Does laughter imply absence of sorrow?

Is hypocrisy human nature?

Monadic Parsing in C++

2012-11-19T09:58:00.001-08:00

When researching parsing in Haskell, I always find this pdf: eprints.nottingham.ac.uk/223/1/pearl.pdf

I will be referring back to this paper and encourage my readers to at least skim it as well. It may provide more understanding of how it works.

It describes a simple, yet hard to comprehend functional parser. I have attempted to translate this to C++, but first I wrote a more intuitive, less functional one. Just a simple calculator, somewhat close to what's described in the pdf, and it worked like this:

The first stage took the input string and created a list of token type-lexeme pairs. Given the input "1+2*2", it would spit back {(INT,"1"),(OP,"+"),...}. The next stage used two mutually recursive functions to represent two parse states: that of sums and subtractions and that of numbers, multiplications and divisions. Since addition has the lowest precedence in this mini-language, it's safe to parenthesize the rest of the expression, meaning that if it parsed "1+", it would expect the next token to be a number, and it might want to add one to it eagerly, but not if followed by "2*2", since multiplication has higher precedence.

These functions convert the vector of tokens to an expression tree that evaluates the arguments. "2*2" would parse to a tree with a multiplier at its root and two numbers as its leaves. So, after the first state (sums) had read "1+", it would eagerly construct an addition object with the left argument being 1, and the right argument being the result of the rest of the parse! So it reads "2*2" and builds a tree that becomes the right-side argument for our "1+" tree.
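
The gist has the real code; what follows is only a compressed sketch of the same two-stage shape, with several assumptions of mine. It evaluates directly instead of building a tree, handles only + and *, and adds parentheses so that the two parse functions actually call each other. lex, parseSum, parseProduct and evaluate are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

enum TokenType { INT, OP };
typedef std::pair<TokenType,std::string> Token;

// Stage one: split the input into (type, lexeme) pairs.
// Every non-digit character becomes its own OP token.
std::vector<Token> lex( const std::string& in ) {
    std::vector<Token> out;
    for( std::size_t i = 0; i < in.size(); ) {
        if( std::isdigit( static_cast<unsigned char>(in[i]) ) ) {
            std::size_t j = i;
            while( j < in.size() && std::isdigit( static_cast<unsigned char>(in[j]) ) ) j++;
            out.push_back( Token( INT, in.substr(i, j - i) ) );
            i = j;
        } else {
            out.push_back( Token( OP, std::string(1, in[i]) ) );
            i++;
        }
    }
    return out;
}

int parseSum( const std::vector<Token>& ts, std::size_t& pos );

// Products bind tighter: (number | '(' sum ')') followed by an optional '*' product.
int parseProduct( const std::vector<Token>& ts, std::size_t& pos ) {
    int left;
    if( ts.at(pos).second == "(" ) {
        pos++;                       // skip "("
        left = parseSum( ts, pos );
        pos++;                       // skip ")"
    } else {
        left = std::stoi( ts.at(pos++).second );
    }
    if( pos < ts.size() && ts[pos].second == "*" ) {
        pos++;
        return left * parseProduct( ts, pos );
    }
    return left;
}

// Sums: a product followed by an optional '+' and the rest of the sum.
int parseSum( const std::vector<Token>& ts, std::size_t& pos ) {
    int left = parseProduct( ts, pos );
    if( pos < ts.size() && ts[pos].second == "+" ) {
        pos++;
        return left + parseSum( ts, pos );
    }
    return left;
}

int evaluate( const std::string& in ) {
    std::size_t pos = 0;
    return parseSum( lex(in), pos );
}

} // namespace sketch
```

The right-hand side of each operator is "the result of the rest of the parse", as described above, which is fine for + and * but would give subtraction and division the wrong associativity.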

This solution is theoretically consistent with building a language using flex and bison. The source can be found here (gist). But it isn't functional. That's when I stumbled back onto this research paper and decided to give a real go at it.

Functional Parsing.

The parser described in this paper does not act in the same way as my parser does. Rather than lexing, tokenizing, and building an expression tree, it cuts straight to the point. A Parser<int> will be a function that accepts a string and produces ints. But, the job may not be done; there may be more to parse. For every int it parses, it will also store the suffix of the parse. So, if p is some parser of type Parser<int>, p("1abc") will return a vector holding one match: the value, 1, and the suffix, "abc".

What does this class look like?

/* A Parser is a function taking a string and returning a vector of matches. */
template< class X > struct Parser {
    // The value is the target of the parse. (For example "1" may parse to int(1).)
    using value_type    = X;

    // A match consists of a value and the rest of the input to process.
    using parse_state   = std::pair<X,std::string>;

    // A parse results in a list of matches.
    using result_type   = std::vector< std::pair<X,std::string> >;

    // A parser is a function that produces matches.
    using function_type = std::function< result_type( const std::string& ) >;

    function_type f;

    Parser( function_type f ) : f(std::move(f)) { }

    Parser( const Parser<X>& p ) : f(p.f) { }
    Parser() { }

    result_type operator () ( const std::string& s ) const {
        return f( s );
    }
};

template< class X, class F > Parser<X> parser( F f ) {
    using G = typename Parser<X>::function_type;
    return Parser<X>( G(std::move(f)) );
}

I have mentioned in previous articles that one should avoid std::function for efficiency reasons, but it vastly simplifies things here. As you can see, Parser merely wraps itself around std::function. I would encourage the reader to think of it as a type alias--not deserving of being called a new type, but more than a typedef.

This is consistent with how the paper defines this type:

newtype Parser a = Parser (String -> [(a,String)])

If nothing can be parsed, an empty list will be returned. If many things are parsed, a list of each match will be returned.
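
To make the shape of a result concrete, here is a sketch of a plain function with the signature a Parser<int> wraps. parseInt and IntMatches are hypothetical names; the sketch assumes only non-negative decimal integers small enough for std::stoi.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Same shape as Parser<int>::result_type.
typedef std::vector< std::pair<int,std::string> > IntMatches;

// Parse a leading run of digits. Returns one (value, suffix) match, or
// an empty list when the input does not start with a digit.
IntMatches parseInt( const std::string& s ) {
    std::size_t n = 0;
    while( n < s.size() && std::isdigit( static_cast<unsigned char>(s[n]) ) ) n++;
    if( n == 0 ) return IntMatches();
    return IntMatches{ std::make_pair( std::stoi( s.substr(0, n) ), s.substr(n) ) };
}

} // namespace sketch
```

Passing a function like this to parser<int> would give the p used in the "1abc" example above.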

The Parser Monad.

As a reminder, the basic monadic operations are these:

a >> b = b -- Do a, then b.
a >>= f = b -- For each x from a, construct b with f(x).
mreturn x = a -- Construct a monad from a value.

How does this relate to parsing? A parser is basically just a function, so if p and q are both parsers, p >> q must return a new parser, a new function. The simple explanation (do p, then q) is correct. First, p parses and let's say it returns a match, (x,rest). rest is sent to q for parsing and the x is thrown away. It may sound odd to just throw away a value, but it will become more obvious soon.

If p had failed to parse a value, then q would not have been run.

The bind operation, p >>= f, extracts the parsed value, x, from p and creates a new parser from f(x). mreturn x creates a new parser that returns x as its value. It accepts any string, even an empty one. Ideally, x came from the output of another parser.

p >> q -- Run parser p, then q.
p >>= f -- Construct a parser with p's matches.
mreturn x -- Construct a parser that returns x.

We can define it like so:

template< class X > struct Monad< Parser<X> > {
    using Pair = typename Parser<X>::parse_state;
    using R    = typename Parser<X>::result_type;

    /* 
     * mreturn x = parser(s){ vector( pair(x,s) ) }
     * Return a parsed value. Forwards rest of input to the next parser.
     */
    template< class M >
    static M mreturn( X x ) {
        return parser<typename M::value_type> (
            [x]( const std::string& s ) { 
                return R{ Pair(std::move(x),s) }; 
            }
        );
    }
    
    /* a >> b = b */
    template< class Y, class Z >
    static Parser<Y> mdo( Parser<Z> a, Parser<Y> b ) {
        return a >>= [b]( const Z& z ) { return b; };
    }

    /* Continue parsing from p into f. */
    template< class F, class Y = typename std::result_of<F(X)>::type >
    static Y mbind( F f, Parser<X> p ) 
    {
        using Z = typename Y::value_type;
        return parser<Z> (
                [f,p]( const std::string& s ) {
                    // First, extract p's matches.
                    return p(s) >>= [&]( Pair p ) {
                        // Then construct the new parser from the p's output.
                        // Continue parsing with the remaining input with the new parser.
                        return f( std::move(p.first) )( std::move(p.second) );
                    };
                }
        );
    }
};

Do not worry if this source is difficult to understand. It is more important to understand how it is used (which is perhaps common with monads). Note that mdo is defined such that for every successful parse of a, b is parsed. If a fails to parse anything, b fails, too.
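
If the template machinery obscures what bind actually does, a stripped-down sketch may help. It assumes a plain std::function alias instead of the Parser struct, and free functions instead of the Monad specialization; bindP, thenP, anyChar, Result, and SimpleParser are hypothetical names used only here. Unlike mbind above, the result's value type Y is passed explicitly rather than deduced.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

template< class X >
using Result = std::vector< std::pair<X,std::string> >;

template< class X >
using SimpleParser = std::function< Result<X>( const std::string& ) >;

// bindP(p,f): run p, then run the parser f(x) on the rest of each match (x,rest).
template< class Y, class X, class F >
SimpleParser<Y> bindP( SimpleParser<X> p, F f ) {
    return [p,f]( const std::string& s ) -> Result<Y> {
        Result<Y> out;
        for( const auto& m : p(s) ) {
            Result<Y> next = f( m.first )( m.second );
            out.insert( out.end(), next.begin(), next.end() );
        }
        return out;  // Empty if p matched nothing: f is never called.
    };
}

// thenP(p,q): run p, throw its value away, and run q on what is left.
template< class X, class Y >
SimpleParser<Y> thenP( SimpleParser<X> p, SimpleParser<Y> q ) {
    return bindP<Y>( p, [q]( const X& ) { return q; } );
}

// Accept any single char.
inline Result<char> anyChar( const std::string& s ) {
    if( s.empty() ) return Result<char>();
    return Result<char>{ std::make_pair( s[0], s.substr(1) ) };
}

inline void thenExample() {
    SimpleParser<char> item = anyChar;
    SimpleParser<char> second = thenP( item, item );
    assert( second("abc").size() == 1 );
    assert( second("abc")[0].first == 'b' );   // 'a' was parsed, then discarded.
    assert( second("abc")[0].second == "c" );
    assert( second("a").empty() );             // The second item has nothing to parse.
}
```

The loop in bindP is the whole trick: a failed first parse yields no matches, so the loop body, and therefore the second parser, never runs.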

These monadic operations are the core building blocks from which we can build more complex systems, but the paper also discusses MonadZero and MonadPlus. They are both type classes, like Monad, but extend it to do a few interesting things. In C++, one can concatenate two strings by using simple addition: s1 + s2 = s12. MonadPlus is a generalization of this. MonadZero completes this generalization by supplying the additive identity. For example, we know that zero + x = x. Thus, "" + s = s.

In parsing terms, zero would refer to a parser that matches nothing, and adding two parsers, p+q, will produce a third parser that accepts either p's or q's matches. For example, identifier+number would create a function that parses either identifiers or numbers.

We can define MonadPlus and MonadZero in the same way we would define Monad.

template< class ... > struct MonadZero;
template< class ... > struct MonadPlus;

template< class M, class Mo = MonadZero< Cat<M> > >
auto mzero() -> decltype( Mo::template mzero<M>() )
{
    return Mo::template mzero<M>();
}

template< class A, class B, class Mo = MonadPlus<Cat<A>> >
auto mplus( A&& a, B&& b ) -> decltype( Mo::mplus(std::declval<A>(),std::declval<B>()) )
{
    return Mo::mplus( std::forward<A>(a), std::forward<B>(b) );
}

template< class X, class Y >
auto operator + ( X&& x, Y&& y ) -> decltype( mplus(std::declval<X>(),std::declval<Y>()) )
{
    return mplus( std::forward<X>(x), std::forward<Y>(y) );
}

First, we define these for sequences.

template<> struct MonadZero< sequence_tag > {
    /* An empty sequence. */
    template< class S >
    static S mzero() { return S{}; }
};

/* mplus( xs, ys ) = "append xs with ys" */
template<> struct MonadPlus< sequence_tag > {
    template< class A, class B >
    static A mplus( A a, const B& b ) {
        std::copy( b.begin(), b.end(), std::back_inserter(a) );
        return a;
    }
};

And then for Parsers.

/* mzero: a parser that matches nothing, no matter the input. */
template< class X > struct MonadZero< Parser<X> > {
    template< class P >
    static P mzero() { 
        return parser<X> (
            []( const std::string& s ){
                return std::vector<std::pair<X,std::string>>(); 
            }
        );
    }
};

/* mplus( pa, pb ) = "append the results of pa with the results of pb" */
template< class X > struct MonadPlus< Parser<X> > {
    using P = Parser<X>;
    static P mplus( P a, P b ) {
        return parser<X> (
            [=]( const std::string& s ) { return a(s) + b(s); }
        );
    }
};

Since we usually only want the first successful parse, the paper defines an operator, +++, that does this.

template< class X >
Parser<X> mplus_first( const Parser<X>& a, const Parser<X>& b ) {
    return parser<X> (
        [=]( const std::string& s ) {
            using V = std::vector< std::pair< X, std::string > >;
            V c = (a + b)( s );
            return c.size() ?  V{ c[0] } : V{};
        }
    );
}
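
The difference between adding parsers and taking only the first match shows up when both alternatives succeed. The following is a minimal sketch under the assumption of a plain std::function alias rather than the Parser struct; failP, bothP, firstP, and prefix are hypothetical names standing in for mzero, mplus, mplus_first, and a literal-matching parser.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

template< class X >
using Result = std::vector< std::pair<X,std::string> >;

template< class X >
using SimpleParser = std::function< Result<X>( const std::string& ) >;

// failP: the zero parser. It matches nothing, whatever the input.
template< class X >
SimpleParser<X> failP() {
    return []( const std::string& ) { return Result<X>(); };
}

// bothP(a,b): every match of a, followed by every match of b.
template< class X >
SimpleParser<X> bothP( SimpleParser<X> a, SimpleParser<X> b ) {
    return [a,b]( const std::string& s ) -> Result<X> {
        Result<X> out  = a( s );
        Result<X> more = b( s );
        out.insert( out.end(), more.begin(), more.end() );
        return out;
    };
}

// firstP(a,b): like bothP, but keep only the first match, if any.
template< class X >
SimpleParser<X> firstP( SimpleParser<X> a, SimpleParser<X> b ) {
    SimpleParser<X> ab = bothP( a, b );
    return [ab]( const std::string& s ) -> Result<X> {
        Result<X> all = ab( s );
        if( all.size() > 1 ) all.erase( all.begin() + 1, all.end() );
        return all;
    };
}

// prefix(lit): match the literal text lit at the start of the input.
inline SimpleParser<std::string> prefix( std::string lit ) {
    return [lit]( const std::string& s ) -> Result<std::string> {
        if( s.compare( 0, lit.size(), lit ) != 0 )
            return Result<std::string>();
        return Result<std::string>{ std::make_pair( lit, s.substr(lit.size()) ) };
    };
}

inline void alternativesExample() {
    auto a  = prefix( "a" );
    auto ab = prefix( "ab" );
    assert( bothP( a, ab )("abc").size() == 2 );   // ("a","bc") and ("ab","c")
    assert( firstP( a, ab )("abc").size() == 1 );  // Only ("a","bc") survives.
    assert( firstP( a, ab )("abc")[0].second == "bc" );
    assert( bothP( failP<std::string>(), a )("abc") == a("abc") );  // zero + a = a
}
```

Note that this sketch, like mplus_first above, runs both alternatives even when the first one already matched; the lazy Haskell original would not need to.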

This completes the required building blocks. A parser of significant complexity could be made using only the above functions and types. The paper describes building a notably simple parser, so let's do that instead!

Basic Monadic Parsers.

The simplest parser is item, which accepts any char.

std::string tail( const std::string& s ) {
    return std::string( std::next(s.begin()), s.end() );
}

/* 
 * Unconditionally match a char if the string is not empty.
 * Ex: item("abc") = {('a',"bc")}
 */
auto item = parser<char> (
    []( const std::string& s ) {
        using Pair = Parser<char>::parse_state;
        using R = Parser<char>::result_type;
        return s.size() ? R{ Pair( s[0], tail(s) ) } : R();
    }
);

To demonstrate its usage, the paper defines a simple parser that takes a string of length three or more and returns the first and third values.

auto p = item >>= []( char c ) {
    return item >> item >>= [c]( char d ) {
        return mreturn<Parser>( std::make_pair(c,d) );
    };
};

p first runs item to extract c, then runs item again but throws away the value. It runs item a third time to extract d and finally returns as its value (c,d). p("abcd") would return {(('a','c'),"d")}.
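
For readers who want to trace this by hand, here is a self-contained sketch of the same first-and-third parser. It assumes simplified free functions in place of the operators used above; unitP, bindP, thenP, anyChar, and firstAndThird are hypothetical names for mreturn, >>=, >>, item, and p.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

template< class X >
using Result = std::vector< std::pair<X,std::string> >;

template< class X >
using SimpleParser = std::function< Result<X>( const std::string& ) >;

// unitP(x): succeed with x and consume nothing.
template< class X >
SimpleParser<X> unitP( X x ) {
    return [x]( const std::string& s ) {
        return Result<X>{ std::make_pair( x, s ) };
    };
}

// bindP(p,f): feed each value p parses to f, and run the resulting parser.
template< class Y, class X, class F >
SimpleParser<Y> bindP( SimpleParser<X> p, F f ) {
    return [p,f]( const std::string& s ) -> Result<Y> {
        Result<Y> out;
        for( const auto& m : p(s) ) {
            Result<Y> next = f( m.first )( m.second );
            out.insert( out.end(), next.begin(), next.end() );
        }
        return out;
    };
}

// thenP(p,q): run p, discard its value, then run q.
template< class X, class Y >
SimpleParser<Y> thenP( SimpleParser<X> p, SimpleParser<Y> q ) {
    return bindP<Y>( p, [q]( const X& ) { return q; } );
}

inline Result<char> anyChar( const std::string& s ) {
    if( s.empty() ) return Result<char>();
    return Result<char>{ std::make_pair( s[0], s.substr(1) ) };
}

using CharPair = std::pair<char,char>;

// Keep the first char, skip the second, keep the third.
inline SimpleParser<CharPair> firstAndThird() {
    SimpleParser<char> item = anyChar;
    return bindP<CharPair>( item, [item]( char c ) {
        return bindP<CharPair>( thenP( item, item ), [c]( char d ) {
            return unitP( std::make_pair( c, d ) );
        } );
    } );
}
```

Running firstAndThird() on "abcd" walks through 'a', discards 'b', picks up 'c', and leaves "d" as the suffix; on "ab" the third item fails, so the whole parse is empty.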

The next function creates a parser that is a little more helpful:

/* sat( pred ) = "item, if pred" */
template< class P >
Parser<char> sat( P p ) {
    return item >>= [p]( char c ) { 
        return p(c) ? mreturn<Parser>(c) : mzero<Parser<char>>();
    };
}

Given some function that operates on chars, this extracts an item, but then checks the condition without consuming any additional input. If p(c) returns true, then it returns a parser with the value c, otherwise zero, a failed parse. Using this, we can define a parser to accept only a specific char.

Parser<char> pchar( char c ) {
    return sat( [=](char d){ return c == d; } );
}

And then, a parser that accepts only a specific string.

Parser<std::string> string( const std::string& s ) {
    if( s.size() == 0 )
        return mreturn<Parser>( s );

    Parser<char> p = pchar( s[0] );
    for( auto it=std::next(s.begin()); it != s.end(); it++ )
        p = p >> pchar( *it );

    return p >> mreturn<Parser>(s);
}

Note: There is no name conflict with std::string because the source does not contain "using namespace std;".

This function does something very odd. For every char in s, it chains together char parsers. For example, string("abc") would return a parser equivalent to pchar('a') >> pchar('b') >> pchar('c') >> mreturn<Parser>("abc"). If any of the char parsers fails down the line, mreturn<Parser>(s) never runs and the whole parse fails. Since we already know the values of the successful parses, their values are thrown away.

Though faithful to the paper, this may be less efficient than desirable. One could implement string in this way, too:

Parser<std::string> string( const std::string& str ) {
    return parser<std::string> (
        [str]( const std::string& s ) {
            using R = typename Parser<std::string>::result_type;
            // Check the length first so std::equal never reads past the end of s.
            if( s.size() >= str.size() and std::equal(str.begin(),str.end(),s.begin()) ) {
                return R {
                    { str, s.substr( str.size() ) }
                };
            } else {
                return R();
            }
        }
    );
}

It can at times be simpler to write out these functions instead of composing them; however, that can be thought of as an optimization.

The next function creates a new parser from a parser, p, that accepts one or zero p's.

template< class X >
Parser<std::vector<X>> some( Parser<X> p ) {
    using V = std::vector<X>;
    using Pair = std::pair<V,std::string>;
    using R = std::vector< Pair >;
    using P = Parser<V>;
    return mplus_first( 
        parser<V> (
            [=]( const std::string& s ) {
                auto matches = p(s);

                return matches.size() == 0 ? R{}
                    : R{ 
                        Pair( 
                            V{std::move(matches[0].first)}, 
                            std::move( matches[0].second )
                        ) 
                    };
            }
        ),
        mreturn<Parser>( V{} )
    );
}

It was unfortunately difficult to translate, but does work. (The gist also contains a version called many, which accepts zero or many p's.) some converts a parser of type X to one that produces a vector of X's. It always succeeds, even if it does not successfully parse anything.
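
For comparison, a zero-or-more repetition can be written as a plain loop. This is a sketch of my own, not the gist's many: it assumes a std::function alias in place of the Parser struct, keeps only the first match of each step, and the names manyOf and spaceChar are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <string>
#include <utility>
#include <vector>

template< class X >
using Result = std::vector< std::pair<X,std::string> >;

template< class X >
using SimpleParser = std::function< Result<X>( const std::string& ) >;

// manyOf(p): apply p repeatedly, collecting values until it stops matching.
template< class X >
SimpleParser< std::vector<X> > manyOf( SimpleParser<X> p ) {
    return [p]( const std::string& s ) -> Result< std::vector<X> > {
        std::vector<X> values;
        std::string rest = s;
        while( true ) {
            Result<X> matches = p( rest );
            if( matches.empty() ) break;
            // Stop if p matched without consuming input, or we would loop forever.
            if( matches[0].second.size() == rest.size() ) break;
            values.push_back( matches[0].first );
            rest = matches[0].second;
        }
        // Always exactly one match, possibly holding zero values.
        return Result< std::vector<X> >{ std::make_pair( values, rest ) };
    };
}

// Accept one whitespace char.
inline Result<char> spaceChar( const std::string& s ) {
    if( s.empty() || !std::isspace( static_cast<unsigned char>(s[0]) ) )
        return Result<char>();
    return Result<char>{ std::make_pair( s[0], s.substr(1) ) };
}

inline void manyExample() {
    SimpleParser< std::vector<char> > spaces = manyOf( SimpleParser<char>(spaceChar) );
    assert( spaces(" \t x")[0].first.size() == 3 );
    assert( spaces(" \t x")[0].second == "x" );
    assert( spaces("x").size() == 1 );        // Still a match...
    assert( spaces("x")[0].first.empty() );   // ...holding no values.
}
```

The cast to unsigned char before std::isspace avoids undefined behaviour for negative char values.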

To create a parser that consumes whitespace is now trivial.

template< class X >
using ManyParser = Parser< std::vector<X> >;

ManyParser<char> space = some( sat([](char c){return std::isspace(c);}) );

We require one more function to parse alternating sequences. Like what? A sum, like "1+2+3" is a sort of alternating sequence of numbers and +'s. A product is an alternating sequence of numbers and *'s.

Here's the weird part: what does a parser that accepts a "+" return? What about a parser that accepts a "-"? The value of such a parser, as it turns out, is the binary function that it represents! In the case of the implementation below, this is a function pointer of type int(*)(int,int).

/* 
 * chain(p,op): Parse with p infinitely (until there is no match) folding with op.
 *
 * p and op are both parsers, but op returns a binary function, given some
 * input, and p returns the inputs to that function. For example, if:
 *      input: "4"
 *      p returns: 4
 * No operator is read, no operation is performed. But:
 *      input: "4+4"
 *      p returns: 4
 * op is then parsed with the function, rest:
 *      input: "+4"
 *      op returns: do_add
 *      input: "4"
 *      p returns: 4
 *      rest returns: 8
 * rest applies the operation parsed by op. It alternates between parsing p and
 * op until there are no more matches. 
 */
constexpr struct Chainl1 {
    template< class X, class F >
    static Parser<X> rest( const Parser<X>& p, const Parser<F>& op, const X& a ) {
        // Alternate between op and p until input is consumed or a parse fails.
        auto r = op >>= [=]( const F& f ) {
                return p >>= [=]( const X& b ) {
                    return rest( p, op, f(a,b) );
                };
        };

        // Return the first successful parse, or a if none.
        return mplus_first( r, mreturn<Parser>(a) );
    }

    template< class X, class F >
    Parser<X> operator () ( Parser<X> p, Parser<F> op ) const {
        return p >>= closet( rest<X,F>, std::move(p), std::move(op) );
    }
} chainl1{};
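
The alternation can also be written as an explicit loop, which some may find easier to follow than the recursion. This is a sketch under simplifying assumptions, not the code from the gist: it uses a std::function alias, looks only at the first match of each step, and the names chainLeft, digit, addop, doAdd, and doSub are introduced here for illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

template< class X >
using Result = std::vector< std::pair<X,std::string> >;

template< class X >
using SimpleParser = std::function< Result<X>( const std::string& ) >;

// chainLeft(p,op): parse p (op p)*, folding the values from the left.
template< class X, class F >
SimpleParser<X> chainLeft( SimpleParser<X> p, SimpleParser<F> op ) {
    return [p,op]( const std::string& s ) -> Result<X> {
        Result<X> first = p( s );
        if( first.empty() ) return Result<X>();
        X acc = first[0].first;
        std::string rest = first[0].second;
        while( true ) {
            Result<F> f = op( rest );
            if( f.empty() ) break;
            Result<X> next = p( f[0].second );
            if( next.empty() ) break;  // Operator without an operand: leave it unconsumed.
            acc  = f[0].first( acc, next[0].first );
            rest = next[0].second;
        }
        return Result<X>{ std::make_pair( acc, rest ) };
    };
}

using BinOp = int(*)( int, int );
inline int doAdd( int x, int y ) { return x + y; }
inline int doSub( int x, int y ) { return x - y; }

inline Result<int> digit( const std::string& s ) {
    if( s.empty() || s[0] < '0' || s[0] > '9' ) return Result<int>();
    return Result<int>{ std::make_pair( s[0] - '0', s.substr(1) ) };
}

// The value of an operator parser is the binary function it stands for.
inline Result<BinOp> addop( const std::string& s ) {
    if( s.empty() || (s[0] != '+' && s[0] != '-') ) return Result<BinOp>();
    BinOp f = s[0] == '+' ? doAdd : doSub;
    return Result<BinOp>{ std::make_pair( f, s.substr(1) ) };
}

inline void chainExample() {
    SimpleParser<int> sum = chainLeft( SimpleParser<int>(digit), SimpleParser<BinOp>(addop) );
    assert( sum("4")[0].first == 4 );
    assert( sum("4+4")[0].first == 8 );
    assert( sum("9-2-3")[0].first == 4 );   // (9-2)-3, not 9-(2-3)
    assert( sum("4+")[0].second == "+" );   // Dangling operator is left in the suffix.
}
```

As with the recursive version, the loop terminates only because p consumes input on every round.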

Adding the final touches.

The paper describes a few generally useful functions for constructing parsers, built on the functions above. space (defined above) consumes whitespace. token consumes any trailing whitespace after parsing p.

constexpr struct Token {
    template< class X >
    Parser<X> operator () ( Parser<X> p ) const {
        return p >>= []( X x ) {
            return space >> mreturn<Parser>(std::move(x));
        };
    }
} token{};

symb converts a string to a token.

auto symb = compose( token, string ); // symb(s) = token( string(s) )

apply consumes any leading whitespace.

constexpr struct Apply {
    template< class X >
    Parser<X> operator () ( Parser<X> p ) const {
        return space >> std::move(p);
    }
} apply{};

The big idea is that we never want to manually write a function that returns a list of successful parses. It's hard! It's much easier to compose such functions from smaller, more comprehensible ones and use those to build parsers that are reasonably complex, but simpler to reason about.

The parser itself.

Now, using all of the tools provided, we can create a parser much more trivially than otherwise--though that is perhaps true of any time one has a new set of tools. First, we define a parser that accepts digits.

constexpr bool is_num( char c ) {
    return c >= '0' and c <= '9';
}

/* Parse one digit. */
Parser<int> digit = token( sat(is_num) ) >>= []( char i ) { 
    return mreturn<Parser>(i-'0'); 
};

Now, digit("2") returns 2, but (digit >> digit)("22") returns 2 as well! Why? Because the first run of digit extracts the first 2, but throws that value away and the second run of digit extracts the second 2. To parse a two-digit number, we need something like this:

Parser<int> twoDigit = digit >>= []( int x ) {
    return digit >>= [x]( int y ) {
        return mreturn<Parser>( x*10 + y );
    };
};

It extracts the first then the second digit and returns the original number, converted from a string to an int! To parse arbitrarily long numbers (diverging from the paper's version), we can define a chain operation!

int consDigit( int accum, int digit ) { return accum*10 + digit; }
Parser<int> num = chainl1( digit, mreturn<Parser>(consDigit) );

For every two digits parsed, num calls consDigit to fold the values together. As mentioned earlier, chainl1 works by alternating between its two parsers. Since the second argument is a parser which consumes no input, num only accepts digits.
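
The arithmetic of that fold is easy to check in isolation. As a small sketch, separate from the parser itself, the helper below folds digits that have already been parsed; foldDigits is a hypothetical name, and returning 0 for an empty list is my assumption, since num would simply fail to parse in that case.

```cpp
#include <cassert>
#include <iterator>
#include <numeric>
#include <vector>

inline int consDigit( int accum, int digit ) { return accum*10 + digit; }

// Fold digits left to right, the way chainl1 does one pair at a time.
inline int foldDigits( const std::vector<int>& digits ) {
    if( digits.empty() ) return 0;  // Assumption: no digits folds to zero.
    return std::accumulate( std::next(digits.begin()), digits.end(),
                            digits.front(), consDigit );
}

inline void foldDigitsExample() {
    std::vector<int> ds = { 1, 2, 3 };
    assert( foldDigits(ds) == 123 );  // (1*10 + 2)*10 + 3
}
```

Each step multiplies the running total by ten before adding the next digit, which is why the fold must run from the left.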

Next, we can define the parsers for binary operations.

// Binary operations of type int(*)(int,int).
int do_add(  int x, int y ) { return x + y; }
int do_sub(  int x, int y ) { return x - y; }
int do_mult( int x, int y ) { return x * y; }
int do_div(  int x, int y ) { return x / y; }

auto addop = mplus (
    pchar('+') >> mreturn<Parser>(do_add),
    pchar('-') >> mreturn<Parser>(do_sub)
);

auto mulop = mplus (
    pchar('*') >> mreturn<Parser>(do_mult),
    pchar('/') >> mreturn<Parser>(do_div)
);

addop parses either a "+" or "-" and returns either do_add or do_sub, respectively. Because the parsers must return functions of the same types, std::plus and std::minus could not be used.

With this, we can define a term as an alternating sequence of numbers and multiplications and divisions; and an expr(ession) as an alternating sequence of terms, +'s and -'s.

/* 
 * Parse terms: series of numbers, multiplications and divisions.
 * Ex: "1*3*2" -> (3,"*2") -> 6
 */
Parser<int> term = chainl1( num, mulop );

/*
 * Parse expressions: series of terms, additions and subtractions.
 * Ex: "1+7*9-1" -> (1,"+7*9-1") -> (64,"-1") -> 63
 */
Parser<int> expr = chainl1( term, addop );

And we're done! We have just built a calculator! It can evaluate any expression of additions, subtractions, multiplications, divisions, and it is whitespace agnostic. While implementing Parser itself took a considerable amount of work, using it does not.

int main() {
    using std::cin;
    using std::cout;
    using std::endl;

    cout << "Welcome to the calculator!\n" 
         << "Press ctrl+d or ctrl+c to exit.\n"
         << "Type in an equation and press enter to solve it!\n" << endl;

    std::string input;
    while( cout << "Solve : " and std::getline(std::cin,input) ) {
        auto ans = apply(expr)(input);
        if( ans.size() and ans[0].second.size() == 0 )
            cout << " = " << ans[0].first;
        else
            cout << "No answer.";
        cout << endl;
    }
}

I highly encourage anyone reading this to attempt to compile and modify the source code.

See the gist at github for the source in full: https://gist.github.com/4112114
And for the original parser I wrote: https://gist.github.com/4112114#file_trivial_parser.cpp

A public letter to Microsoft about the CTP.

2012-11-15T12:33:00.000-08:00

Dear Microsoft,

I see that you have decided to release C++11 in the form of a CTP. Congratulations on catching up! For a long time, it seemed like you just didn't give a shit, but thankfully that's not true.

I see you now offer variadic templates. Great! You know, when GCC 4.3 came out offering them back in 2008, and Clang 2.9 in April of last year, it was really amazing, so I'm glad to see you're offering this too now. It's really something that developers ~~chained~~ faithful to MSVC will finally be compiling code written with GCC years ago!

Initializer lists, too!? Wow! How nostalgic. Like when GCC 4.4 came out in '09. Oh, but you can't use it for std::vector? How silly. Well, I'm sure it's just one of those things you'll get around to in the same way you finally got around to C++11. There are probably some major complexities that I am not aware of for integrating this feature into the standard library.

You also now offer delegating constructors. It took GCC a long time to implement that! It only happened earlier this year. Clang didn't even have it until December 2011. Being only a year behind must feel like a major accomplishment. The whole team who implemented this should be promoted for their unusual efficiency.

My, my, default template argument for functions was a huge oversight for the standardizing committee. It's a good thing they corrected themselves, allowing GCC to include it in their 4.3 release series. It's good to see you have also corrected yourselves by including it in your CTP. Developers of MSVC will finally be able to write

 template< class X, class Y = type-exp >
Y f( X x ) {return Y(...); }

instead of what we used to have to write:

 template< class X >
type-exp(...) f( X x ) { return type-exp(...); }

Remember how it took GCC until 4.5 (April 2010) to get explicit conversion operators in, and Clang until 3.0 in December 2011? Well now MSVC developers too can take advantage of the fact that allowing smart pointers to explicitly convert to bools won't allow them to then convert to ints!

 std::unique_ptr<int> p( new int(6) );
if( p ) { } // Explicit conversion!
p + p; // Not allowed!

Have I forgotten a feature to applaud you for? Oh yes, raw string literals! Good job. Not the first here, either, but hey, it's not like you're competing for market share, right? After all, you already got it. Developers are locked in, aren't they? They have to use MSVC to develop professionally for Windows, right? Isn't that the idea?

Now, it would be absolutely horrible if you released all these features that the rest of the C++ community has been using for years and none of your customers knew any of it. So how wonderful it is that you have Channel 9, where you have been publishing videos like Scott Meyers's influential Universal References talk and Herb Sutter's (who works for you) talk on the future of our language and Lavavej's core C++ course. Your parade of C++ intellectuals surely adds much to the discussion. I wonder: did they use your CTP to research those talks, GCC, or Clang?

So thank you, Microsoft. Thanks for holding the C++ community back several years. Thanks for offering educational content on these features, now that you finally decided to implement them. Thanks for parading all the C++ intellectuals in our faces to get us excited about these features, now that you offer them. Thank you for showing that you really care about advancements in C++, in your products. Thank you for demonstrating that you care about the advancement of the C++ community at-large, when they're using MSVC. Thank you for dishing out pieces of C++11 like chocolate rations. And thank you for hiring Herb Sutter back in 2002, which obviously made you hurry much faster in releasing C++11 features.

Thank you for being you. Without you, we'd all be programming in C++11! That would be insanity.

Generalized Function Objects Explained (For the C++11 unfluent.)

2012-11-15T11:05:00.001-08:00

It occurred to me that some of those who found my "Generic Function Objects" article were not all that familiar with C++11 syntax, and in particular, template template parameters and variadic templates. Learning these concepts is one thing--just takes some googling, reading, and a little practice. Understanding examples like in my article may be more difficult, so I will explain how it works line-by-line. I will not be explaining these features, or how and why they work as that is too much for one blog post to accomplish, but I will be explaining how they work in context.

Readers who understood the article up to the definition of ConstructT should feel free to skip to that section.

Update: I have attempted to improve the original article with some of this one, so it may be a little redundant.

Binary

Binary defines member functions for partially applying the first and last arguments. I wrote about this back in September.

template< class F, class X >
struct Part {
    F f;
    X x;

    template< class _F, class _X >
    constexpr Part( _F&& f, _X&& x )
        : f(forward<_F>(f)), x(forward<_X>(x))
    {
    }

    template< class ... Xs >
    constexpr auto operator () ( Xs&& ...xs )
        -> decltype( f(x,declval<Xs>()...) )
    {
        return f( x, forward<Xs>(xs)... );
    }
};

template< class F, class X > struct RPart {
    F f;
    X x;

    template< class _F, class _X >
    constexpr RPart( _F&& f, _X&& x ) 
        : f( forward<_F>(f) ), x( forward<_X>(x) ) { }

    template< class ...Y >
    constexpr decltype( f(declval<Y>()..., x) )
    operator() ( Y&& ...y ) {
        return f( forward<Y>(y)..., x );
    }
};

template< class F > struct Binary {
    template< class X >
    constexpr Part<F,X> operator () ( X x ) {
        return Part<F,X>( F(), move(x) );
    }

    template< class X >
    constexpr RPart<F,X> with( X x ) {
        return RPart<F,X>( F(), move(x) );
    }
};

So, given some type F that inherits from Binary<F>, F gains a one-argument operator() (partial application) and the member function with (reverse partial application). F must be a type that defines a two-argument operator() overload.

We define a derivation like so:

constexpr struct Add : public Binary<Add> {
    using Binary<Add>::operator();

    template< class X, class Y >
    constexpr auto operator () ( X&& x, Y&& y ) 
        -> decltype( std::declval<X>() + std::declval<Y>() )
    {
        return std::forward<X>(x) + std::forward<Y>(y);
    }
} add{};

Add is not a function in the C sense; it's a function type. add is an instance of Add and is a function object. Add does not inherit Binary's operator() overload by default, so we explicitly tell it to by saying "using Binary<Add>::operator()". Binary's with requires no work to inherit.

auto inc = add(1); // Calls Binary<Add>::operator(); returns Part<Add,int>.
auto dec = add.with(-1); // Calls Binary<Add>::with; returns RPart<Add,int>.
int two = inc(1); // Calls Part<Add,int>::operator(); returns add(1,1).
int one = dec(two); // returns add(2,-1).
add(1,2); // Calls Add::operator(); returns int(3).

Chainable.

template< class F > struct Chainable : Binary<F> {
    using Binary<F>::operator();

    template< class X, class Y >
    using R = typename std::result_of< F(X,Y) >::type;

    // Three arguments: unroll.
    template< class X, class Y, class Z >
    constexpr auto operator () ( X&& x, Y&& y, Z&& z )
        -> R< R<X,Y>, Z >
    {
        return F()(
            F()( std::forward<X>(x), std::forward<Y>(y) ),
            std::forward<Z>(z)
        );
    }

    template< class X, class Y, class ...Z >
    using Unroll = typename std::result_of <
        Chainable<F>( typename std::result_of<F(X,Y)>::type, Z... )
    >::type;

    // Any more? recurse.
    template< class X, class Y, class Z, class H, class ...J >
    constexpr auto operator () ( X&& x, Y&& y, Z&& z, H&& h, J&& ...j )
        -> Unroll<X,Y,Z,H,J...>
    {
        // Notice how (*this) always gets applied at LEAST three arguments.
        return (*this)(
            F()( std::forward<X>(x), std::forward<Y>(y) ),
            std::forward<Z>(z), std::forward<H>(h), std::forward<J>(j)...
        );
    }
};

Chainable works in much the same way as Binary does, except that it extends F to take any arbitrary number of arguments (except for zero, but that'd be pretty useless anyway). Redefining Add to be Chainable is not hard.

constexpr struct Add : public Chainable<Add> {
    using Chainable<Add>::operator();

    template< class X, class Y >
    constexpr auto operator () ( X&& x, Y&& y ) 
        -> decltype( std::declval<X>() + std::declval<Y>() )
    {
        return std::forward<X>(x) + std::forward<Y>(y);
    }
} add{};

Only two lines have changed; however, add's behaviour has changed dramatically.

int x = add(1,2,3); // Calls Chainable<Add>::operator(X,Y,Z); returns (1+2)+3.
int y = add(1,2,3,4,5,6); // Calls Chainable<Add>::operator(X,Y,Z,H,J...).
auto inc = add(1); // Still calls Binary<Add>::operator().

The three arg version of Chainable<Add> calls add( add(x,y), z ), while the variadic version calls itself( add(x,y), z, h, j... ). It reduces the number of arguments to process by one, being supplied at least four. It is therefore never the base case and always ends by calling its three-arg version.
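
The same unrolling can be seen in a smaller setting. Below is a sketch of a free function that folds its arguments from the left, one pair per recursive call; foldLeft and Minus are hypothetical names, and unlike Chainable the base case here is a single value rather than the three-argument overload. The return type is fixed to the type of the first value to keep the sketch short.

```cpp
#include <cassert>

// Base case: a single value is already the result.
template< class F, class X >
X foldLeft( F, X x ) { return x; }

// Recursive case: combine the first two values, then recurse with one fewer.
template< class F, class X, class Y, class ...Rest >
X foldLeft( F f, X x, Y y, Rest ...rest ) {
    return foldLeft( f, f(x,y), rest... );
}

// Subtraction makes the grouping visible, since it is not associative.
struct Minus {
    int operator () ( int a, int b ) const { return a - b; }
};

inline void foldLeftExample() {
    assert( foldLeft( Minus(), 10, 3, 2 ) == 5 );     // (10-3)-2
    assert( foldLeft( Minus(), 10, 3, 2, 1 ) == 4 );  // ((10-3)-2)-1
}
```

Each call shortens the argument list by one, so the recursion always bottoms out in the base case.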

ConstructT.

template< template<class...> class T > struct ConstructT {
    template< class ...X, class R = T< typename std::decay<X>::type... > >
    constexpr R operator () ( X&& ...x ) {
        return R( forward<X>(x)... );
    }
};

The first template argument may be confusing to those who have not seen them before. We could write template<class> class T, and that would expect a T that took one type parameter. It may actually take one, zero, or several. template<class...> class T is the generic way to say "we know T takes type parameters, but we don't know how many (and it doesn't yet matter)". It might be std::vector, which can take two; the value type and allocator. It might be std::set, which can take three. (Though, neither vector nor set would work for ConstructT.)

We deduce the return type using the default template parameter, R. std::decay transforms the type such that if we pass in an int&, we get an int back.

 typename std::decay<const int&>::type = int
 typename std::decay<int&&>::type = int
 typename std::decay<int>::type = int

So, if T=std::pair and we pass in an int& and std::string&&, R = std::pair<int,std::string>.
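
These equalities can be checked by the compiler. The following sketch uses only std::decay and std::is_same from <type_traits>; the Decay alias is a convenience name of mine. The array case is an extra fact about std::decay, not something the article relies on.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Shorthand for typename std::decay<T>::type (hypothetical helper name).
template< class T >
using Decay = typename std::decay<T>::type;

// References and top-level const are stripped.
static_assert( std::is_same< Decay<const int&>, int >::value, "const int& decays to int" );
static_assert( std::is_same< Decay<int&&>, int >::value, "int&& decays to int" );
static_assert( std::is_same< Decay<int>, int >::value, "int stays int" );

// Arrays decay to pointers, just as they do when passed by value.
static_assert( std::is_same< Decay<const char(&)[4]>, const char* >::value,
               "array reference decays to pointer" );

// The pair example from the text: int& and std::string&& give pair<int,std::string>.
static_assert( std::is_same< std::pair< Decay<int&>, Decay<std::string&&> >,
                             std::pair<int,std::string> >::value,
               "decayed pair holds values" );
```

If any of these claims were wrong, the file would simply fail to compile.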

ConstructT perfectly forwards the arguments to T's constructor, but decays the types to ensure that it holds the actual values.

Just the same as with add and Add, we must create an instance of ConstructT to call it.

 constexpr auto make_pair = ConstructT<std::pair>();

This version of make_pair behaves just like std::make_pair. Its type is equivalent to this:

struct ConstructPair {
    template< class ...X, class R = std::pair< typename std::decay<X>::type... > >
    constexpr R operator () ( X&& ...x ) {
        return R( forward<X>(x)... );
    }
};
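
To try this out, here is a compilable restatement of ConstructT with two instances. It is a sketch with small changes of my own: forward is written as std::forward, operator() is marked const explicitly, and the names makePair and makeTuple are hypothetical, chosen to avoid clashing with the standard functions.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

template< template<class...> class T > struct ConstructT {
    // R is T instantiated with the decayed argument types.
    template< class ...X, class R = T< typename std::decay<X>::type... > >
    constexpr R operator () ( X&& ...x ) const {
        return R( std::forward<X>(x)... );
    }
};

constexpr ConstructT<std::pair>  makePair{};
constexpr ConstructT<std::tuple> makeTuple{};

inline void constructExample() {
    int n = 6;
    // n is passed as int&, but the pair stores a plain int.
    auto p = makePair( n, std::string("six") );
    static_assert( std::is_same< decltype(p), std::pair<int,std::string> >::value,
                   "argument types are decayed" );
    assert( p.first == 6 && p.second == "six" );

    // The same template works for any number of element types.
    auto t = makeTuple( 1, 2.5, 'c' );
    static_assert( std::is_same< decltype(t), std::tuple<int,double,char> >::value,
                   "tuple of decayed types" );
    assert( std::get<2>(t) == 'c' );
}
```

One ConstructT template covers both the two-parameter std::pair and the variadic std::tuple, which is the point of taking template<class...> class T.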

Conclusion.

C++11 is very big and no one feature is incredibly useful, but in conjunction they build up a powerful synergy. If these features had been a part of the language to begin with, there is no question that they would be considered required reading, just as much as std::vector and the <algorithm> library are today. They solve problems that have led to years of frustration for developers. Perhaps one of the reasons many hate C++ is that it lacked many of these features. Becoming fluent in them makes C++ a much nicer language to speak and communicate with.

The Object-Function Paradox.

2012-11-13T09:48:00.003-08:00

Previously, I discussed using inheritance to simplify the writing of generic functions. However, it occurred to me that I can't find anyone else advocating the same thing. No one has yet told me it's a bad practice, but I can't find any other examples of it on the web. (If anyone knows of one, please relieve me of my ignorance.) It's a very short amount of code. Obvious and intuitive. So what impedes people from thinking of it? For that matter, why do some have difficulties comprehending it?

C++ programmers seem to think of functions as something very distinct from objects and types, so obviously the OOP principles of code reuse don't apply. Indeed, the standard says in the first sentence under "The C++ Object Model" (1.8)

The constructs in a C++ program create, destroy, refer to, access, and manipulate objects. An object is a region of storage. [ Note: A function is not an object, regardless of whether or not it occupies storage in the way that objects do. — end note ]

This is technically correct. Functions cannot be objects, they're more like a lexical representation of a code section. The problem is that objects can look like functions, but if we think of this object as a function, then we contradict ourselves because functions cannot be objects.

The distinction is not so transparent. We can make objects that look, talk, and feel like functions, but they aren't.

So, when working out a problem, if a programmer notices a repetitious pattern, s/he might say "I'll create another function that executes the common code!" However, the repetition moves from the in-lined pattern to the calling convention. Because this programmer believes functions are not objects, the solution of using inheritance is impossible to cogitate. Because of this fallacy, this programmer may find the solution hard to comprehend.

We end up with this train of thought: Objects can be functions, but functions cannot be objects, and therefore objects are not functions.

What does a function do? It executes some arbitrary code, taking in arbitrary parameters and returning some value. Function-objects do just that! So objects can be functions! But functions aren't objects...

Paradoxes stunt innovation by preventing the problem from being worked out in a logical way. One cannot conceive of how to use OOP-features to implement functions if they believe the two are fundamentally different and incompatible.

Take Zeno's famous paradox. At one time, we all believed "all is one". We thought of things as wholes. Zeno seemed to think that odd and pointed out that one can go half way from A to B, and then half way to B from there, but he would never get to B. This is obviously true, but with the assumption that "all is one", we have to think about how, then, can we never reach B since we should eventually be one away? The invention of calculus gave mathematicians a way to properly reason about infinitesimally small ones, and a whole new field of science opened up. Our first physicist was born.

Russell's Paradox led to several attempts at improving set theory to remove the paradox, such as the invention of Type Theory, which is obviously very important to computer science.

In quantum mechanics, Schrodinger's Cat and Bell's Theorem led to a greater understanding of the field. Even today, many first learn of quantum mechanics through the tale of an alive and dead cat.

If I was more knowledgeable of the histories of math and physics, I might know more examples, but I hope my point has been persuasive: Paradoxical thinking holds us back, as individuals, as a community.

I submit to the reader this: Functions are objects. Objects are functions. A function is not a procedure to call, but the pointer to that procedure, and pointers are objects, or it can be an instantiation of a user-defined type (like ConstructChainable). We should think of C-functions as the fundamentally different and incompatible things. They are low-level. Function objects are first-class citizens.
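As a minimal sketch of that claim (the names call_with_two, square, and Square are hypothetical, chosen only for illustration), the same generic code can take a pointer to a plain function or an instance of a user-defined type, because both are ordinary values with a call operator as far as the template is concerned:

```cpp
// A generic caller: it only needs "something that can be called with an int".
template< class F >
constexpr int call_with_two( F f ) {
    return f( 2 );
}

// A plain function. Passing &square passes a pointer, which is an object.
constexpr int square( int x ) {
    return x * x;
}

// A user-defined type whose instances are callable.
struct Square {
    constexpr int operator () ( int x ) const { return x * x; }
};

static_assert( call_with_two( &square )  == 4, "function pointer as an object" );
static_assert( call_with_two( Square{} ) == 4, "object as a function" );
```

Nothing in call_with_two distinguishes the two cases, which is the point being argued.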

The hope is that by removing this logical inconsistency in our minds, we can explore areas of possibility that we never imagined. We can invent solutions to problems that before would have caused a cognitive dissonance. We can progress in directions we hadn't realized before. We can discover something really interesting, but only if we keep an open mind.

Generic Function Objects, or Generalizing Associativity, Transitivity, and such abstractions over functions and types.

2012-11-11T07:00:00.002-08:00

So, I posted "Rethinking std::binary_function", which was this epiphany that instead of writing all these variadic operator types, I could actually define a base class and inherit its operator overloads. That led to another realization: https://gist.github.com/4049946. What I am about to elaborate on below feels like a discovery rather than an invention. It's also only a few days old. I normally try to present something much more polished, but this is exciting enough to me that I want to share it sooner rather than later.

Starting from the top.

In "Rethinking std::binary_function", I thought that all binary functions might benefit from being chained, but actually, not all can be. We can write add(1,2,3) and think of it as (1+2)+3 = 6, but what about <, or less? less(1,2,3) would be equivalent to (true < 3), which is only incidentally correct. And so chaining might not be good for all binary functions, but we can at least say that partially applying the front and back might be useful.
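To make "only incidentally correct" concrete, here is a small sketch using nothing but the built-in operators. The helper names chained_less and intended_less are hypothetical; the first spells out what a left-associated less would compute, the second what we actually mean:

```cpp
// Left-associated chaining: (x < y) < z. The bool result of x < y becomes
// 0 or 1 before it is compared with z (the cast just makes that explicit).
constexpr bool chained_less( int x, int y, int z ) {
    return static_cast<int>( x < y ) < z;
}

// What "x < y < z" is meant to say.
constexpr bool intended_less( int x, int y, int z ) {
    return ( x < y ) && ( y < z );
}

// 1 < 2 < 3: both agree, but the chained version only by luck (1 < 3).
static_assert( chained_less( 1, 2, 3 ) == intended_less( 1, 2, 3 ), "agree" );

// 3 < 2 < 1: the chained version computes 0 < 1, which is true.
static_assert(  chained_less( 3, 2, 1 ),  "chained: (3 < 2) < 1 is 0 < 1" );
static_assert( !intended_less( 3, 2, 1 ), "intended: 3 < 2 is already false" );
```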

template< class F, class X >
struct Part {
    F f;
    X x;

    template< class _F, class _X >
    constexpr Part( _F&& f, _X&& x )
        : f(forward<_F>(f)), x(forward<_X>(x))
    {
    }

    template< class ... Xs >
    constexpr auto operator () ( Xs&& ...xs )
        -> decltype( f(x,declval<Xs>()...) )
    {
        return f( x, forward<Xs>(xs)... );
    }
};

template< class F, class X > struct RPart {
    F f;
    X x;

    template< class _F, class _X >
    constexpr RPart( _F&& f, _X&& x ) 
        : f( forward<_F>(f) ), x( forward<_X>(x) ) { }

    template< class ...Y >
    constexpr decltype( f(declval<Y>()..., x) )
    operator() ( Y&& ...y ) {
        return f( forward<Y>(y)..., x );
    }
};

template< class F > struct Binary {
    template< class X >
    constexpr Part<F,X> operator () ( X x ) {
        return Part<F,X>( F(), move(x) );
    }

    template< class X >
    constexpr RPart<F,X> with( X x ) {
        return RPart<F,X>( F(), move(x) );
    }
};

We might define a subtraction type like so:

constexpr struct Sub : public Binary<Sub> {
    using Binary<Sub>::operator();

    template< class X, class Y >
    constexpr auto operator () ( X&& x, Y&& y ) 
        -> decltype( std::declval<X>() - std::declval<Y>() )
    {
        return std::forward<X>(x) - std::forward<Y>(y);
    }
} sub{};

sub is a function object of type Sub and inherits the partial- and reverse partial-application from Binary.

constexpr auto twoMinus = sub(2); // Calls Binary::operator(int); returns Part<Sub,int>.
constexpr int zero = twoMinus(2);

constexpr auto minusTwo = sub.with(2); // Calls Binary::with(int); returns RPart<Sub,int>;
constexpr int two = minusTwo(4);

constexpr int one = sub(2,1); // Calls Sub::operator(int,int).
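For readers who want to try this on a current compiler, the following is a condensed, self-contained sketch of the pieces above. It is not the original gist. Two things are assumptions on my part: names are std::-qualified instead of relying on using-declarations, and every call operator is marked const, because since C++14 a constexpr member function is no longer implicitly const and sub is a constexpr (hence const) object. The constructors also take their arguments by value to keep the sketch short.

```cpp
#include <utility>

// Holds f and its first argument; the remaining arguments arrive later.
template< class F, class X >
struct Part {
    F f;
    X x;

    constexpr Part( F f_, X x_ ) : f( std::move(f_) ), x( std::move(x_) ) { }

    template< class ...Y >
    constexpr auto operator () ( Y&& ...y ) const
        -> decltype( std::declval<const F&>()( std::declval<const X&>(),
                                               std::declval<Y>()... ) )
    {
        return f( x, std::forward<Y>(y)... );
    }
};

// Holds f and its last argument.
template< class F, class X >
struct RPart {
    F f;
    X x;

    constexpr RPart( F f_, X x_ ) : f( std::move(f_) ), x( std::move(x_) ) { }

    template< class ...Y >
    constexpr auto operator () ( Y&& ...y ) const
        -> decltype( std::declval<const F&>()( std::declval<Y>()...,
                                               std::declval<const X&>() ) )
    {
        return f( std::forward<Y>(y)..., x );
    }
};

// CRTP base: gives F a one-argument call (partial application) and with().
template< class F >
struct Binary {
    template< class X >
    constexpr Part<F,X> operator () ( X x ) const {
        return Part<F,X>( F(), std::move(x) );
    }

    template< class X >
    constexpr RPart<F,X> with( X x ) const {
        return RPart<F,X>( F(), std::move(x) );
    }
};

struct Sub : Binary<Sub> {
    using Binary<Sub>::operator();

    template< class X, class Y >
    constexpr auto operator () ( X&& x, Y&& y ) const
        -> decltype( std::declval<X>() - std::declval<Y>() )
    {
        return std::forward<X>(x) - std::forward<Y>(y);
    }
};

constexpr Sub sub{};

static_assert( sub(2)(2) == 0,      "Part<Sub,int>: 2 - 2" );
static_assert( sub.with(2)(4) == 2, "RPart<Sub,int>: 4 - 2" );
static_assert( sub(2,1) == 1,       "Sub::operator(): 2 - 1" );
```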

Associativity and Transitivity.

Addition is an associative operation, by which I mean

x + y + z = (x+y) + z

In particular, this demonstrates left (left-to-right) associativity. We can apply this same principle of associativity to functions!

template< class F > struct Chainable : Binary<F> {
    using Binary<F>::operator();

    template< class X, class Y >
    using R = typename std::result_of< F(X,Y) >::type;

    // Three arguments: unroll.
    template< class X, class Y, class Z >
    constexpr auto operator () ( X&& x, Y&& y, Z&& z )
        -> R< R<X,Y>, Z >
    {
        return F()(
            F()( std::forward<X>(x), std::forward<Y>(y) ),
            std::forward<Z>(z)
        );
    }

    template< class X, class Y, class ...Z >
    using Unroll = typename std::result_of <
        Chainable<F>( typename std::result_of<F(X,Y)>::type, Z... )
    >::type;

    // Any more? recurse.
    template< class X, class Y, class Z, class H, class ...J >
    constexpr auto operator () ( X&& x, Y&& y, Z&& z, H&& h, J&& ...j )
        -> Unroll<X,Y,Z,H,J...>
    {
        // Notice how (*this) always gets applied at LEAST three arguments.
        return (*this)(
            F()( std::forward<X>(x), std::forward<Y>(y) ),
            std::forward<Z>(z), std::forward<H>(h), std::forward<J>(j)...
        );
    }
};

One might define an addition type like this:

constexpr struct Add : public Chainable<Add> {
    using Chainable<Add>::operator();

    template< class X, class Y >
    constexpr auto operator () ( X&& x, Y&& y ) 
        -> decltype( std::declval<X>() + std::declval<Y>() )
    {
        return std::forward<X>(x) + std::forward<Y>(y);
    }
} add{};

Add inherits Binary<Add>'s ::operator() and ::with, and Chainable<Add>'s overloads as well. The three argument overload in Chainable will return add(add(x,y),z). If given four or more arguments, it will call add on the first two, and call itself on the rest, being at least three. The base case for Chainable will always be the three-argument overload.

constexpr auto plusTwo = add(2); // Calls Binary<Add>::operator(int); returns Part<Add,int>.
constexpr int fifteen = add(1,2,3,4,5); // Calls Chainable<Add>::operator(int,int,int,int...).
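The unrolling that Chainable performs can also be seen in isolation. The sketch below is a free-function version written for C++14 (it relies on deduced return types); chain_left, Plus, and Minus are hypothetical names and not part of the code above. Minus is there because subtraction makes the left-to-right grouping observable:

```cpp
#include <utility>

// Base case: two arguments are just an ordinary call.
template< class F, class X, class Y >
constexpr auto chain_left( F f, X&& x, Y&& y ) {
    return f( std::forward<X>(x), std::forward<Y>(y) );
}

// Three or more: combine the first two, then continue with the rest,
// so the grouping is ((x . y) . z) . ...
template< class F, class X, class Y, class Z, class ...Rest >
constexpr auto chain_left( F f, X&& x, Y&& y, Z&& z, Rest&& ...rest ) {
    return chain_left( f,
                       f( std::forward<X>(x), std::forward<Y>(y) ),
                       std::forward<Z>(z), std::forward<Rest>(rest)... );
}

struct Plus {
    template< class X, class Y >
    constexpr auto operator () ( X x, Y y ) const { return x + y; }
};

struct Minus {
    template< class X, class Y >
    constexpr auto operator () ( X x, Y y ) const { return x - y; }
};

static_assert( chain_left( Plus{}, 1, 2, 3, 4, 5 ) == 15, "(((1+2)+3)+4)+5" );
static_assert( chain_left( Minus{}, 10, 3, 2 ) == 5, "(10-3)-2, not 10-(3-2)" );
```

For built-in operators, a C++17 unary left fold such as (... + xs) expresses the same grouping; the class-based Chainable has the advantage of also carrying the partial application inherited from Binary.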

Operations like less, as mentioned above, are not associative. They are, however, transitive, by which I mean:

x < y < z = (x<y) and (y<z)

In writing a class for transitivity, we can have it just apply each argument to the one to its right, but we need this "and" to glue the results together. My Transitive class requires defining both the operation (<) and the function that folds the results together (and).

template< class F, class Fold > struct Transitive : Binary<F> {
    using Binary<F>::operator();

    template< class X, class Y, class Z >
    constexpr auto operator () ( X&& x, Y&& y, Z&& z )
        -> typename std::result_of<F(X,Y)>::type
    {
        return Fold() (
            F()( forward<X>(x), forward<Y>(y) ),
            F()( forward<Y>(y), forward<Z>(z) )
        );
    }

    template< class X, class Y, class Z, class A, class ...B >
    constexpr auto operator () ( X&& x, Y&& y, Z&& z, A&& a, B&& ...b )
        -> typename std::result_of<F(X,Y)>::type
    {
        return Fold() ( F()( forward<X>(x), forward<Y>(y) ),
                        F()( forward<Y>(y), forward<Z>(z),
                             forward<A>(a), forward<B>(b)... ) );
    }
};

We can define less like so:

struct And : Chainable<And> {
    using Chainable<And>::operator();

    template< class X, class Y >
    constexpr auto operator () ( X&& x, Y&& y )
        -> decltype( declval<X>() && declval<Y>() )
    {
        return forward<X>(x) && forward<Y>(y);
    }
};

constexpr struct Less : Transitive<Less,And> {
    using Transitive<Less,And>::operator();

    template< class X, class Y >
    constexpr bool operator() ( X&& x, Y&& y ) {
        return forward<X>(x) < forward<Y>(y);
    }
} less{};

Now, writing less(1,2,3) would be equivalent to writing 1<2 and 2<3. less.with(3) would be a function returning whether x is "less than three".
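The pairwise expansion can be sketched on its own as well. This is an assumption-laden simplification rather than the Transitive class itself: holds_pairwise and LessThan are hypothetical names, the fold is hard-coded to && instead of being a Fold parameter, and the arguments are taken by const reference.

```cpp
// Two arguments: just ask f.
template< class F, class X, class Y >
constexpr bool holds_pairwise( F f, const X& x, const Y& y ) {
    return f( x, y );
}

// Three or more: f(x,y) and then the same question for y, z, ...
template< class F, class X, class Y, class Z, class ...Rest >
constexpr bool holds_pairwise( F f, const X& x, const Y& y,
                               const Z& z, const Rest& ...rest ) {
    return f( x, y ) && holds_pairwise( f, y, z, rest... );
}

struct LessThan {
    template< class X, class Y >
    constexpr bool operator () ( const X& x, const Y& y ) const {
        return x < y;
    }
};

static_assert(  holds_pairwise( LessThan{}, 1, 2, 3 ), "1<2 and 2<3" );
static_assert( !holds_pairwise( LessThan{}, 1, 3, 2 ), "3<2 fails" );
static_assert(  holds_pairwise( LessThan{}, 1, 2, 3, 4, 5 ), "strictly ascending" );
```

Taking const references is a deliberate choice here: every middle argument is used twice (once as the right operand, once as the left), so forwarding it as an rvalue both times could hand the second use a moved-from value if the operation takes its arguments by value.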

Here's where things get really interesting. What if I want to use these properties of associativity and transitivity to construct types?

From the top, again.

The biggest problem is that types are not functions. Sure, we can write std::pair<X,Y>(x,y) and you might say "Look! A constructor! It's a function!", but we don't like needing to specify our types, so we prefer std::make_pair(x,y). Even though we have this type with a construction function, it's not as useful as a general function. This is a surprisingly common pattern. We make a type, but we want the result's type to be argument dependent, so we make a function that constructs the type. And we do this for every type.

There's make_pair, make_tuple, make_shared, make_unique (eventually), and many, many others. Instead of writing any of these, let's abstract the pattern away using our OOP principles!

template< template<class...> class T > struct ConstructT {
    template< class ...X, class R = T< typename std::decay<X>::type... > >
    constexpr R operator () ( X&& ...x ) {
        return R( forward<X>(x)... );
    }
};

std::decay is a handy type that removes any reference or const until we get the type itself.

typename std::decay<const int>::type = int
typename std::decay<int&&>::type = int
typename std::decay<int>::type = int

When we call an instance of ConstructT, we may pass in references, r-values, or const values, but the return, R, will hold values.
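The informal equations above can be checked by the compiler. This sketch uses only std::decay and std::is_same from <type_traits>; the last two assertions show a part of decay's behavior the text does not mention, namely that arrays and functions decay to pointers, just as they do when passed by value:

```cpp
#include <type_traits>

// References and top-level const are stripped.
static_assert( std::is_same< std::decay<const int>::type,  int >::value, "" );
static_assert( std::is_same< std::decay<int&&>::type,      int >::value, "" );
static_assert( std::is_same< std::decay<int>::type,        int >::value, "" );
static_assert( std::is_same< std::decay<const int&>::type, int >::value, "" );

// Arrays and functions become pointers.
static_assert( std::is_same< std::decay<int[3]>::type,   int* >::value, "" );
static_assert( std::is_same< std::decay<int(int)>::type, int(*)(int) >::value, "" );
```

One consequence for ConstructT: passing a string literal yields a member of type const char*, not a character array.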

Now, we can write

constexpr auto make_pair = ConstructT<std::pair>();
std::pair<int,int> p = make_pair(1,2);

constexpr auto make_tuple = ConstructT<std::tuple>();

Each of what used to be an entire function, a paragraph, is now a line. Now, for associativity!

template< template<class...> class T >
struct ConstructChainable : Chainable<ConstructT<T>> {
    using Self = ConstructChainable<T>;
    using Chainable<ConstructT<T>>::operator();

    template< class X >
    using D = typename std::decay<X>::type;
    template< class X, class Y, class R = T< D<X>, D<Y> > >
    constexpr R operator () ( X&& x, Y&& y ) {
        return R( forward<X>(x), forward<Y>(y) );
    }
};

ConstructChainable inherits from Chainable which inherits from Binary, so now we can rewrite the above:

constexpr auto make_pair = ConstructChainable<std::pair>();
constexpr auto pairWith = make_pair.with(y); // Calls Binary<ConstructT<std::pair>>::with(int).
constexpr auto pairPair = make_pair(1,2,3); // Returns an std::pair< std::pair<int,int>, int >.

All this without ever explicitly defining make_pair!
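To check the resulting types rather than take them on faith, here is a pared-down sketch for C++14. ConstructT is repeated with std:: qualification and a const call operator (my assumption, for use with constexpr objects on C++14 and later). ConstructLeft is a hypothetical stand-in for ConstructChainable that nests to the left directly, without going through Chainable or Binary, so it has no with() or partial application.

```cpp
#include <tuple>
#include <type_traits>
#include <utility>

template< template<class...> class T >
struct ConstructT {
    template< class ...X, class R = T< typename std::decay<X>::type... > >
    constexpr R operator () ( X&& ...x ) const {
        return R( std::forward<X>(x)... );
    }
};

// Hypothetical helper: builds T< T<X,Y>, Z > and so on, left to right.
template< template<class...> class T >
struct ConstructLeft {
    template< class X, class Y >
    constexpr auto operator () ( X&& x, Y&& y ) const {
        return ConstructT<T>()( std::forward<X>(x), std::forward<Y>(y) );
    }

    template< class X, class Y, class Z, class ...Rest >
    constexpr auto operator () ( X&& x, Y&& y, Z&& z, Rest&& ...rest ) const {
        return (*this)(
            ConstructT<T>()( std::forward<X>(x), std::forward<Y>(y) ),
            std::forward<Z>(z), std::forward<Rest>(rest)...
        );
    }
};

using NestedPair = std::pair< std::pair<int,int>, int >;

// The element types are deduced and decayed from the arguments.
static_assert( std::is_same< decltype( ConstructT<std::tuple>()( 1, 'a' ) ),
                             std::tuple<int,char> >::value, "" );

// Three arguments nest to the left, as in make_pair(1,2,3) above.
static_assert( std::is_same< decltype( ConstructLeft<std::pair>()( 1, 2, 3 ) ),
                             NestedPair >::value, "" );
```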

Remember the definition of Part from above?

constexpr auto closet = ConstructChainable<Part>();

This one line is very powerful. Let me demonstrate by showing all the code it replaces!

// An extra, variadic definition of the class.
template< class F, class X1, class ...Xs >
struct Part< F, X1, Xs... > 
    : public Part< Part<F,X1>, Xs... >
{
    template< class _F, class _X1, class ..._Xs >
    constexpr Part( _F&& f, _X1&& x1, _Xs&& ...xs )
        : Part< Part<F,X1>, Xs... > (
            Part<F,X1>( forward<_F>(f), forward<_X1>(x1) ),
            forward<_Xs>(xs)...
        )
    {
    }
};

template< class F, class ...X >
constexpr Part<F,X...> closet( F f, X ...x ) {
    return Part<F,X...>( move(f), move(x)... );
}

So what does ConstructChainable do? It makes molehills out of mountains! What's more, it does so optimally. It perfectly forwards the arguments all the way to the constructor, whereas I often would write moving functions in order to simplify type deduction.

So, one might have noticed that closet creates a Part, but it's also derived from Binary, so we can write things like

constexpr auto cc = closet(closet);

and all sorts of unexpected things!

Conclusion.

I guess one way to describe this phenomenon is perhaps as an update to the factory pattern. I'm tempted to call it a type constructor, but I'm not sure that's technically correct.

This post has been more off-the-cuff than I normally try to be, and I haven't prepared any test source to ease in the process of learning. Still, I hope you find this as fascinating as I do.

Gist: https://gist.github.com/4055136

I use this trick extensively in my library, Pure. (Functional.h)

Highlights from the current ISO proposals.

2012-11-10T14:21:00.001-08:00

I thought I'd take some time to highlight some of the interesting proposals I read today after having stumbled on a link to them:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/#mailing2012-11

I've been waiting over four years for Concepts to become standard. It started to feel like Stroustrup had given up on the idea, but luckily N3351 (pdf) by him and Sutton shows there has been progress. It's not honestly the most fascinating read, but when concepts are finally approved, they will improve on tag dispatch. Since a lot of my code currently uses this technique, I look forward to being able to replace it.

Really, really looking forward to it. We are forced to invent such witty ways of expressing generic functions with enable_ifs, partial template specialization, and delayed instantiation, but we could just define some abstract principles about types and use those, abandoning all of it. (I do want to point out that ConceptGCC and ConceptClang do exist.)

N3418 (pdf) talks about polymorphic lambdas and N3386 talks about using auto as the return type for regular functions. The latter contains this line, which I think says it all:

"template <class T> auto g(T t) { return t; } // return type is deduced at instantiation time"

N3328 talks about "Resumable Functions" (continuations) and introduces a few new keywords to facilitate the tricky syntax.

The Boost.Optional library is attempting to get in via N3406. The idea of Boost's optional is very much like that of Maybe in Haskell.

N3449 proposes an "Open and Efficient Type-Switch". Since this is a really foreign concept to C++, it'll be interesting to see what comes of it.

N3404 and N3405, both by Mike Spertus, show some impressive "tidbits" with tuples and templates. The latter especially spoke to me; he shows an example of a templated type, Foo, holding a function, F, and the example includes this:

Foo<???>( []{ ... } )

What's the type of F? Lambdas have implementation-defined types, so we just don't know. He argues that the compiler can deduce the type here. We should be able to write Foo([]{...})!
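For readers on a newer toolchain: C++17's class template argument deduction permits exactly this spelling. The sketch below assumes a C++17 compiler, and Foo here is a hypothetical wrapper in the spirit of the paper's example, not its actual code:

```cpp
// Assumes C++17 (class template argument deduction, constexpr lambdas).
template< class F >
struct Foo {
    F f;

    constexpr explicit Foo( F f_ ) : f( f_ ) { }

    constexpr auto operator () () const { return f(); }
};

// No Foo<???> needed: F is deduced from the constructor argument, even
// though the lambda's closure type has no name we could write out.
constexpr auto foo = Foo( []{ return 42; } );

static_assert( foo() == 42, "the wrapped lambda is called through Foo" );
```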

I'd like to extend this idea. If I wrote a function g(f,x)=f(x,x), then I'd like g(Foo,x) to equal Foo(x,x) and g(std::pair,x)=std::pair(x,x). As it is, I end up writing a function that constructs a Foo and call it foo.

Unpacking tuples into functions (i.e. apply(f,tuple(1,2,3))=f(1,2,3)) has had some interesting discussion over the past few months. Solutions currently include constructing a variadic type of indexes and std::get. N3326 asks why not do something much more general? I really don't want to do it an injustice by trying to explain. It's a great read!
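The "variadic type of indexes and std::get" approach looks roughly like the sketch below. It assumes C++14, which supplies the index type as std::index_sequence. The names apply_tuple, apply_impl, and Sum3 are hypothetical, and apply_tuple is deliberately not called apply so it cannot collide with the std::apply that later standards added.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Expand the indexes 0..N-1 into one std::get call per tuple element.
template< class F, class Tuple, std::size_t ...I >
constexpr auto apply_impl( F&& f, Tuple&& t, std::index_sequence<I...> ) {
    return std::forward<F>(f)( std::get<I>( std::forward<Tuple>(t) )... );
}

// Build the index list from the tuple's size, then delegate.
template< class F, class Tuple >
constexpr auto apply_tuple( F&& f, Tuple&& t ) {
    using Size = std::tuple_size< typename std::decay<Tuple>::type >;
    return apply_impl( std::forward<F>(f), std::forward<Tuple>(t),
                       std::make_index_sequence< Size::value >() );
}

struct Sum3 {
    constexpr int operator () ( int a, int b, int c ) const {
        return a + b + c;
    }
};

// apply_tuple(f, tuple(1,2,3)) behaves like f(1,2,3).
static_assert( apply_tuple( Sum3{}, std::make_tuple( 1, 2, 3 ) ) == 6, "" );
```

Because the return type is deduced with plain auto, a function returning a reference would have its result copied; a fuller version would use decltype(auto).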

There is plenty more I'm not mentioning: parallel programming, filesystem library, reflection, URI's, std::range, and modules. The proposals of today seem very different than the proposals of when I started programming. C++ feels like a language with a much stronger foundation, which lends itself to expansion, inspiration, and innovation. What will this round of standardization bring?
