**Update:**Added examples for

*dup*and

*foldMap*.

Probably the most useful parts of the standard C++ library would be container and algorithms support. Who has worked in C++ for any non-trivial amount of time without using

*std::vector*,

*list*,

*map*, or any of the others?

*<algorithm>*, on the other hand, is more something everyone

*should*know. It solves many of the problems that C++ developers encounter on a daily basis.

*"How do I test if there exists an element, x, where p(x) is true?" : std::any_of*

*"How do I copy each element, x, where p(x)?" : std::copy_if*

*"How do I removed each element, x, where p(x)?" : std::remove_if*

*"How do I move elements from one container to another?" : std::move, <algorithm> version.*

*"How do I find a subsequence?" : std::search*

*"How do I sort an array?" std::sort*

*"How do I find the sum of an array?" : std::accumulate*

Any programmer worth half their salt could write any of these functions in their sleep--they're basic--and the thing is that these algorithms

**do**get written, over and over and over again. Either because one does not realize a specific

*<algorithm>*function exists, or because one is thinking on a low level and unable to see the higher level abstractions.

What I like most about the STL is that the only requirements for adapting any data type to a sequence are

**(1)**define an iterator, and

**(2)**define

*begin()*and

*end()*. After that, all (if not most) of

*<algorithm>*becomes instantly usable with that type. (As well as the range-based for loop.) This makes it incredibly generic and useful.

What I dislike is its verbosity. For example:

*std::transform( xs.begin(), xs.end(), xs.begin(), f );*

Wouldn't this be more clear if written...

*xs = std::transform(xs,f);*

And this allows us to compose functions.

*std::transform( xs.begin(), xs.end(), xs.begin(), f );*

*std::transform( xs.begin(), xs.end(), xs.begin(), g );*

*// vs*

*xs = std::transform( std::transform(xs,f), g );*

*// Or, using actual composition:*

*xs = std::transform( xs,*

*compose(g,f)*);That's what this article will be about. An abstraction over the STL that lends itself to writing more terse, concise code without losing any clarity. This abstraction is less general, by design, because it works on entire containers, not iterators. I am

**not**writing about a replacement for any

*<algorithm>*functions, but an alternative inspired by functional programming.

However, I do go over many

*<algorithm>*functions, so this can also be thought of as a review.

#### Filtering, Taking, and Dropping: Collecting data.

I've always found the erase-remove idiom an unintuitive solution to such a common problem. I certainly would not have figured it out on my own without the help of the C++ community to point it out. Requiring containers to define a predicated*erase*wouldn't be generic, and

*<algorithm>*knows only of iterators, not containers, so the standard library can't offer anything simpler.

*filter*fills this gap by combining its knowledge of containers and iterators.

template< class P, class S > S filter( const P& p, S s ) { using F = std::function< bool(typename S::value_type) >; s.erase ( std::remove_if( std::begin(s), std::end(s), std::not1(F(p)) ), std::end(s) ); return s; } // ... std::vector<int> naturals = {1,2,3,4,5,6,7,8,9/*,...*/}; auto evens = filter( [](int x){ return x%2==0; }, naturals );

*See also: std::not1.*

It does two things: First, it inverses the predicate meaning we can use positive logic (defining what we want to keep, rather than throw away) and second, it abstracts the erase-remove idiom.

Using

*filter*, we can write a rather quick-and-dirty qsort.

// For each x in s, returns pair( p(x), not p(x) ). template< class P, class S > std::pair<S,S> partition( const P& p, S s ) { using F = std::function< bool(typename S::value_type) >; // There does exist std::partition_copy, // however this function demonstrates a use of filter. return std::make_pair ( filter( p, s ), filter( std::not1(F(p)), s ) ); } // Fake Quick-Sort: A non-in-place, qsort-inspired function. template< class S > S fake_qsort( S s ) { using X = typename S::value_type; if( s.size() < 2 ) return s; X pivot = s.back(); s.pop_back(); S low, high; std::tie(low,high) = partition ( [&]( const X& x ) { return x <= pivot; }, std::move(s) ); low = fake_qsort( std::move(low) ); low.push_back( pivot ); // Append the sorted high to the sorted low. high = fake_qsort( std::move(high) ); std::move( std::begin(high), std::end(high), std::back_inserter(low) ); return low; }

*See also: std::partition, std::partition_copy, and std::sort.*

*take*is a function that may seem entirely trivial, at least at first.

template< class S, class _X = decltype( *std::begin(std::declval<S>()) ), class X = typename std::decay<_X>::type > std::vector<X> take( size_t n, const S& s ) { std::vector<X> r; std::copy_n( std::begin(s), n, std::back_inserter(r) ); return r; } template< class P, class S, class _X = decltype( *std::begin(std::declval<S>()) ), class X = typename std::decay<_X>::type > std::vector<X> takeWhile( const P& p, const S& s ) { std::vector<X> r; std::copy( std::begin(s), std::find_if( std::begin(s), std::end(s), p ), std::back_inserter(r) ); return r; }

It also breaks the convention of returning

*s*'s type. There's a reason for that. Infinite lists. Consider this Haskell code:

*take 10 [1..] == [1,2,3,4,5,6,7,8,9,10]*

*[1...]*is an infinite list, starting at one. Obviously, it doesn't actually exist in memory.

*take*returns a finite list that does.

The concept of iterators that represent infinite ranges in C++ isn't new, but neither is it common.

*std::insert_iterator*could insert a theoretically infinite number of elements into a container.

*std::istream_*and

*ostream_iterator*may read from or write to a file infinitely.

We can create pseudo-containers to represent infinite ranges and plug them into

*take*.

template< class X > struct Reader { using iterator = std::istream_iterator<X>; iterator b; iterator e; Reader( iterator b = iterator( std::cin ), iterator e = iterator() ) : b(b), e(e) { } iterator begin() const { return b; } iterator end() const { return e; } }; // Probably ill-advised, // but this is /one/ way of doing IO before main(). std::vector<int> three = take( 3, Reader<int>() );

Sometimes we want to take the contents of an entire container, so

*dup*may be helpful.

std::vector<X> dup( const S& s ) { std::vector<X> r; std::copy( std::begin(s), std::end(s), std::back_inserter(r) ); return r; } std::ifstream in( "in.txt" ); // Reader's constructor implicitly converts in to an iterator. // Some may consider this bad style and require the constructor be "explicit". std::vector<int> contents = dup( Reader<int>(in) );

The counterpart to

*take*is

*drop*, but it does not have

*take*'s quirks.

template< class S > S drop( size_t n, const S& s ) { return S ( std::next( std::begin(s), n ), std::end(s) ); } // Predicate version template< class P, class S > S dropWhile( const P& p, const S& s ) { return S ( std::find_if_not( std::begin(s), std::end(s), p ), std::end(s) ); } Reader<int> r = drop( 2, Reader<int>() );

*drop*makes no promises about infinite lists, but unlike most container- or range-based algorithms, it can work

*on*them. In the above example, two integers are read from

*std::cin*, and their values lost.

For another example of the use of pseudo-containers, consider this solution the the first Euler Project problem using boost::irange.

#include <boost/range/irange.hpp> void euler1() { // multiples of... auto three = boost::irange( 3, 1001, 3 ); auto five = boost::irange( 5, 1001, 5 ); // Ensure that the final sum has no duplicates. std::vector<int> all; std::set_union( std::begin(three), std::end(three), std::begin(five), std::end(five), std::back_inserter(all) ); std::cout << "The sum of every multiple of 3 or 5 bellow 1000 :" << std::accumulate( std::begin(all), std::end(all), 0 ) << std::endl; }

####
Folding: Reducing a list from many to one. (*std::accumulate*)

Accumulating is the "imperative" description of folding. Historically, you'd call the variable you update with the results of each calculation the accumulator. To accumulate, then, is to iterate through a sequence, updating the accumulator with each iteration.Folding is another way to think of it. A fold is a transformation from a list of values to just one value. Haskell defines

*foldl*and

*foldr*, meaning "fold left" and "right".

template< class F, class X, class S > constexpr X foldl( F&& f, X x, const S& s ) { return std::accumulate ( std::begin(s), std::end(s), std::move(x), std::forward<F>(f) ); } int main() { std::vector<int> v = { 5, 3, 2 }; std::cout << "((10 - 5) - 3) - 2) = " << foldl( std::minus<int>(), 10, v ) << std::endl; }

*foldl*is really just another name for

*accumulate*.

*The accumulation function (here,*

*std::minus*) expects the accumulator as the left argument and value to accumulate as its right.

*foldr*is reversed: Not only does it iterate in reverse, but expects the accumulator in the right-hand argument.

// A function wrapper that flips the argument order. template< class F > struct Flip { F f = F(); constexpr Flip( F f ) : f(std::move(f)) { } template< class X, class Y > constexpr auto operator () ( X&& x, Y&& y ) -> typename std::result_of< F(Y,X) >::type { return f( std::forward<Y>(y), std::forward<X>(x) ); } }; template< class F, class X, class S > constexpr X foldr( F&& f, X x, const S& s ) { using It = decltype(std::begin(s)); using RIt = std::reverse_iterator<It>; return std::accumulate ( // Just foldl in reverse. RIt(std::end(s)), RIt(std::begin(s)), std::move(x), Flip<F>(std::forward<F>(f)) ); } int main() { std::vector<int> v = { 5, 3, 2 }; std::cout << "(2 - (3 - (5-10))) = " << foldr( std::minus<int>(), 10, v ) << std::endl; }

Folding is great for monoids; types that have a binary operation with, often, the the signature "

*X(const X&, const X&)*".

std::vector<std::string> strs = { "st", "ri", "ng" }; // std::string associated with (+) is a monoid. std::cout << "'st' + 'ri' + 'ng' = " << foldl( std::plus<std::string>(), std::string(), strs ) << std::endl; using Func = std::function< int(int) >; auto comp = []( Func f, Func g ) { return [f,g]( int x ){ return f(g(x)); }; }; auto inc = []( int x ) { return x+1; }; auto id = []( int x ) { return x; }; std::vector<Func> incs = { inc, inc, inc }; // Functions can be monoids under composition. std::cout << "(inc . inc . inc)(1) = " << foldl( comp, Func(id), incs )(1) << std::endl;

Functional programmers also like to build lists using

*fold*. They build lists starting at the tail, so they typically prefer

*foldr*to

*foldl*.

*std::forward_list*works like

*[]*in Haskell and linked lists in other functional languages. This snippet simply copies the values from the

*std::vector*,

*v*.

using List = std::forward_list<int>; auto cons = []( List l, int x ) { l.push_front( x ); return std::move(l); }; List l = foldr( cons, List(), v );

Note: This one is not an example of a monoid.

####
Zip and Map: many to many. (*std::transform*)

To zip two sequences together by some function is the same as calling *std::transform*. Transform implies modifying each member by some function. Zip implies the same, but with the visual metaphor of combining two lists into one, starting at one end and working up.

template< class F, template<class...>class S, class X, class Y, class Res = typename std::result_of< F(X,Y) >::type > S<Res> zip( F&& f, const S<X>& v, const S<Y>& w ) { S<Res> r; std::transform( std::begin(v), std::end(v), std::begin(w), std::back_inserter(r), std::forward<F>(f) ); return r; } int main() { std::vector<int> v = { 5, 3, 2 }; auto doubleV = zip( std::plus<int>(), v, v ); }

Note: The only way I have discovered to write

*zip*variadically is with tuples. Since this article is not on tuples, refer to the definition of

*transform*in "Zipping and Mapping Tuples".

Note2: An in-place version of this function is possible, but showing both general and optimized versions of each function would be redundant, and the topic of optimization is worth discussing on its own.

Mapping is similar to zipping--in fact the two-argument forms of

*zip(f,xs)*and

*map(f,xs)*should be equivalent. The three argument form, like

*map(f,xs,ys)*, applies

*f*to every combination of

*x*and

*y*.

*map(f,{x,y},{a,b}) == { f(x,a), f(x,b), f(y,a), f(y,b) }*

If

*xs*is size

*N*and

*ys*is of size

*M*, then

*map(f,xs,ys)*returns a sequence of size

*N x M*.

template< class F, template<class...>class S, class X, class Res = typename std::result_of< F(X) >::type > S<Res> map( const F& f, const S<X>& s ) { S<Res> r; std::transform( std::begin(s), std::end(s), std::back_inserter(r), std::forward<F>(f) ); return r; } template< class F, template<class...>class S, class X, class Y, class Res = typename std::result_of< F(X,Y) >::type > S<Res> map( const F& f, const S<X>& xs, const S<Y>& ys ) { S<Res> r; for( const X& x : xs ) for( const Y& y : ys ) r.emplace_back( f(x,y) ); return r; } int main() { std::vector<int> v = { 5, 3, 2 }; std::vector<int> w = { 9, 8, 7 }; auto sums = map( std::plus<int>(), v, w ); }

*map*is a bread and butter function in functional programming.

*// Convert a sequence from one type to another:*

*auto ints = map( toInt, floats );*

*// In a game loop:*

*actors = map( update, actors );*

*// A deck of cards (four suites with twelve values).*

*auto deck = map( make_card, suites, value );*

*// Making variants of the same thing from simpler data.*

*auto inits = { 1, 2, 3, 4 };*

*auto cs = map (*

*[]( int i ) { return std::complex<float>(i,0.1); },*

*inits*

*);*

*// Checking for collisions:*

*ColisionObject collisions = map( make_collision, actors, actors );*

*// AI:*

*states = map( successor, actions, states );*

One downfall of

*map*is that it may create redundancies, which makes

*filter*useful in conjunction.

*states = filter (*

*state_is_valid,*

*map( successor, actions, states )*

*);*

While this may turn an algorithm from one-pass (update and add if valid) to two-pass (update all states, then filter), it also makes simpler algorithms that can be optimized more easily by the compiler at times. For example,

*for( auto x : xs ) {*

*for( auto y : ys ) {*

*z = x * y;*

*if( pred(z) ) r.push_back(z);*

*}*

*}*

*// or:*

*auto r = filter( pred, map(std::multiplies<int>(),xs,ys) );*

While only profiling can tell in any given instance, the second example may be faster under some circumstances. The compiler may be able to vectorize the call to

*map*, but have difficulties applying the same optimization to the first because it cannot evaluate both the multiplication and predicate in one vectorized step.

Sometimes, the goal is to calculate something given the data, rather than

*map*it. Naively, one might write something like

*auto r = fold( f, map(g,xs) );*

But isn't creating the new container inefficient? What if an in-place version of

*map*were implemented, wouldn't transforming

*xs*before folding still be inefficient? Thus,

*foldMap*is useful.

template< class Fold, class Map, class X, class S > X foldMap( const Fold& f, const Map& m, X x, const S& s ) { for( auto& y : s ) x = f( std::move(x), m(y) ); return x; } #include <cctype> int main() { const char* names[] = { "jonh", "mary", "cary" }; auto appendNames = []( std::string x, std::string y ) { return x + " " + y; }; auto capitolizeName = []( std::string name ) { name[0] = std::toupper( name[0] ); return name; }; std::cout << "Names : " << foldMap ( appendNames, capitolizeName, std::string(), names ) << std::endl; }

#### Conclusions.

Haskell's Data.List is actually a lot like*<algorithm>*, though on a higher level of abstraction. There are some things that can only be done with iterators, but many that can also only be done with whole containers. Data.List gives some good inspiration for helpful algorithms, even in C++.

But unlike in C++, Haskell uses simple linked lists by default and all of Data.List's function work only on linked lists. This gives both Haskell and functional programming a bad name when people compare Haskell code using linked lists to C++ code using

*std::vector*. (See "C++ Benchmark -- std::vector vs. std::list vs. std::deque") When libraries are written to optimize inefficiencies in the linked list, like Data.Text, they re-implement Data.List's interface and often achieve equivalent efficiency to well-optimized C, but not without plenty of redundancy.

In C++, we can write one static interface that is both generic and efficient. Writing functional code does not mean writing slow code. The mathematical nature of these operations can even help the compiler optimize. The high-level interface of Data.List fits snugly atop of the low-level interface of iterators.

**Source for this article:**https://gist.github.com/4290166

C++ lib with lazy fold/map/filters and without verbosity: http://volnitsky.com/project/sto/

ReplyDeleteHmmm... I normally don't mention this, but https://github.com/splinterofchaos/Pure/blob/master/List.h

DeleteAlthough, our libraries do different things and have different goals. I just felt compelled to point out that, nor

mylibrary, is the only option. There are a couple of them out there.Very cool article :-) I was trying to do something similar a while ago but had to shelf it due to time constraints.

ReplyDeleteOne thing I would like to mention though is you state your approach is more mathematical than the STL's iterators (which may be true), but do you know about Alex Stepanov's work on foundations of programming? He rigorously defines the mathematical meaning of "iterator" and builds the STL from first principles and proves properties (such as termination) of the algorithms, plus lots more. It may not be functional programming (in that sense I like yours better, fp for the win :-)) but it's solid mathematics that I'm sure you'll appreciate as much as I do :-)

Have a good one :-)

Thanks for pointing me in Stepanov's direction! It seems he was an extremely early adopter of generic programming. [1]

DeleteI found an interview [2] where he mentioned his inspiration for the STL coming from the idea "I suddenly realized that the ability to add numbers in parallel depends on the fact that addition is associative. ... a parallel reduction algorithm is associated with a semigroup structure type."

Mathematical basis indeed; not just in the terminology, but--and this is what I think most don't get--in recognizing the correlations between algorithms, data, and mathematical structures that proved certain algorithms are correct before they were conceived.

I definitely need to edit in a better comparison than "more mathematical" because the STL obviously

is. Thanks again.BTW: Have you read "Elements of Programming"? It looks pretty interesting...

[1]: http://en.wikipedia.org/wiki/Alexander_Stepanov

[2]: http://www.stlport.org/resources/StepanovUSA.html

Yeah I have that book - read it cover to cover a few times. It's an excellent read. I personally think it should be required reading for any serious programmers, or even mathematicians who are interested in computation. When (if) I start my PhD, I would love to teach a class using EoP as the reading material.

DeleteWith RO library (http://volnitsky.com/project/ro/) many of your examples become one-liners:

ReplyDelete// Infinite lazy numeric range

auto evens = range(numeric_limits::max()) | _1%2==0 | _1%2==0;

// Euler1

auto euler1 = range(1,1000) | (_1%3==0 || _1%5==0) || add';

Ops, sorry for spamming :) . I've already wrote here. Please, delete my post.

DeleteHey! I am so excited to get to know if you have a lot of traffic on your blogging resource?

ReplyDeleteIt depends on what you count as a lot. My most popular article, on GCC 4.8 and the

Deleteautokeyword, has been seen by over 7,000 people. Tiny for internet numbers. Dwarfed by the number of C++ programmers. But for a C++ blog? Who knows.Three other articles have received over five-thousand hits. The median is between 1,000 and 1,500.

I have not posted an article for a while, so currently hits fluctuate at less than one-hundred a day, from random sources, to random articles. For this week, I am getting the most hits from google.com (32), fallowed by twitter (14), then a French site: http://cpp.developpez.com/ (10), and no more than five hits from any other source.

I was very pleased to find this site.I wanted to thank you for this great read!! I definitely enjoying every little bit of it and I have you bookmarked to check out new stuff you post.

ReplyDeleteSILA

great resource indeed. Takes time to look this through :-)

ReplyDelete