Monday, September 24, 2012

Range-based for loop on a int range? Enumerate!


Update: The enumerate implementation now allows one to specify a custom step, or stride.  

C++’s new range-based for loop offers a very convenient syntax compared to the old.

   // #include <stdio.h>
   for( int x : {1,2,3,4,5,6,7,8,9,10} )
       printf( “%d “, x );
   puts(“”);

Here, everything in the {...} gets initialized as an std::initializer_list<int>. This is convenient and all, but what if I don’t what to have to write one though ten? Maybe this...

   #include <stdio.h>
   #include <vector>
   #include <algorithm>

   // Create an inclusive range [b,e].
   std::vector<int> enumerate( int b, int e ) {
       // but store it in the range [b,e).
       std::vector<int> v( e-b + 1 );
       std::iota( v.begin(), v.end(), b ); // C++11
       return v;
   }

   int main() {
       for( int x : enumerate(1,10) )
           printf( "%d ", x );
       puts("");
   }

To compile without c++11, just replace std::iota with a while loop.

This works, but in a worrying way. Even with full optimizations, gcc cannot realize that the output would be equivalent to a regular for loop. (i.e. for(i=1;i<=10;i++)). We want a truly zero-overhead abstraction.

One solution is that we can invent a phony container type with a phony iterator type! The container will hold a special iterator type that will do nothing but abstract an int as an iterator type. How does that work?

An iterator has several requirements one might not immediately consider. Many STL algorithms require looking into a class called std::iterator_traits which define many things like what type the iterator will dereference to, the category of the iterator (random access, forward, etc.), and a few other useful things. Our iterator type needs to meet these constrains. In order to do so, we merely inherit from std::iterator, which defines these for us. Here is the implementation in full:

   #include <stdio.h>
   #include <vector>
   #include <algorithm>

   template< class I > struct XRange {
       typedef I value_type;
       typedef I difference_type;
        struct iterator
            : std::iterator<std::random_access_iterator_tag,I,I>
        {
            value_type i;
            value_type stride;

            constexpr iterator( value_type i, value_type stride )
                : i(i), stride(stride) { }
            constexpr iterator( iterator it, value_type stride )
                : i(*it), stride(stride) { }

            constexpr value_type operator* () { return i; }
            iterator operator++ () { ++i; return *this; }
            iterator operator-- () { --i; return *this; }
            iterator operator++ (int) { auto cpy = *this; ++(*this); return cpy; }

            iterator operator-- (int) { auto cpy = *this; --(*this); return cpy; }


            constexpr iterator operator+ ( difference_type n ) {
                return iterator( (i + n) / stride, stride );
            }
            constexpr iterator operator- ( difference_type n ) {
                return iterator( (i - n) / stride, stride );
            }
            constexpr value_type operator- ( iterator other ) {
                return (i - *other) / stride;
            }

            constexpr bool operator== ( iterator other ) { return i == *other; }
            constexpr bool operator!= ( iterator other ) { return i != *other; }

            iterator& operator+= ( difference_type other ) {
                i += other * stride;
                return *this;
            }

            iterator& operator-= ( difference_type other ) {
                i -= other * stride;
                return *this;
            }
        };

        value_type stride;
        iterator b, e;

        constexpr XRange( value_type b, value_type e, value_type stride=1 )
            : stride(stride), b(b,stride), e(e,stride)  { }
        constexpr XRange( iterator b, iterator e, value_type stride=1 )
            : stride(stride), b(b,stride), e(e,stride)  { }

        constexpr iterator begin() { return b; }
        constexpr iterator end()   { return e; }
    };

    typedef XRange<unsigned int> IRange;

    /*
     * enumerate [b,e] = XRange( [b,e+1) )
     * Creates an inclusive range from b to e.
     */
    constexpr IRange enumerate( unsigned int b, unsigned int e,
                                unsigned int stride=1 ) {
        // Adding one to the en
        return IRange( b, e + 1, stride );
    }

   int main() {
       for( unsigned int x : enumerate(1,10) )
           printf( "%d ", x );
       puts("");
   }

Now, how well does gcc optimize this? Here, compiled with “g++ enumerate.cpp -std=c++11 -O3 -S”:

       .file "enumerate.cpp"
       .section .rodata.str1.1,"aMS",@progbits,1
   .LC0:
       .string "%d " // The string we’re printing
   .LC1:
       .string ""
       .section .text.startup,"ax",@progbits
       .p2align 4,,15
       .globl main
       .type main, @function
   main:
   .LFB2657:
       .cfi_startproc
       subq $8, %rsp
       .cfi_def_cfa_offset 16
       movl $1, %edx      // Load 1
       movl $.LC0, %esi // Load “%d “
       movl $1, %edi     
       xorl %eax, %eax
       call __printf_chk // print “1 “
       movl $2, %edx    // Load 2
       movl $.LC0, %esi
       movl $1, %edi
       xorl %eax, %eax
       call __printf_chk // print “2 “
       movl $3, %edx    // Load 3...

GCC has optimized the loop entirely out! It is now equivalent to writing:

   printf( “%d “, 1 );
   printf( “%d “, 2 );
   // ...

We have found a truly zero-overhead abstraction with nicer syntax than in the past, and all with just a small amount of work. We can even use this with STL functions.

   std::vector<int> toVector( const IRange& r ) {
       std::vector<int> v;
       std::copy( r.begin(), r.end(), std::back_inserter(v) );
       return v;
       // Alternatively: return std::vector<int>( r.begin(), r.end() );
   }
   int main() {
       auto v = toVector( enumerate(1,10) );
       for( unsigned int x : v )
           printf( "%d ", x );
       puts("");
   }

We cannot insert into an IRange, take an element out, or any modifying action, but we can easily create a vector or list from one and then modify that. We can save the creation of a vector or initializer_list by using this instead, and not calling std::iota. 

In conclusion, I want to note that the above implementation is under-tested and could possibly be more standard compliant; but I thought the trick was so neat that I’d try and share it anyway. Feedback is welcome.

2 comments:

  1. Good example, thanks for the info about iterator_traits and its requirements.

    I made a simpler version that at least works with the range for loop, I posted it here:
    https://gist.github.com/3786583

    Why did you opt for the default constructor + copy call instead of the iterator range constructor in the toVector function?

    ReplyDelete
    Replies
    1. I wanted to show both versions, just to prove it works!

      I don't think there's anywhere your version won't work, though you should inherit from std::iterator with std::forward_iterator_tag, meaning that you can increment forwards, but not backwards.

      Also, it's good to define value_type and the common typedefs in STL containers. Just keep that in mind if something fails to compile (which is when you know you need it).

      Happy coding!

      Delete

Please feel free to post any comments, concerns, questions, thoughts and ideas. I am perhaps more interested in what you have to say than you are in what I do.