Monday, September 24, 2012

Range-based for loop on a int range? Enumerate!


Update: The enumerate implementation now allows one to specify a custom step, or stride.  

C++’s new range-based for loop offers a very convenient syntax compared to the old.

   // #include <stdio.h>
   for( int x : {1,2,3,4,5,6,7,8,9,10} )
       printf( “%d “, x );
   puts(“”);

Here, everything in the {...} gets initialized as an std::initializer_list<int>. This is convenient and all, but what if I don’t what to have to write one though ten? Maybe this...

   #include <stdio.h>
   #include <vector>
   #include <algorithm>

   // Create an inclusive range [b,e].
   std::vector<int> enumerate( int b, int e ) {
       // but store it in the range [b,e).
       std::vector<int> v( e-b + 1 );
       std::iota( v.begin(), v.end(), b ); // C++11
       return v;
   }

   int main() {
       for( int x : enumerate(1,10) )
           printf( "%d ", x );
       puts("");
   }

To compile without c++11, just replace std::iota with a while loop.

This works, but in a worrying way. Even with full optimizations, gcc cannot realize that the output would be equivalent to a regular for loop. (i.e. for(i=1;i<=10;i++)). We want a truly zero-overhead abstraction.

One solution is that we can invent a phony container type with a phony iterator type! The container will hold a special iterator type that will do nothing but abstract an int as an iterator type. How does that work?

An iterator has several requirements one might not immediately consider. Many STL algorithms require looking into a class called std::iterator_traits which define many things like what type the iterator will dereference to, the category of the iterator (random access, forward, etc.), and a few other useful things. Our iterator type needs to meet these constrains. In order to do so, we merely inherit from std::iterator, which defines these for us. Here is the implementation in full:

   #include <stdio.h>
   #include <vector>
   #include <algorithm>

   template< class I > struct XRange {
       typedef I value_type;
       typedef I difference_type;
        struct iterator
            : std::iterator<std::random_access_iterator_tag,I,I>
        {
            value_type i;
            value_type stride;

            constexpr iterator( value_type i, value_type stride )
                : i(i), stride(stride) { }
            constexpr iterator( iterator it, value_type stride )
                : i(*it), stride(stride) { }

            constexpr value_type operator* () { return i; }
            iterator operator++ () { ++i; return *this; }
            iterator operator-- () { --i; return *this; }
            iterator operator++ (int) { auto cpy = *this; ++(*this); return cpy; }

            iterator operator-- (int) { auto cpy = *this; --(*this); return cpy; }


            constexpr iterator operator+ ( difference_type n ) {
                return iterator( (i + n) / stride, stride );
            }
            constexpr iterator operator- ( difference_type n ) {
                return iterator( (i - n) / stride, stride );
            }
            constexpr value_type operator- ( iterator other ) {
                return (i - *other) / stride;
            }

            constexpr bool operator== ( iterator other ) { return i == *other; }
            constexpr bool operator!= ( iterator other ) { return i != *other; }

            iterator& operator+= ( difference_type other ) {
                i += other * stride;
                return *this;
            }

            iterator& operator-= ( difference_type other ) {
                i -= other * stride;
                return *this;
            }
        };

        value_type stride;
        iterator b, e;

        constexpr XRange( value_type b, value_type e, value_type stride=1 )
            : stride(stride), b(b,stride), e(e,stride)  { }
        constexpr XRange( iterator b, iterator e, value_type stride=1 )
            : stride(stride), b(b,stride), e(e,stride)  { }

        constexpr iterator begin() { return b; }
        constexpr iterator end()   { return e; }
    };

    typedef XRange<unsigned int> IRange;

    /*
     * enumerate [b,e] = XRange( [b,e+1) )
     * Creates an inclusive range from b to e.
     */
    constexpr IRange enumerate( unsigned int b, unsigned int e,
                                unsigned int stride=1 ) {
        // Adding one to the en
        return IRange( b, e + 1, stride );
    }

   int main() {
       for( unsigned int x : enumerate(1,10) )
           printf( "%d ", x );
       puts("");
   }

Now, how well does gcc optimize this? Here, compiled with “g++ enumerate.cpp -std=c++11 -O3 -S”:

       .file "enumerate.cpp"
       .section .rodata.str1.1,"aMS",@progbits,1
   .LC0:
       .string "%d " // The string we’re printing
   .LC1:
       .string ""
       .section .text.startup,"ax",@progbits
       .p2align 4,,15
       .globl main
       .type main, @function
   main:
   .LFB2657:
       .cfi_startproc
       subq $8, %rsp
       .cfi_def_cfa_offset 16
       movl $1, %edx      // Load 1
       movl $.LC0, %esi // Load “%d “
       movl $1, %edi     
       xorl %eax, %eax
       call __printf_chk // print “1 “
       movl $2, %edx    // Load 2
       movl $.LC0, %esi
       movl $1, %edi
       xorl %eax, %eax
       call __printf_chk // print “2 “
       movl $3, %edx    // Load 3...

GCC has optimized the loop entirely out! It is now equivalent to writing:

   printf( “%d “, 1 );
   printf( “%d “, 2 );
   // ...

We have found a truly zero-overhead abstraction with nicer syntax than in the past, and all with just a small amount of work. We can even use this with STL functions.

   std::vector<int> toVector( const IRange& r ) {
       std::vector<int> v;
       std::copy( r.begin(), r.end(), std::back_inserter(v) );
       return v;
       // Alternatively: return std::vector<int>( r.begin(), r.end() );
   }
   int main() {
       auto v = toVector( enumerate(1,10) );
       for( unsigned int x : v )
           printf( "%d ", x );
       puts("");
   }

We cannot insert into an IRange, take an element out, or any modifying action, but we can easily create a vector or list from one and then modify that. We can save the creation of a vector or initializer_list by using this instead, and not calling std::iota. 

In conclusion, I want to note that the above implementation is under-tested and could possibly be more standard compliant; but I thought the trick was so neat that I’d try and share it anyway. Feedback is welcome.