Don’t Be An Iter-hater!

Status: In Progress

This note is based on this talk

Golang 1.23 added a “first class” iterator concept. The addition itself, as well as the implementation, was a bit controversial. The talk and this note attempt to demystify some use cases for iterators and to provide a bit of a framework around how to write a custom iterator in Go, without making any value judgements around having iterators in the language or the specific implementation chosen.

What Are Iterators?

An iterator is just a mechanism for “going through” (iterating over) a sequence of values. That’s really it!

Imagine I have some array, slice, set, sequence or vector of elements. A classic example might be a bunch of files on disk to be processed.

// pseudocode - no Go yet
["Dockerfile", "README.md", "src", "test"]

Now imagine we want to go through these elements to operate on them for some reason. We might transform them to some other value. We might choose to keep some and discard others We might run some code on them which results in side effects. All very common operations in programming.

// still pseudocode
for element in iterate_over(["Dockerfile", "README.md", "src", "test"]) {
    // transform to another value, e.g. absolutise the path
    // filter out items, e.g. remove directories keep only files
    // cause some side effect, e.g. delete the file
}

We’ll call the bit between the curly braces here, which is being repeatedly run, the body of the for loop.

In the for loop we are iterating over a sequence of elements. The body is run on each one. The mechanism we use to get the actual elements to be iterated over in a certain order is frequently called an iterator in various programming languages.

To function, this for loop needs to know two things.

These two things correspond to the basic functions of iterators, which usually provide

Iterators In Other Languages

Before we dive into how iterators work in Go, it’s worth remembering that iterators are very common concept present in most modern programming languages. Since most Gophers have worked with/in programming languages other than Go, it can be interesting/helpful to have these familiar patterns in mind.

Here is how iterators are represented in a few languages that may be familiar to Gophers

| Language   | Iterator Concept        | Next Element    | Finished?              |
| ---------- | ----------------------- | --------------- | ---------------------- |
| Java       | `java.util.Iterator`    | `.next()`       | `hasNext() == false`   |
| Javascript | "iterator protocol"     | `.next().value` | `.next().done`         |
| Python     | `std::iter::Iterator`   | `.next()`       | `.next() == None`      |
| Rust       | `__iter__` / `__next__` | `__next__()`    | Raises `StopIteration` |

This is just a selection of languages that I’m familiar with - of course many other languages have these concepts, and I may add to the above as I get acquianted with them.

The approaches to iterators are pretty uniform across these languages, which are all more-or-less ‘C style’ languages which are imperative at heart. We’ll politely ignore Python making the time-honoured faux pas of using exceptions as flow control.

Naturally, functional style languages also include iterator concepts, such as Elixir’s Enumerable protocol and Clojure’s ISeq. The functional nature of these languages however mean that the actual implementations of these look quite different (and in some cases a lot more complex). In any case, since Go is a ‘C style’ imperative-at-heart language too, it will not benefit us to dig too deeply into these details.

Iterator/For Loop Interchangeability

Before we dive into the details of how to write iterators and when it might make sense to use iterators, it’s always worth remembering that pretty much all uses of iterators can be replaced with for loops. I think this may be true for every use of an iterator, but I can’t be sure there are absolutely no exceptions.

Where we have something like (pseudocode)

goFiles = listFiles()
            .iterator()
            .map(f -> f.absolutePath())
            .filter(p -> p.endsWith(".go"))
            .count()

There are always a bunch of for-loop based strategies that we can take in its place. Trying to keep to a single for loop, e.g. (still pseudocode)

allFiles = listFiles()
goFiles = 0
for int i = 0; i < allFiles.length; i++ {
    f = allFiles[i]

    // plays the part of the map call
    p = f.absolutePath() 
    
    // plays the part of the filter call
    if !p.endsWith(".go") {
        continue
    }

    // plays the part of the count call
    goFiles++
}

In light of this, while I and many others on the internet can tell you for what iterators can be used and possible use cases for them, only you and/or your team can decide which is the right approach for your specific projects and situations.

Of course in some cases there will be hard constraints, such as performance, which may strongly push you or even force you in one direction over the other.

But as we explore some examples in this note, I urge you to think about how the comparable for loop and iterator based approaches would look, and which you think produce more easily understandable and maintanable systems.

Iterator Interfaces in Go 1.23

In the blog post I use as my reference for iterators in Go, the iterators feature is called range over function types.

There are three interfaces here, in the general sense of the word rather than the specific Go interface concept, which can be considered iterators.

I call this a zero value iterator

func(yield func() bool)

this the one value iterator

func(yield func(V) bool)

and this the two value iterator

func(yield func(K, V) bool)

Before we go any further, yes, those single capital letter arguments do mean that we’re in the world of generics. We can’t avoid them when using iterators in Go.

To spell it our explicitly since it’s a little bit to parse on first sight, the latter two iterators types, (one value and two value iterators) are functions returning void which take a single argument called yield

func(yield func(V) bool)
|    ^single argument called yield
^function

func(yield func(K, V) bool)
|    ^single argument called yield
^function

where the yield argument is itself a function, which takes a value (one value iterator) or a key and a value (two value iterator) and returns a boolean

func(yield func(V) bool)
           |    |  ^ returning bool
           |    ^ single value argument, generic type
           ^ function

func(yield func(K, V) bool)
           |    |  |  ^ returning bool
           |    |  ^ value argument, generic type
           |    ^key argument, generic type
           ^function

We’ll get into the detail of what this yield function means, what it does, how it is called, etc… in the later sections.

The one value and two value iterators are represented by actual Go interfaces in the new iter package in Go 1.23.

package iter

type Seq[V any] func(yield func(V) bool)

type Seq2[K, V any] func(yield func(K, V) bool)

This can be useful to know, if you create a function which will take an iterator as an argument, because the interface can be used as the type for that parameter.

Reimplementing For Range Over Slice With 1 Value

Suppose for some reason we didn’t have one value for range over slices built into Go, but we did have iterators.

By one value for range, I mean this

func main() {
    for i := range []string{"Amos", "Alex", "Holden", "Naomi"} {
        fmt.Println(i)
    }
}

When you run this e.g. in Go playground you’ll get output

0
1
2
3

One value for range over a slice or array type gives you as the value the index into the slice or array.

Given the iterator functionality in Go 1.23, we could easily build this behaviour ourselves. Let’s write a function to produce an iterator that will give us this behaviour.

func OneValueSliceIterator[T any](s []T) func(yield func(int) bool) {
    // TODO
    return nil
}

The function is generic with type parameter T to allow iterating over slices containing any type. However, the values it returns to be used in the for loop, which you’ll find in the innermost argument of the return type, is an int - since one value for range over a slice iterates over the slice indexes.

Now we have written out the return type in full to demonstrate, we can use the shorter form, i.e. the interface from the iter package.

func OneValueSliceIterator[T any](s []T) iter.Seq[int] {
    // TODO
    return nil
}

We need to return a function with the just shown signature

func OneValueSliceIterator[T any](s []T) iter.Seq[int] {
    return func(yield func(int) bool) {
        // TODO
    }
}

Finally, we can talk about what the yield function represents. We should pass sequentially into the yield function the values which are to be iterated over in the for loop. Explicitly, suppose I have a for loop

for i := range SomeIterator() {
    fmt.Println(i)
}

and in the function returned by SomeIterator, for a certain iteration of the for loop, we call yield(10) - on this iteration of the for loop, i will have the value of 10.

We know that the one value for range over a slice moves sequentially through the indices of the slice. Consequently, let’s set up our iterator function to do just that (pretending for-range is not available to us).

func OneValueSliceIterator[T any](s []T) iter.Seq[int] {
    return func(yield func(int) bool) {
        i := 0
        for {
            // TODO call yield
            i++
            if (i == len(s)) {
                return
            }
        }
    }
}

Now we can call yield on the index i and we already have a workable iterator!

func OneValueSliceIterator[T any](s []T) iter.Seq[int] {
    return func(yield func(int) bool) {
        i := 0
        for {
            yield(i)
            i++
            if (i == len(s)) {
                return
            }
        }
    }
}

We can swap this into our for-range

func main() {
    s := []string{"Amos", "Alex", "Holden", "Naomi"}
    for i := range OneValueSliceIterator(s) {
        fmt.Println(i)
    }
}

func OneValueSliceIterator[T any](s []T) iter.Seq[int] {
    return func(yield func(int) bool) {
        i := 0
        for {
            yield(i)
            i++
            if i == len(s) {
                return
            }
        }
    }
}

and we’ll get the same output

0
1
2
3

Now, how about that bool return value from yield, which has been totally ignored above?

To give us a clue about it’s intended purposes, let’s take a look at what happens if we now call break in the for loop.

func main() {
    s := []string{"Amos", "Alex", "Holden", "Naomi"}
    for i := range OneValueSliceIterator(s) {
        fmt.Println(i)      
        break
    }
}

Running again, we have a problem

0
panic: runtime error: range function continued iteration after function for loop body returned false

goroutine 1 [running]:
main.main-range1(...)
    /tmp/sandbox168180821/prog.go:12
main.main.OneValueSliceIterator[...].func1(...)
    /tmp/sandbox168180821/prog.go:22
main.main()
    /tmp/sandbox168180821/prog.go:12 +0x85

The “for loop body” we can identify with the yield function here. It tells us whether we should stop iteration. Well, more accurately, a false value tells us that we should stop iteration, so really that bool return tells us whether we should keeop going. If yield returns false, we are expected to immediately return.

func OneValueSliceIterator[T any](s []T) iter.Seq[int] {
    return func(yield func(int) bool) {
        i := 0
        for {
            keepGoing := yield(i)
            if !keepGoing {
                return
            }
            i++
            if i == len(s) {
                return
            }
        }
    }
}

I’ve chosen keepGoing as quite a big variable name here, in contravention to Go’s idiom of using very small names of variables with small scope, because I find it easy to forget that false means stop rather than true! This variable name helps cement in my mind what the return values from yield mean.

Running the for loop with the break with this version of our iterator gives the expected output of just 0.

Anatomy & Patterns Of A Go Iterator

From this basic example, we can already extract a set of elements that most (almost all) Go iterators will have.

func OneValueSliceIterator[T any](s []T) iter.Seq[int] {
    return func(yield func(int) bool) {
        // 1a. A way to track what we have iterated over
        i := 0
        for {
            // 2. A call to yield with a value
            keepGoing := yield(i)
            // 3. Bail if we're told to stop
            if !keepGoing {
                return
            }
            // 1b. Keeping track of what we have iterated over
            i++
            // 4. Natural stopping point of iteration
            if i == len(s) {
                return
            }
        }
    }
}

Listing those out

  1. (a & b) tracking what we have iterated over (i.e. already yielded)
  2. Calling yield with the value to use for the current iteration
  3. Returning immediately if we are told not to keep going by for
  4. Returning when we have nothing left to iterate over

2 and 3 are really the absolute minimum required to implement an iterator. Without the call to yield, no values will ever “be iterated over” in the for loop; without returning when yield returns false, we have seen that the program will panic.

1 (occasionally) and 4 (very rarely) can be omitted, depending upon the intent of the iterator. We’ll look at some examples later in this note where they are naturally omitted.

Reimplementing For Range Over Slice With 2 Values

We could take the 1 value implementation and modify it in some small ways to make a 2 value implementation. To get some practice implementing the patterns from the previous section, though, we’ll instead follow that blueprint to build a new implementation from scratch.

I’m sure this is obvious to all Gophers, but this is what’s I mean by for range over slice with 2 values

func main() {
    for i, v := range []string{"Amos", "Alex", "Holden", "Naomi"} {
        fmt.Println(i, v)
    }
}

which when run outputs

0 Amos
1 Alex
2 Holden
3 Naomi

The first value is the index of the item in the slice, the second value is the value of the item in the slice.

Okay, step by step, we start by changing the return type of the function that creates the iterator - we’ll be returning index (type int) as before, and additionally value (type parameter T).

func TwoValueSliceIterator[T any](s []T) iter.Seq2[int, T] {
    return func(yield func(int, T) bool) {

    }
}
  1. (a & b) tracking what we have iterated over (i.e. already yielded)

This part is exactly the same as before

func TwoValueSliceIterator[T any](s []T) iter.Seq2[int, T] {
    return func(yield func(int, T) bool) {
        i := 0
        for {
            i++
        }   
    }
}
  1. Calling yield with the value to use for the current iteration

Small change here because we’re returning both the index and the value from the slice (yield takes two values)

func TwoValueSliceIterator[T any](s []T) iter.Seq2[int, T] {
    return func(yield func(int, T) bool) {
        i := 0
        for {
            yield(i, s[i])
            i++
        }   
    }
}
  1. Returning immediately if we are told not to keep going by for

Exactly the same as for the one value iterator

func TwoValueSliceIterator[T any](s []T) iter.Seq2[int, T] {
    return func(yield func(int, T) bool) {
        i := 0
        for {
            keepGoing := yield(i, s[i])
            if !keepGoing {
                return
            }
            i++
        }   
    }
}
  1. Returning when we have nothing left to iterate over

Once again, exactly the same as for the one value version.

func TwoValueSliceIterator[T any](s []T) iter.Seq2[int, T] {
    return func(yield func(int, T) bool) {
        i := 0
        for {
            keepGoing := yield(i, s[i])
            if !keepGoing {
                return
            }
            i++
            if i == len(s) {
                return
            }
        }   
    }
}

Then, using this in place of the inbuilt for range

func main() {
    for i, v := range TwoValueSliceIterator([]string{"Amos", "Alex", "Holden", "Naomi"}) {
        fmt.Println(i, v)
    }
}

we get exactly the same output

0 Amos
1 Alex
2 Holden
3 Naomi

We did it! If Go had not built in for range over slice in both 1 and 2 value versions, we can now be able to develop that functionality ourselves with iterators!

TODO: discuss the following

- eagerness vs laziness
- application to infinite/unbounded sequences