Don't Be An Iter Hater!

Golang 1.23 added a “first class” iterator concept. The addition itself, as well as the implementation, was a bit controversial. The talk and this note attempt to demystify some use cases for iterators and to provide a bit of a framework around how to write a custom iterator in Go, without making any value judgements around having iterators in the language or the specific implementation chosen.

What Are Iterators?

An iterator is just a mechanism for “going through” (iterating over) a sequence of values. That’s really it!

Imagine I have some array, slice, set, sequence or vector of elements. A classic example might be a bunch of files on disk to be processed.

Now imagine we want to go through these elements to operate on them for some reason. We might transform them to some other value. We might choose to keep some and discard others We might run some code on them which results in side effects. All very common operations in programming.

We’ll call the bit between the curly braces here, which is being repeatedly run, the body of the for loop.

In the for loop we are iterating over a sequence of elements. The body is run on each one. The mechanism we use to get the actual elements to be iterated over in a certain order is frequently called an iterator in various programming languages.

These two things correspond to the basic functions of iterators, which usually provide

Iterators In Other Languages

Before we dive into how iterators work in Go, it’s worth remembering that iterators are very common concept present in most modern programming languages. Since most Gophers have worked with/in programming languages other than Go, it can be interesting/helpful to have these familiar patterns in mind.

Here is how iterators are represented in a few languages that may be familiar to Gophers

This is just a selection of languages that I’m familiar with - of course many other languages have these concepts, and I may add to the above as I get acquianted with them.

The approaches to iterators are pretty uniform across these languages, which are all more-or-less ‘C style’ languages which are imperative at heart. We’ll politely ignore Python making the time-honoured faux pas of using exceptions as flow control.

Naturally, functional style languages also include iterator concepts, such as Elixir’s Enumerable protocol and Clojure’s ISeq. The functional nature of these languages however mean that the actual implementations of these look quite different (and in some cases a lot more complex). In any case, since Go is a ‘C style’ imperative-at-heart language too, it will not benefit us to dig too deeply into these details.

Iterator/For Loop Interchangeability

Before we dive into the details of how to write iterators and when it might make sense to use iterators, it’s always worth remembering that pretty much all uses of iterators can be replaced with for loops. I think this may be true for every use of an iterator, but I can’t be sure there are absolutely no exceptions.

There are always a bunch of for-loop based strategies that we can take in its place. Trying to keep to a single for loop, e.g. (still pseudocode)

In light of this, while I and many others on the internet can tell you for what iterators can be used and possible use cases for them, only you and/or your team can decide which is the right approach for your specific projects and situations.

Of course in some cases there will be hard constraints, such as performance, which may strongly push you or even force you in one direction over the other.

But as we explore some examples in this note, I urge you to think about how the comparable for loop and iterator based approaches would look, and which you think produce more easily understandable and maintanable systems.

Iterator Interfaces in Go 1.23

In the blog post I use as my reference for iterators in Go, the iterators feature is called range over function types.

There are three interfaces here, in the general sense of the word rather than the specific Go interface concept, which can be considered iterators.

func(yield func() bool)

func(yield func(V) bool)

func(yield func(K, V) bool)

Before we go any further, yes, those single capital letter arguments do mean that we’re in the world of generics. We can’t avoid them when using iterators in Go.

To spell it our explicitly since it’s a little bit to parse on first sight, the latter two iterators types, (one value and two value iterators) are functions returning void which take a single argument called yield

func(yield func(V) bool)
|    ^single argument called yield
^function

func(yield func(K, V) bool)
|    ^single argument called yield
^function

where the yield argument is itself a function, which takes a value (one value iterator) or a key and a value (two value iterator) and returns a boolean

We’ll get into the detail of what this yield function means, what it does, how it is called, etc… in the later sections.

The one value and two value iterators are represented by actual Go interfaces in the new iter package in Go 1.23.

package iter

type Seq[V any] func(yield func(V) bool)

type Seq2[K, V any] func(yield func(K, V) bool)

This can be useful to know, if you create a function which will take an iterator as an argument, because the interface can be used as the type for that parameter.

Reimplementing For Range Over Slice With 1 Value

Suppose for some reason we didn’t have one value for range over slices built into Go, but we did have iterators.

func main() {
    for i := range []string{"Amos", "Alex", "Holden", "Naomi"} {
        fmt.Println(i)
    }
}

One value for range over a slice or array type gives you as the value the index into the slice or array.

Given the iterator functionality in Go 1.23, we could easily build this behaviour ourselves. Let’s write a function to produce an iterator that will give us this behaviour.

func OneValueSliceIterator[T any](s []T) func(yield func(int) bool) {
    // TODO
    return nil
}

The function is generic with type parameter T to allow iterating over slices containing any type. However, the values it returns to be used in the for loop, which you’ll find in the innermost argument of the return type, is an int - since one value for range over a slice iterates over the slice indexes.

Now we have written out the return type in full to demonstrate, we can use the shorter form, i.e. the interface from the iter package.

func OneValueSliceIterator[T any](s []T) iter.Seq[int] {
    // TODO
    return nil
}

func OneValueSliceIterator[T any](s []T) iter.Seq[int] {
    return func(yield func(int) bool) {
        // TODO
    }
}

Finally, we can talk about what the yield function represents. We should pass sequentially into the yield function the values which are to be iterated over in the for loop. Explicitly, suppose I have a for loop

for i := range SomeIterator() {
    fmt.Println(i)
}

and in the function returned by SomeIterator, for a certain iteration of the for loop, we call yield(10) - on this iteration of the for loop, i will have the value of 10.

We know that the one value for range over a slice moves sequentially through the indices of the slice. Consequently, let’s set up our iterator function to do just that (pretending for-range is not available to us).

func OneValueSliceIterator[T any](s []T) iter.Seq[int] {
    return func(yield func(int) bool) {
        i := 0
        for {
            // TODO call yield
            i++
            if (i == len(s)) {
                return
            }
        }
    }
}

func OneValueSliceIterator[T any](s []T) iter.Seq[int] {
    return func(yield func(int) bool) {
        i := 0
        for {
            yield(i)
            i++
            if (i == len(s)) {
                return
            }
        }
    }
}

func main() {
    s := []string{"Amos", "Alex", "Holden", "Naomi"}
    for i := range OneValueSliceIterator(s) {
        fmt.Println(i)
    }
}

func OneValueSliceIterator[T any](s []T) iter.Seq[int] {
    return func(yield func(int) bool) {
        i := 0
        for {
            yield(i)
            i++
            if i == len(s) {
                return
            }
        }
    }
}

Now, how about that bool return value from yield, which has been totally ignored above?

To give us a clue about it’s intended purposes, let’s take a look at what happens if we now call break in the for loop.

func main() {
    s := []string{"Amos", "Alex", "Holden", "Naomi"}
    for i := range OneValueSliceIterator(s) {
        fmt.Println(i)      
        break
    }
}

The “for loop body” we can identify with the yield function here. It tells us whether we should stop iteration. Well, more accurately, a false value tells us that we should stop iteration, so really that bool return tells us whether we should keeop going. If yield returns false, we are expected to immediately return.

func OneValueSliceIterator[T any](s []T) iter.Seq[int] {
    return func(yield func(int) bool) {
        i := 0
        for {
            keepGoing := yield(i)
            if !keepGoing {
                return
            }
            i++
            if i == len(s) {
                return
            }
        }
    }
}

I’ve chosen keepGoing as quite a big variable name here, in contravention to Go’s idiom of using very small names of variables with small scope, because I find it easy to forget that false means stop rather than true! This variable name helps cement in my mind what the return values from yield mean.

Running the for loop with the break with this version of our iterator gives the expected output of just 0.

Anatomy & Patterns Of A Go Iterator

From this basic example, we can already extract a set of elements that most (almost all) Go iterators will have.

func OneValueSliceIterator[T any](s []T) iter.Seq[int] {
    return func(yield func(int) bool) {
        // 1a. A way to track what we have iterated over
        i := 0
        for {
            // 2. A call to yield with a value
            keepGoing := yield(i)
            // 3. Bail if we're told to stop
            if !keepGoing {
                return
            }
            // 1b. Keeping track of what we have iterated over
            i++
            // 4. Natural stopping point of iteration
            if i == len(s) {
                return
            }
        }
    }
}

2 and 3 are really the absolute minimum required to implement an iterator. Without the call to yield, no values will ever “be iterated over” in the for loop; without returning when yield returns false, we have seen that the program will panic.

1 (occasionally) and 4 (very rarely) can be omitted, depending upon the intent of the iterator. We’ll look at some examples later in this note where they are naturally omitted.

Reimplementing For Range Over Slice With 2 Values

We could take the 1 value implementation and modify it in some small ways to make a 2 value implementation. To get some practice implementing the patterns from the previous section, though, we’ll instead follow that blueprint to build a new implementation from scratch.

I’m sure this is obvious to all Gophers, but this is what’s I mean by for range over slice with 2 values

func main() {
    for i, v := range []string{"Amos", "Alex", "Holden", "Naomi"} {
        fmt.Println(i, v)
    }
}

The first value is the index of the item in the slice, the second value is the value of the item in the slice.

Okay, step by step, we start by changing the return type of the function that creates the iterator - we’ll be returning index (type int) as before, and additionally value (type parameter T).

func TwoValueSliceIterator[T any](s []T) iter.Seq2[int, T] {
    return func(yield func(int, T) bool) {

    }
}

func TwoValueSliceIterator[T any](s []T) iter.Seq2[int, T] {
    return func(yield func(int, T) bool) {
        i := 0
        for {
            i++
        }   
    }
}

Small change here because we’re returning both the index and the value from the slice (yield takes two values)

func TwoValueSliceIterator[T any](s []T) iter.Seq2[int, T] {
    return func(yield func(int, T) bool) {
        i := 0
        for {
            yield(i, s[i])
            i++
        }   
    }
}

func TwoValueSliceIterator[T any](s []T) iter.Seq2[int, T] {
    return func(yield func(int, T) bool) {
        i := 0
        for {
            keepGoing := yield(i, s[i])
            if !keepGoing {
                return
            }
            i++
        }   
    }
}

func TwoValueSliceIterator[T any](s []T) iter.Seq2[int, T] {
    return func(yield func(int, T) bool) {
        i := 0
        for {
            keepGoing := yield(i, s[i])
            if !keepGoing {
                return
            }
            i++
            if i == len(s) {
                return
            }
        }   
    }
}

func main() {
    for i, v := range TwoValueSliceIterator([]string{"Amos", "Alex", "Holden", "Naomi"}) {
        fmt.Println(i, v)
    }
}

We did it! If Go had not built in for range over slice in both 1 and 2 value versions, we can now be able to develop that functionality ourselves with iterators!

Eagerness and Lazyness via Sets

An example of an iterator that is discussed in the blog post announcing the feature is a set. A set in software development basically means a collection of items where each is unique.

If we created a set from an input slice e.g. [1, 2, 3, 2, 4, 5, 4, 5, 6] the set would contain [1, 2, 3, 4, 5, 6] (though usually there are no guarantees around ordering).

You’ve almost certainly had to obtain the set of unique/distinct elements in a collection at some point when progamming Go. The typical solution involves a map, since it’s key are a unique set, something like

// Assume s is a slice of strings
m := make(map[string]struct{})
for _, e := range s {
    m[e] = struct{}{}
}
// u will contain the unique elements of s
u := make([]string, 0)
for e, _ :=  range m {
    u = append(u, e)
}

There are multiple ways that one could implement an iterator which iterates over the set of elements in a slice, which have different performance characteristics. We’ll take a look at two.

Set Iterator - Eager

Without making significant changes to the approach in the above section, we can wrap it in a iterator like so (again following the recipe described above)

func EagerSetIterator[T comparable](sl []T) iter.Seq[T] {
    m := make(map[T]struct{})
    for _, e := range sl {
        m[e] = struct{}{}
    }
    u := make([]T, 0)
    for e, _ :=  range m {
        u = append(u, e)
    }
    return func(yield func(T) bool) {
        // TODO
    }
}

func EagerSetIterator[T comparable](sl []T) iter.Seq[T] {
    m := make(map[T]struct{})
    for _, e := range sl {
        m[e] = struct{}{}
    }
    u := make([]T, 0)
    for e, _ :=  range m {
        u = append(u, e)
    }
    return func(yield func(T) bool) {
        i := 0
        for {
            i++
        }
    }
}

func EagerSetIterator[T comparable](sl []T) iter.Seq[T] {
    m := make(map[T]struct{})
    for _, e := range sl {
        m[e] = struct{}{}
    }
    u := make([]T, 0)
    for e, _ :=  range m {
        u = append(u, e)
    }
    return func(yield func(T) bool) {
        i := 0
        for {
            yield(u[i])
            i++
        }
    }
}

func EagerSetIterator[T comparable](sl []T) iter.Seq[T] {
    m := make(map[T]struct{})
    for _, e := range sl {
        m[e] = struct{}{}
    }
    u := make([]T, 0)
    for e, _ :=  range m {
        u = append(u, e)
    }
    return func(yield func(T) bool) {
        i := 0
        for {
            again := yield(u[i])
            if !again {
                return
            }
            i++
        }
    }
}

func EagerSetIterator[T comparable](sl []T) iter.Seq[T] {
    m := make(map[T]struct{})
    for _, e := range sl {
        m[e] = struct{}{}
    }
    u := make([]T, 0)
    for e, _ :=  range m {
        u = append(u, e)
    }
    return func(yield func(T) bool) {
        i := 0
        for {
            again := yield(u[i])
            if !again {
                return
            }
            i++
            if i == len(u) {
                return
            }
        }
    }
}

Actually, this is just the one value slice iterator with a bit at the start to compute the set. Easy!

func main() {
    sl := []string{
        "Amos", "Alex", "Holden", "Naomi", "Holden", "Miller", "Amos",
        "Anderson Dawes", "Fred Johnson", "Fred Johnson", "Praxidike Meng",
        "Anderson Dawes", "Miller", "Alex", "Praxidike Meng",
        "Chrisjen Avasarala", "Julie Mao", "Holden", "Fred Johnson",
    }
    for e := range EagerSetIterator(sl) {
        fmt.Println(e)
    }
}

Each unique string in the input slice is iterated over exactly once, and the order does not match the order in which those elements appear in the input slice.

Eagerness

Note that the unique set of elements is computed in one batch at the start, before iteration even begins. For this reason, we can say that the set of elements is computed eagerly. More on this after the next example.

Set Iterator - Lazy

We can take a different strategy to how the unique elements are found. Instead of computing the full set of unique elements up front before iteration, we will aim to find them as we go, throughout the iteration.

func LazySetIterator[T comparable](sl []T) iter.Seq[T] {
    m := make(map[T]struct{})
    return func(yield func(T) bool) {
        // TODO
    }
}

Note: we move where i is incremented in the loop here for reasons we’ll see shortly

func LazySetIterator[T comparable](sl []T) iter.Seq[T] {
    m := make(map[T]struct{})
    return func(yield func(T) bool) {
        i := -1
        for {
            i++
        }
    }
}

func LazySetIterator[T comparable](sl []T) iter.Seq[T] {
    m := make(map[T]struct{})
    return func(yield func(T) bool) {
        i := -1
        for {
            i++
            if _, ok := m[sl[i]]; ok {
                continue
            }
        }
    }
}

func LazySetIterator[T comparable](sl []T) iter.Seq[T] {
    m := make(map[T]struct{})
    return func(yield func(T) bool) {
        i := -1
        for {
            i++
            if _, ok := m[sl[i]]; ok {
                continue
            }
            m[sl[i]] = struct{}{}

        }
    }
}

func LazySetIterator[T comparable](sl []T) iter.Seq[T] {
    m := make(map[T]struct{})
    return func(yield func(T) bool) {
        i := -1
        for {
            i++
            if _, ok := m[sl[i]]; ok {
                continue
            }
            m[sl[i]] = struct{}{}
            yield(sl[i])
        }
    }
}

func LazySetIterator[T comparable](sl []T) iter.Seq[T] {
    m := make(map[T]struct{})
    return func(yield func(T) bool) {
        i := -1
        for {
            i++
            if _, ok := m[sl[i]]; ok {
                continue
            }
            m[sl[i]] = struct{}{}
            again := yield(sl[i])
            if !again {
                return
            }
        }
    }
}

The exit condition also has to move to make sure we don’t run over the end of the input slice.

func LazySetIterator[T comparable](sl []T) iter.Seq[T] {
    m := make(map[T]struct{})
    return func(yield func(T) bool) {
        i := -1
        for {
            i++
            if i == len(sl) {
                return
            }
            if _, ok := m[sl[i]]; ok {
                continue
            }
            m[sl[i]] = struct{}{}
            again := yield(sl[i])
            if !again {
                return
            }
        }
    }
}

Here, we are computing the set of unique elements (the keys of m) as move through the input slice sl, returning each novel element as soon as it is seen.

Now we have the same basic property of a set, in that each element iterated over is unique, but this approach incidentally also gives us for free the property that the elements are iterated over in the same order as the first time that they appear in the input slice.

Lazyness

This solution exhibits lazyness, because we don’t find all of the unique elements up front, rather we defer computation of (i.e. finding) the next unique element until just before we yield it to be used in the for loop.

Why Might This Matter?

To demonstrate how lazy vs eager approaches differ in performance characteristics, we’ll need a much bigger input slice. we’ll generate a randomly ordered one.

func RandomHugeSlice() []string {
    names := []string{
        "Amos", "Alex", "Holden", "Naomi", "Miller", "Anderson Dawes", "Fred Johnson",
        "Praxidike Meng", "Chrisjen Avasarala", "Julie Mao",
    }
    s := make([]string, 1000000)
    for i := range s {
        s[i] = names[rand.IntN(len(names))]
    }
    return s
}

func main() {
    sl := RandomHugeSlice()
    start := time.Now()
    for e := range EagerSetIterator(sl) {
        fmt.Println(e, time.Since(start).Microseconds())
    }
    fmt.Println("Finished at", time.Since(start).Microseconds())
}

With the eager iterator, we’ll see something like this (obviously the numbers will change each time, vary from computer to computer, etc…)

For the eager iterator, there’s a “long” wait while the set is computed, then all of the elements are iterated over in very quick succession, then the loop exits very quickly afterwards.

For the lazy iterator, the time to returning the first element is very fast, and then the rest of the elements come out in a bunch at the start*.

I haven’t benchmaked it, but the lazy iterator consistently exits the loop 5-10% later than the eager one overall.

So we might choose different approaches in different scenarios - eager might be a bit faster overall, so we might choose it for higher throughput, but lazy gives you the first element faster and the other elements in “bits” during the iteration, so we might choose it for lower latency.

*They are bunched at the start here because the unique elements are randomly distributed with equal probablity through the input slice. If the elements followed a different distribution (e.g. one element occurred much less commonly than others) we might find a new unique element at any point during the iteration. The time gaps between successive elements could vary wildly, but the first element out will always be much faster than the eager case (in fact, it will always be almost instantaneous).

Infinite/Unbounded Sequences

Your first response to this might be that infinite or unbounded sequences seem like something esoteric, more for mathematicians or theoretical computer science than for the real world of business logic, domains, and other such concrete things. I promise there’ll be a real world example by the end of this section!

Iterators give us a very natural and practical way to generate infinite sequences of elements. Obviously, an infinite sequence is not in pratice expressible via a slice or array.

func Repeat[T any](e T) seq.Iter[T] {
    return func(yield func(T) bool) {
        for {
            again := yield(e)
            if !again {
                return
            }
        }
    }
}

for e := range Repeat(42) {
    // Do what you like with 42
}

would spin forever, with e always taking the value 42. Not so useful, right? Plus, we could get the same with

for {
    e := 42
    // Do what you like with 42
}

Generating A Random Hex String

Suppose we need to generate a random string of hex of specified length. I’m sure you know, but a hex string includes characters 0-9 and a-f (representing values 0-16).

func RandomHexString(length int) string {
    hc := []string{
        "0", "1", "2", "3", "4", "5", "6", "7", "8",
        "9", "a", "b", "c", "d", "e", "f",
    }
    s := ""
    for _ = range length {
        i := rand.IntN(16)
        s += hc[i]
    }
    return s
}

Calling e.g. fmt.Println(RandomHexString(16)) will produce an output like ea3f94b68e454ba7.

Don’t Be An Iter-hater!