Status: In Progress
This note is based on this talk
Golang 1.23 added a “first class” iterator concept. The addition itself, as well as the implementation, was a bit controversial. The talk and this note attempt to demystify some use cases for iterators and to provide a bit of a framework around how to write a custom iterator in Go, without making any value judgements around having iterators in the language or the specific implementation chosen.
An iterator is just a mechanism for “going through” (iterating over) a sequence of values. That’s really it!
Imagine I have some array, slice, set, sequence or vector of elements. A classic example might be a bunch of files on disk to be processed.
// pseudocode - no Go yet
["Dockerfile", "README.md", "src", "test"]
Now imagine we want to go through these elements to operate on them for some reason. We might transform them to some other value. We might choose to keep some and discard others We might run some code on them which results in side effects. All very common operations in programming.
// still pseudocode
for element in iterate_over(["Dockerfile", "README.md", "src", "test"]) {
// transform to another value, e.g. absolutise the path
// filter out items, e.g. remove directories keep only files
// cause some side effect, e.g. delete the file
}
We’ll call the bit between the curly braces here, which is being repeatedly run, the body of the for loop.
In the for loop we are iterating over a sequence of elements. The body is run on each one. The mechanism we use to get the actual elements to be iterated over in a certain order is frequently called an iterator in various programming languages.
To function, this for loop needs to know two things.
These two things correspond to the basic functions of iterators, which usually provide
Before we dive into how iterators work in Go, it’s worth remembering that iterators are very common concept present in most modern programming languages. Since most Gophers have worked with/in programming languages other than Go, it can be interesting/helpful to have these familiar patterns in mind.
Here is how iterators are represented in a few languages that may be familiar to Gophers
| Language | Iterator Concept | Next Element | Finished? |
| ---------- | ----------------------- | --------------- | ---------------------- |
| Java | `java.util.Iterator` | `.next()` | `hasNext() == false` |
| Javascript | "iterator protocol" | `.next().value` | `.next().done` |
| Python | `std::iter::Iterator` | `.next()` | `.next() == None` |
| Rust | `__iter__` / `__next__` | `__next__()` | Raises `StopIteration` |
This is just a selection of languages that I’m familiar with - of course many other languages have these concepts, and I may add to the above as I get acquianted with them.
The approaches to iterators are pretty uniform across these languages, which are all more-or-less ‘C style’ languages which are imperative at heart. We’ll politely ignore Python making the time-honoured faux pas of using exceptions as flow control.
Naturally, functional style languages also include iterator concepts,
such as Elixir’s Enumerable protocol and Clojure’s
ISeq. The functional nature of these languages however mean
that the actual implementations of these look quite different (and in
some cases a lot more complex). In any case, since Go is a ‘C style’
imperative-at-heart language too, it will not benefit us to dig too
deeply into these details.
Before we dive into the details of how to write iterators and when it might make sense to use iterators, it’s always worth remembering that pretty much all uses of iterators can be replaced with for loops. I think this may be true for every use of an iterator, but I can’t be sure there are absolutely no exceptions.
Where we have something like (pseudocode)
goFiles = listFiles()
.iterator()
.map(f -> f.absolutePath())
.filter(p -> p.endsWith(".go"))
.count()
There are always a bunch of for-loop based strategies that we can take in its place. Trying to keep to a single for loop, e.g. (still pseudocode)
allFiles = listFiles()
goFiles = 0
for int i = 0; i < allFiles.length; i++ {
f = allFiles[i]
// plays the part of the map call
p = f.absolutePath()
// plays the part of the filter call
if !p.endsWith(".go") {
continue
}
// plays the part of the count call
goFiles++
}
In light of this, while I and many others on the internet can tell you for what iterators can be used and possible use cases for them, only you and/or your team can decide which is the right approach for your specific projects and situations.
Of course in some cases there will be hard constraints, such as performance, which may strongly push you or even force you in one direction over the other.
But as we explore some examples in this note, I urge you to think about how the comparable for loop and iterator based approaches would look, and which you think produce more easily understandable and maintanable systems.
In the blog post I use as my reference for iterators in Go, the iterators feature is called range over function types.
There are three interfaces here, in the general sense of the word
rather than the specific Go interface concept, which can be
considered iterators.
I call this a zero value iterator
func(yield func() bool)this the one value iterator
func(yield func(V) bool)and this the two value iterator
func(yield func(K, V) bool)Before we go any further, yes, those single capital letter arguments do mean that we’re in the world of generics. We can’t avoid them when using iterators in Go.
To spell it our explicitly since it’s a little bit to parse on first
sight, the latter two iterators types, (one value and two value
iterators) are functions returning void which take a single argument
called yield
func(yield func(V) bool)
| ^single argument called yield
^function
func(yield func(K, V) bool)
| ^single argument called yield
^functionwhere the yield argument is itself a function, which takes a value (one value iterator) or a key and a value (two value iterator) and returns a boolean
func(yield func(V) bool)
| | ^ returning bool
| ^ single value argument, generic type
^ function
func(yield func(K, V) bool)
| | | ^ returning bool
| | ^ value argument, generic type
| ^key argument, generic type
^function
We’ll get into the detail of what this yield function
means, what it does, how it is called, etc… in the later sections.
The one value and two value iterators are represented by actual Go
interfaces in the new iter package in Go
1.23.
package iter
type Seq[V any] func(yield func(V) bool)
type Seq2[K, V any] func(yield func(K, V) bool)This can be useful to know, if you create a function which will take an iterator as an argument, because the interface can be used as the type for that parameter.
Suppose for some reason we didn’t have one value for range over slices built into Go, but we did have iterators.
By one value for range, I mean this
func main() {
for i := range []string{"Amos", "Alex", "Holden", "Naomi"} {
fmt.Println(i)
}
}When you run this e.g. in Go playground you’ll get output
0
1
2
3
One value for range over a slice or array type gives you as the value the index into the slice or array.
Given the iterator functionality in Go 1.23, we could easily build this behaviour ourselves. Let’s write a function to produce an iterator that will give us this behaviour.
func OneValueSliceIterator[T any](s []T) func(yield func(int) bool) {
// TODO
return nil
}The function is generic with type parameter T to allow
iterating over slices containing any type. However, the values it
returns to be used in the for loop, which you’ll find in the innermost
argument of the return type, is an int - since one value
for range over a slice iterates over the slice indexes.
Now we have written out the return type in full to demonstrate, we
can use the shorter form, i.e. the interface from the iter
package.
func OneValueSliceIterator[T any](s []T) iter.Seq[int] {
// TODO
return nil
}We need to return a function with the just shown signature
func OneValueSliceIterator[T any](s []T) iter.Seq[int] {
return func(yield func(int) bool) {
// TODO
}
}Finally, we can talk about what the yield function represents. We should pass sequentially into the yield function the values which are to be iterated over in the for loop. Explicitly, suppose I have a for loop
for i := range SomeIterator() {
fmt.Println(i)
}and in the function returned by SomeIterator, for a certain iteration
of the for loop, we call yield(10) - on this iteration of
the for loop, i will have the value of 10.
We know that the one value for range over a slice moves sequentially through the indices of the slice. Consequently, let’s set up our iterator function to do just that (pretending for-range is not available to us).
func OneValueSliceIterator[T any](s []T) iter.Seq[int] {
return func(yield func(int) bool) {
i := 0
for {
// TODO call yield
i++
if (i == len(s)) {
return
}
}
}
}Now we can call yield on the index i and we already have
a workable iterator!
func OneValueSliceIterator[T any](s []T) iter.Seq[int] {
return func(yield func(int) bool) {
i := 0
for {
yield(i)
i++
if (i == len(s)) {
return
}
}
}
}We can swap this into our for-range
func main() {
s := []string{"Amos", "Alex", "Holden", "Naomi"}
for i := range OneValueSliceIterator(s) {
fmt.Println(i)
}
}
func OneValueSliceIterator[T any](s []T) iter.Seq[int] {
return func(yield func(int) bool) {
i := 0
for {
yield(i)
i++
if i == len(s) {
return
}
}
}
}and we’ll get the same output
0
1
2
3
Now, how about that bool return value from
yield, which has been totally ignored above?
To give us a clue about it’s intended purposes, let’s take a look at what happens if we now call break in the for loop.
func main() {
s := []string{"Amos", "Alex", "Holden", "Naomi"}
for i := range OneValueSliceIterator(s) {
fmt.Println(i)
break
}
}Running again, we have a problem
0
panic: runtime error: range function continued iteration after function for loop body returned false
goroutine 1 [running]:
main.main-range1(...)
/tmp/sandbox168180821/prog.go:12
main.main.OneValueSliceIterator[...].func1(...)
/tmp/sandbox168180821/prog.go:22
main.main()
/tmp/sandbox168180821/prog.go:12 +0x85
The “for loop body” we can identify with the yield function here. It
tells us whether we should stop iteration. Well, more accurately, a
false value tells us that we should stop iteration, so
really that bool return tells us whether we should keeop
going. If yield returns false, we are expected
to immediately return.
func OneValueSliceIterator[T any](s []T) iter.Seq[int] {
return func(yield func(int) bool) {
i := 0
for {
keepGoing := yield(i)
if !keepGoing {
return
}
i++
if i == len(s) {
return
}
}
}
}I’ve chosen keepGoing as quite a big variable name here, in
contravention to Go’s idiom of using very small names of variables with
small scope, because I find it easy to forget that false
means stop rather than true! This variable name helps
cement in my mind what the return values from yield mean.
Running the for loop with the break with this version of
our iterator gives the expected output of just 0.
From this basic example, we can already extract a set of elements that most (almost all) Go iterators will have.
func OneValueSliceIterator[T any](s []T) iter.Seq[int] {
return func(yield func(int) bool) {
// 1a. A way to track what we have iterated over
i := 0
for {
// 2. A call to yield with a value
keepGoing := yield(i)
// 3. Bail if we're told to stop
if !keepGoing {
return
}
// 1b. Keeping track of what we have iterated over
i++
// 4. Natural stopping point of iteration
if i == len(s) {
return
}
}
}
}Listing those out
for2 and 3 are really the absolute minimum required to implement an
iterator. Without the call to yield, no values will ever
“be iterated over” in the for loop; without returning when
yield returns false, we have seen that the
program will panic.
1 (occasionally) and 4 (very rarely) can be omitted, depending upon the intent of the iterator. We’ll look at some examples later in this note where they are naturally omitted.
We could take the 1 value implementation and modify it in some small ways to make a 2 value implementation. To get some practice implementing the patterns from the previous section, though, we’ll instead follow that blueprint to build a new implementation from scratch.
I’m sure this is obvious to all Gophers, but this is what’s I mean by for range over slice with 2 values
func main() {
for i, v := range []string{"Amos", "Alex", "Holden", "Naomi"} {
fmt.Println(i, v)
}
}which when run outputs
0 Amos
1 Alex
2 Holden
3 Naomi
The first value is the index of the item in the slice, the second value is the value of the item in the slice.
Okay, step by step, we start by changing the return type of the
function that creates the iterator - we’ll be returning index (type
int) as before, and additionally value (type parameter
T).
func TwoValueSliceIterator[T any](s []T) iter.Seq2[int, T] {
return func(yield func(int, T) bool) {
}
}
- (a & b) tracking what we have iterated over (i.e. already yielded)
This part is exactly the same as before
func TwoValueSliceIterator[T any](s []T) iter.Seq2[int, T] {
return func(yield func(int, T) bool) {
i := 0
for {
i++
}
}
}
- Calling yield with the value to use for the current iteration
Small change here because we’re returning both the index and the value from the slice (yield takes two values)
func TwoValueSliceIterator[T any](s []T) iter.Seq2[int, T] {
return func(yield func(int, T) bool) {
i := 0
for {
yield(i, s[i])
i++
}
}
}
- Returning immediately if we are told not to keep going by
for
Exactly the same as for the one value iterator
func TwoValueSliceIterator[T any](s []T) iter.Seq2[int, T] {
return func(yield func(int, T) bool) {
i := 0
for {
keepGoing := yield(i, s[i])
if !keepGoing {
return
}
i++
}
}
}
- Returning when we have nothing left to iterate over
Once again, exactly the same as for the one value version.
func TwoValueSliceIterator[T any](s []T) iter.Seq2[int, T] {
return func(yield func(int, T) bool) {
i := 0
for {
keepGoing := yield(i, s[i])
if !keepGoing {
return
}
i++
if i == len(s) {
return
}
}
}
}Then, using this in place of the inbuilt for range
func main() {
for i, v := range TwoValueSliceIterator([]string{"Amos", "Alex", "Holden", "Naomi"}) {
fmt.Println(i, v)
}
}we get exactly the same output
0 Amos
1 Alex
2 Holden
3 Naomi
We did it! If Go had not built in for range over slice in both 1 and 2 value versions, we can now be able to develop that functionality ourselves with iterators!
An example of an iterator that is discussed in the blog post announcing the feature is a set. A set in software development basically means a collection of items where each is unique.
If we created a set from an input slice
e.g. [1, 2, 3, 2, 4, 5, 4, 5, 6] the set would contain
[1, 2, 3, 4, 5, 6] (though usually there are no guarantees
around ordering).
You’ve almost certainly had to obtain the set of unique/distinct elements in a collection at some point when progamming Go. The typical solution involves a map, since it’s key are a unique set, something like
// Assume s is a slice of strings
m := make(map[string]struct{})
for _, e := range s {
m[e] = struct{}{}
}
// u will contain the unique elements of s
u := make([]string, 0)
for e, _ := range m {
u = append(u, e)
}There are multiple ways that one could implement an iterator which iterates over the set of elements in a slice, which have different performance characteristics. We’ll take a look at two.
Without making significant changes to the approach in the above section, we can wrap it in a iterator like so (again following the recipe described above)
func EagerSetIterator[T comparable](sl []T) iter.Seq[T] {
m := make(map[T]struct{})
for _, e := range sl {
m[e] = struct{}{}
}
u := make([]T, 0)
for e, _ := range m {
u = append(u, e)
}
return func(yield func(T) bool) {
// TODO
}
}func EagerSetIterator[T comparable](sl []T) iter.Seq[T] {
m := make(map[T]struct{})
for _, e := range sl {
m[e] = struct{}{}
}
u := make([]T, 0)
for e, _ := range m {
u = append(u, e)
}
return func(yield func(T) bool) {
i := 0
for {
i++
}
}
}func EagerSetIterator[T comparable](sl []T) iter.Seq[T] {
m := make(map[T]struct{})
for _, e := range sl {
m[e] = struct{}{}
}
u := make([]T, 0)
for e, _ := range m {
u = append(u, e)
}
return func(yield func(T) bool) {
i := 0
for {
yield(u[i])
i++
}
}
}func EagerSetIterator[T comparable](sl []T) iter.Seq[T] {
m := make(map[T]struct{})
for _, e := range sl {
m[e] = struct{}{}
}
u := make([]T, 0)
for e, _ := range m {
u = append(u, e)
}
return func(yield func(T) bool) {
i := 0
for {
again := yield(u[i])
if !again {
return
}
i++
}
}
}func EagerSetIterator[T comparable](sl []T) iter.Seq[T] {
m := make(map[T]struct{})
for _, e := range sl {
m[e] = struct{}{}
}
u := make([]T, 0)
for e, _ := range m {
u = append(u, e)
}
return func(yield func(T) bool) {
i := 0
for {
again := yield(u[i])
if !again {
return
}
i++
if i == len(u) {
return
}
}
}
}Actually, this is just the one value slice iterator with a bit at the start to compute the set. Easy!
Let’s prove it works
func main() {
sl := []string{
"Amos", "Alex", "Holden", "Naomi", "Holden", "Miller", "Amos",
"Anderson Dawes", "Fred Johnson", "Fred Johnson", "Praxidike Meng",
"Anderson Dawes", "Miller", "Alex", "Praxidike Meng",
"Chrisjen Avasarala", "Julie Mao", "Holden", "Fred Johnson",
}
for e := range EagerSetIterator(sl) {
fmt.Println(e)
}
}Output
Amos
Holden
Anderson Dawes
Julie Mao
Alex
Naomi
Miller
Fred Johnson
Praxidike Meng
Chrisjen Avasarala
Each unique string in the input slice is iterated over exactly once, and the order does not match the order in which those elements appear in the input slice.
Note that the unique set of elements is computed in one batch at the start, before iteration even begins. For this reason, we can say that the set of elements is computed eagerly. More on this after the next example.
We can take a different strategy to how the unique elements are found. Instead of computing the full set of unique elements up front before iteration, we will aim to find them as we go, throughout the iteration.
func LazySetIterator[T comparable](sl []T) iter.Seq[T] {
m := make(map[T]struct{})
return func(yield func(T) bool) {
// TODO
}
}Note: we move where i is incremented in the loop here for reasons we’ll see shortly
func LazySetIterator[T comparable](sl []T) iter.Seq[T] {
m := make(map[T]struct{})
return func(yield func(T) bool) {
i := -1
for {
i++
}
}
}2a. Find the next item that we haven’t seen yet
The use of continue is why we increment at the start of
the loop
func LazySetIterator[T comparable](sl []T) iter.Seq[T] {
m := make(map[T]struct{})
return func(yield func(T) bool) {
i := -1
for {
i++
if _, ok := m[sl[i]]; ok {
continue
}
}
}
}2b. Record that we have now seen this new item
func LazySetIterator[T comparable](sl []T) iter.Seq[T] {
m := make(map[T]struct{})
return func(yield func(T) bool) {
i := -1
for {
i++
if _, ok := m[sl[i]]; ok {
continue
}
m[sl[i]] = struct{}{}
}
}
}2c. Yield this new item to the for loop
func LazySetIterator[T comparable](sl []T) iter.Seq[T] {
m := make(map[T]struct{})
return func(yield func(T) bool) {
i := -1
for {
i++
if _, ok := m[sl[i]]; ok {
continue
}
m[sl[i]] = struct{}{}
yield(sl[i])
}
}
}func LazySetIterator[T comparable](sl []T) iter.Seq[T] {
m := make(map[T]struct{})
return func(yield func(T) bool) {
i := -1
for {
i++
if _, ok := m[sl[i]]; ok {
continue
}
m[sl[i]] = struct{}{}
again := yield(sl[i])
if !again {
return
}
}
}
}The exit condition also has to move to make sure we don’t run over the end of the input slice.
func LazySetIterator[T comparable](sl []T) iter.Seq[T] {
m := make(map[T]struct{})
return func(yield func(T) bool) {
i := -1
for {
i++
if i == len(sl) {
return
}
if _, ok := m[sl[i]]; ok {
continue
}
m[sl[i]] = struct{}{}
again := yield(sl[i])
if !again {
return
}
}
}
}Here, we are computing the set of unique elements (the keys of
m) as move through the input slice sl,
returning each novel element as soon as it is seen.
We should double check it still works as we expect
Amos
Alex
Holden
Naomi
Miller
Anderson Dawes
Fred Johnson
Praxidike Meng
Chrisjen Avasarala
Julie Mao
Now we have the same basic property of a set, in that each element iterated over is unique, but this approach incidentally also gives us for free the property that the elements are iterated over in the same order as the first time that they appear in the input slice.
This solution exhibits lazyness, because we don’t find all of the
unique elements up front, rather we defer computation of (i.e. finding)
the next unique element until just before we yield it to be
used in the for loop.
To demonstrate how lazy vs eager approaches differ in performance characteristics, we’ll need a much bigger input slice. we’ll generate a randomly ordered one.
func RandomHugeSlice() []string {
names := []string{
"Amos", "Alex", "Holden", "Naomi", "Miller", "Anderson Dawes", "Fred Johnson",
"Praxidike Meng", "Chrisjen Avasarala", "Julie Mao",
}
s := make([]string, 1000000)
for i := range s {
s[i] = names[rand.IntN(len(names))]
}
return s
}We’ll also keep track of how long it takes before we print each line
func main() {
sl := RandomHugeSlice()
start := time.Now()
for e := range EagerSetIterator(sl) {
fmt.Println(e, time.Since(start).Microseconds())
}
fmt.Println("Finished at", time.Since(start).Microseconds())
}With the eager iterator, we’ll see something like this (obviously the numbers will change each time, vary from computer to computer, etc…)
Fred Johnson 40198
Julie Mao 40226
Miller 40229
Holden 40232
Naomi 40235
Alex 40237
Anderson Dawes 40240
Praxidike Meng 40244
Amos 40246
Chrisjen Avasarala 40249
Finished at 40252
With the lazy iterator, something like
Amos 2
Holden 48
Alex 52
Naomi 55
Miller 58
Fred Johnson 61
Chrisjen Avasarala 64
Julie Mao 69
Anderson Dawes 81
Praxidike Meng 84
Finished at 43084
For the eager iterator, there’s a “long” wait while the set is computed, then all of the elements are iterated over in very quick succession, then the loop exits very quickly afterwards.
For the lazy iterator, the time to returning the first element is very fast, and then the rest of the elements come out in a bunch at the start*.
I haven’t benchmaked it, but the lazy iterator consistently exits the loop 5-10% later than the eager one overall.
So we might choose different approaches in different scenarios - eager might be a bit faster overall, so we might choose it for higher throughput, but lazy gives you the first element faster and the other elements in “bits” during the iteration, so we might choose it for lower latency.
*They are bunched at the start here because the unique elements are randomly distributed with equal probablity through the input slice. If the elements followed a different distribution (e.g. one element occurred much less commonly than others) we might find a new unique element at any point during the iteration. The time gaps between successive elements could vary wildly, but the first element out will always be much faster than the eager case (in fact, it will always be almost instantaneous).
Your first response to this might be that infinite or unbounded sequences seem like something esoteric, more for mathematicians or theoretical computer science than for the real world of business logic, domains, and other such concrete things. I promise there’ll be a real world example by the end of this section!
Iterators give us a very natural and practical way to generate infinite sequences of elements. Obviously, an infinite sequence is not in pratice expressible via a slice or array.
func Repeat[T any](e T) seq.Iter[T] {
return func(yield func(T) bool) {
for {
again := yield(e)
if !again {
return
}
}
}
}Running something like this
for e := range Repeat(42) {
// Do what you like with 42
}would spin forever, with e always taking the value 42.
Not so useful, right? Plus, we could get the same with
for {
e := 42
// Do what you like with 42
}Suppose we need to generate a random string of hex of specified length. I’m sure you know, but a hex string includes characters 0-9 and a-f (representing values 0-16).
A classic Go idiomatic, very imperative, way to do this might be e.g.
func RandomHexString(length int) string {
hc := []string{
"0", "1", "2", "3", "4", "5", "6", "7", "8",
"9", "a", "b", "c", "d", "e", "f",
}
s := ""
for _ = range length {
i := rand.IntN(16)
s += hc[i]
}
return s
}Calling e.g. fmt.Println(RandomHexString(16)) will
produce an output like ea3f94b68e454ba7.
TODO: discuss the following
- re-implementing for range map - application to infinite/unbounded sequences