One of the nice things about the language C# is it’s ability to create generators with the yield keyword. Now, for those of you who do not know what I’m talking about, take a look at the following code:
public IEnumerable<int> MakeGenerator(int num0) { yield return num0; int i = 0; Console.WriteLine("yield 0"); yield return 0; try { i += 1; Console.WriteLine("yield 1"); yield return i; } catch {} Console.WriteLine("end"); }
The code above will return a IEnumerable, and will not do anything else, until the IEnumerable is used. In other words, if I invoke MakeGenerator, none of the code I wrote will actually be run. Now, this might sound completely ridiculous, but I assure you, there is a good reason for this. To explain it as simple as possible, consider the following code, what would you guess the output to be like?
foreach(int num in MakeGenerator(5)) { Console.WriteLine("Got " + num); }
The answer might surprise you. If you run the code above, you will get 6 lines of output, in the following order:
Got 5 yield 0 Got 0 yield 1 Got 1 end
The reason for this is that first when GetNext on the enumerator which the foreach-loop get’s from the IEnumerable is called, the code in the MakeGenerator-method is actually run. However, as you can see, not all of it is run. When we hit any of the yield-statements, the method is returned from. Now, this is a construct that’s actually quite useful in a lot of cases, cause for once, it enables lazy evaluation of your code over enumerables. LINQ for once, relies heavily on this (note; I do not know if LINQ uses yields or customarily created enumerators, but they both facilitate lazy evaluation over enumerables). For instance, if you have a list of a million elements (called myList), and you do myList.Select(CpuHeavyMethod), this will not fry your CPU into oblivion, rather, it will not run the CpuHeavyMethod at all! First when you actually have need for an element passed through the CpuHeavyMethod it will be called.
For those of you who still do not know what a generator is, you can learn more here: http://en.wikipedia.org/wiki/Generator_(computer_programming)
Creating generators
Now that (hopefully) all of you have a basic understanding of what generators are, and how they work, let’s look at how to make them. In other words, let’s look at what the compiler does with the yield keyword. Now, before I start, I’d just like to say that this is most likely different from language to language, and this may or may not be the way C# handles compiling generators (I’ve not looked at how the C#/mono compiler solves the problem, what I’ve read is how the IronPython compiler/interpreter does it. However, I’ve also changed that a bit to simplify).
As it turns out, generating generators is actually rather simple, there are just a few things you have to look out for:
- Try statements (with or without finally) must be handled separately (and with a little bit of caution).
- There cannot be yield-statements in catch or finally-blocks.
Note: these are rules I’ve set for the generator I create. It may be possible to have yields in both catch and finally in C#, though it would require further rewriting of the method, and will not be discussed in this article.
The first thing we need to do (in C# anyways) is to create 2 classes. One for the IEnumerable, and one for the IEnumerator. Also, these two classes will be nested, to enable the inner one to access the outer one’s private variables.
Let’s start with these two completely empty classes (save for the interface-implementations, and _state and _current values which I’ll get back to later).
class MakeGenerator_Generator : IEnumerable<int> { IEnumerator<int> IEnumerable<int>.GetEnumerator() { return new MakeGenerator_Enumerator(this); } IEnumerator IEnumerable.GetEnumerator() { return new MakeGenerator_Enumerator(this); } class MakeGenerator_Enumerator : IEnumerator<int> { private readonly MakeGenerator_Generator _gen; private int _state = -1; private int _current; public MakeGenerator_Enumerator(MakeGenerator_Generator gen) { _gen = gen; } int IEnumerator<int>.Current { get { return _current; } } object IEnumerator.Current { get { return _current; } } bool IEnumerator.MoveNext() { throw new NotImplementedException(); } void IEnumerator.Reset() { throw new NotImplementedException(); } void IDisposable.Dispose() { throw new NotImplementedException(); } } }
The first thing we need to do is to take all the method-parameters given to the MakeGenerator method and make them class-level readonly members of the Generator-class. Then we need to add a constructor that takes the same number of arguments, and saves them to the class members.
class MakeGenerator_Generator : IEnumerable<int> { readonly int param_num0; public MakeGenerator_Generator(int num0) { param_num0 = num0; }
Then we need to copy those to private members of the Enumerator-class during it’s Reset() call, and lastly we need to call Reset() from the constructor like this:
class MakeGenerator_Enumerator : IEnumerator<int> { private readonly MakeGenerator_Generator _gen; private int _state; private int _current; private int val_num0; public MakeGenerator_Enumerator(MakeGenerator_Generator gen) { _gen = gen; ((IEnumerator)this).Reset(); } void IEnumerator.Reset() { val_num0 = _gen.param_num0; }
The next thing we need to do is to copy the entire content of the original method into the MoveNext method of the Enumerator (note, this will make for invalid code, but we’ll fix that in a bit). The result should look like this:
bool IEnumerator.MoveNext() { yield return num0; int i = 0; Console.WriteLine("yield 0"); yield return 0; try { i += 1; Console.WriteLine("yield 1"); yield return i; } catch {} Console.WriteLine("end"); }
Now that we have the base-foundation of the generator, we can start editing the original code to make it work as a generator (in other words, rewrite it to a proper MoveNext method).
What needs to be done in the MoveNext methods is two things for startes. First; all variables needs to be hoisted out to class-level variables, and second, we need to enumerate the yield-statements (starting from 1). Than you change the nth yield-statement into this:
_state = <n>; _current = <yield_value>; return; yield_label_<n>:
The result of doing those two actions looks like this:
bool IEnumerator.MoveNext() { _state = 1; _current = val_num0; return; yield_label_1:; val_i = 0; Console.WriteLine("yield 0"); _state = 2; _current = 0; return; yield_label_2:; try { val_i += 1; Console.WriteLine("yield 1"); _state = 3; _current = i; return; yield_label_3:; } catch {} Console.WriteLine("end"); }
For those of you who have never seen a label in C# before, this code might look a bit confusing. A label in C# is created by typing a name followed by a colon, like “yield_label_0:” The use of a label is with goto, where you jump to a label. The reason I’ve added a semicolon after the label, is simply to add an empty statement you jump to. This would not strictly be necessary, however, if I do not the yield_label_3 would be invalid (as you cannot have a label that goes “off a cliff”, ie. out of a method, if, while, block, etc.). The next part is simply to add a switch over the state in the beginning of the MoveNext method, that jumps to the right continuation. Also, we need to add the final state (_state == 0, finnished). The result looks like this:
bool IEnumerator.MoveNext() { switch(_state) { case 1: goto yield_label_1; case 2: goto yield_label_2; case 3: goto yield_label_3; case 0: goto yield_label_0; } _state = 1; _current = val_num0; return; yield_label_1:; val_i = 0; Console.WriteLine("yield 0"); _state = 2; _current = 0; return; yield_label_2:; try { val_i += 1; Console.WriteLine("yield 1"); _state = 3; _current = i; return; yield_label_3:; } catch {} Console.WriteLine("end"); _state = 0; yield_label_0:; return _state != 0; }
This is basically a working generator. The two first values would work just fine if we simply left it like this (and removed the try), however, since we do have a try (cause I wanted to show how to solve this), this is what we do.
- Start with the outermost try (only do this if the try contain yield-statements).
- Create a new label in front of the try.
- Rewrite all goto-statements that point to a yield inside the try so they point to the new label instead.
- Rewrite the try so it jumps based on state.
In other words, it goes like this:
switch(_state) { case 1: goto yield_label_1; case 2: goto yield_label_2; case 3: goto try_label_1; // 3. Rewrite goto-statement to point to start of try case 0: goto yield_label_0; } _state = 1; _current = val_num0; return; yield_label_1:; val_i = 0; Console.WriteLine("yield 0"); _state = 2; _current = 0; return; yield_label_2:; try_label_1:; // 2. create label in front of the try try { switch(_state) // 4. Rewrite the try so it jumps based on state { case 3: goto yield_label_3; } val_i += 1; Console.WriteLine("yield 1"); _state = 3; _current = i; return; yield_label_3:; } catch {} Console.WriteLine("end"); _state = 0; yield_label_0:; return _state != 0;
And there you have it. We’ve made ourselves a generator. The only thing that needs to be fixed now is to remove the NotImplementedException inside the Dispose-method. What you should probably do inside the Dispose-method is set the _gen-variable to null, and check for it being null in both MoveNext and Current (and if it is, throw a InvalidOperationException because it’s already disposed), however, I’ll leave that up to you guys. Also, I was planing on showing how to create Generators with the DLR, though it seems this post became longer than I planned it, and if I were to start explaining the DLR stuff now, I’d probably more than double the length, so that will come later. I hope this made it easier to understand how generators work, and what the compiler does with them.
Until next time.
- Alxandr
Image may be NSFW.
Clik here to view.
Clik here to view.
