Generators

May 25, 2018 · 8 minute read

TL;DR

Data generators help us write more succinct code

They encapsulate the logic needed to create a sequence of values

A suitable way of implementing a data generator in swift is through the Sequence type

Data generators can come in handy in many occasions, for instance during unit testing

Motivation

Often times during testing we need to come up with a set of inputs to test certain functionality. The traditional way to approach this task is to carefully craft a set of values that will test different paths in our code. A test for the case where the input is empty, another one for the case where the input is too large and another one where the input has an invalid value. Assuming we can foresee all the possible corner cases this approach could work quite well.

A simple, yet powerful way of complementing this approach, is to also include a test that generates a set of random inputs to be fed into the code under test. The hope is that these randomly generated cases, would help us uncover scenarios that we did not foresee and for which we did not include a specific test case. In this post, we will explore an elegant way of implementing random input generators for our tests using swift’s Sequence type.

***

Let’s suppose we need to test a function that sorts the elements of an array in ascending order.

/// Sorts the given array in ascending order
func sort<T: Comparable>(_ array: [T]) -> [T] {
    return array.sorted
}

To make our test code simpler, let’s define a utility function to help us check whether an array is sorted in ascending order.

/// Checks whether the given array is sorted in ascending order
func isSorted<T: Comparable>(_ array: [T]) -> Bool {
    guard array.count > 1 else { return true }
    for i in 0..<array.count-2 {
        guard array[i] <= array[i+1] else { return false }
    }
    return true
}

With these two functions in hand, we can now focus on implementing our test.

As mentioned before, there are several tests that should be included to thoroughly test a function like sort. In this post, however, we will focus on using a data generator to create a set of random inputs to test our code.

Our goal is to achieve client code that looks like this:

func testSort() {
    for _ in 0..<100 {
        // Given 
        let size = Int(arc4_random(10))
        let input = Array(IntGenerator(range: 0..<100).prefix(size))

        // When
        let sortedInput = sort(input)

        // Then
        XCTAssertTrue(isSorted(sortedInput), "\(sortedInput) is not sorted")
    }
}

We start off by selecting a random size for the input array. Then we use the IntGenerator type to create an array with random integers in the range 0..<100. In the when section, we invoke the sort function passing our randomly generated input to it. Towards the end of the test, we make use of the isSorted utility function to verify that the sortedInput has been properly sorted. To make our test more exhaustive, we wrap all the test code inside of a for-loop, which will cause the test to run a 100 times, using inputs of different sizes and values each time.

Implementation

Now let’s look at the implementation of our data generator: IntGenerator.

struct IntGenerator: Sequence, IteratorProtocol { }

We start off by declaring conformance to Sequence and to the IteratorProtocol. Doing this allows us to treat the generator using the familiar Sequence API, which includes functions like map, filter, forEach, prefix, contains, etc. Strictly speaking, we could achieve this result by simply conforming to Sequence, but in my opinion conforming also to the IteratorProtocol leads to a tidier implementation.

Conforming to these two protocols requires us to implement two methods:

Sequence :: makeIterator() -> Self.Iterator
IteratorProtocol :: next() -> Element?

Fortunately, the standard library has a default implementation (¹) of the makeIterator method for types that also conform to the IteratorProtocol, leaving us with only one required method to implement: next() -> Element?. According to the documentation, the implementation of this method should be such that invoking it consecutively would return the next element in the sequence and return nil once the sequence is complete.

Before jumping into the implementation, let’s spend a moment looking at the signature that the next() -> Element? method will take in our data generator.

struct IntGenerator: Sequence, IteratorProtocol {
    mutating func next() -> Int? {
        // ...
    }
}

The signature specifies Int? as the return value of the next() method. This is in line with our goal of producing a sequence of random integers. The return type has to be optional so that we can use nil to indicate the end of the sequence. This behavior, however, depends entirely on our implementation, allowing for the creation of both finite and infinite sequences. A finite sequence would be a sequence that eventually returns nil on it’s next() method; an infinite sequence, on the other hand, would be a sequence that always returns a value.

Another aspect to highlight in the signature of the next() method is that is declared as mutating. This is required because our generator is a struct and because is very common for the next() -> Element? method to modify the internal state of the type (²).

Now, let’s see a possible implementation of the next() method for our generator.

struct IntGenerator: Sequence, IteratorProtocol {
    mutating func next() -> Int? {
        let range: Range<UInt32> = 0..<100
        let min: UInt32 = range.lowerBound 
        let max: UInt32 = range.upperBound 
        let distance: UInt32 = max - min 
        let value = arc4random_uniform(distance) + min
        return Int(value)
    }
}

We define a range with bounds 0..<100 and then use the arc4random_uniform() API to obtain a random UInt32 value in the desired range. While this implementation will work fine, there is a couple of details that can be tweaked. For example, we would like to give clients the possibility of deciding the range of the generated inputs.

struct IntGenerator: Sequence, IteratorProtocol {

    private let range: Range<UInt32>

    init(range: Range<UInt32>) {
        self.range = range
    }

    mutating func next() -> Int? {
        let min: UInt32 = range.lowerBound 
        let max: UInt32 = range.upperBound 
        let distance: UInt32 = max - min 
        let value = arc4random_uniform(distance) + min
        return Int(value)
    }
}

Notice how the min, max and distance variables will always have the same values? We can take advantage of this situation and turn them into constant attributes that are derived during the construction of the generator. Also, it seems that for generating the next random number we only need the min and distance values, so we might as well keep only these two attributes around.

struct IntGenerator: Sequence, IteratorProtocol {
    private let min: UInt32
    private let distance: UInt32

    init(range: Range<UInt32>) {
        self.distance = range.upperBound - range.lowerBound
        self.min = range.lowerBound
    }

    mutating func next() -> Int? {
        let value = arc4random_uniform(distance) + min
        return Int(value)
    }
}

Usage and Limitations

The previous implementation looks much tidier and efficient than our initial approach, and still allows us to write the client code we were aiming for. Specifically:

// ...
let size = Int(arc4_random(10))
let input = Array(IntGenerator(range: 0..<100).prefix(size))
// ...

We can very easily provide different ranges (e.g: 0..<50, 0...<1_000, etc), and decide how many values we need in the given interval thanks to the prefix API. Also, since IntGenerator conforms to the Sequence protocol, we get access to other handy APIs like filter, map, flatMap and contains.

An important caveat of the current implementation, however, is that if we are not careful in the usage we might end up with an infinite execution loop.

Consider the following example:

// This line will be executed infinitely
let infiniteSquares = Array(IntGenerator(range: 0..<100).map { $0 * $0 })

// This line will finish and return the expected value
let finiteSquares = Array(IntGenerator(range: 0..<100).prefix(10).map { $0 *0 })

The problem in this case stems from our implementation of the next() method. In theory the implementation of this method should return nil once the sequence is finished. Our implementation, however, always returns a value; thus creating an infinite sequence.

Looking closer at the problem, though, we could say that the real issue lays on eagerly computing all the values in the sequence before passing it to the Array constructor. From this perspective the solution also seems clearer: figure out a way of lazily computing only the values in the collection that will be actually used. A common way of approaching this task is through the usage of streams, which we will explore in a future post.

Default Implementation of Sequence type for Iterators ^[return]
To understand why, imagine that we are implementing the next method in a collection like a Linked List. In this case the next method would return the next element in the list, and nil once the list has been exhausted. A straight forward way of implementing this functionality would be to have an optional attribute inside our iterator which would be updated every time the next method is called. This update will entail a mutation of the state of our data structure, and will thus require the mutating keyword in case our iterator is a struct. ^[return]