Generators
May 25, 2018 · 8 minute read
TL;DR
- Data generators help us write more succinct code
- They encapsulate the logic needed to create a sequence of values
- A suitable way of implementing a data generator in swift is through the
Sequence
type- Data generators can come in handy in many occasions, for instance during unit testing
Motivation
Often times during testing we need to come up with a set of inputs to test certain functionality. The traditional way to approach this task is to carefully craft a set of values that will test different paths in our code. A test for the case where the input is empty, another one for the case where the input is too large and another one where the input has an invalid value. Assuming we can foresee all the possible corner cases this approach could work quite well.
A simple, yet powerful way of complementing this approach, is to also include a test that generates a set of random inputs to be fed into the code under test. The hope is that these randomly generated cases, would help us uncover scenarios that we did not foresee and for which we did not include a specific test case. In this post, we will explore an elegant way of implementing random input generators for our tests using swift’s Sequence
type.
Let’s suppose we need to test a function that sorts the elements of an array in ascending order.
/// Sorts the given array in ascending order
func sort<T: Comparable>(_ array: [T]) -> [T] {
return array.sorted
}
To make our test code simpler, let’s define a utility function to help us check whether an array is sorted in ascending order.
/// Checks whether the given array is sorted in ascending order
func isSorted<T: Comparable>(_ array: [T]) -> Bool {
guard array.count > 1 else { return true }
for i in 0..<array.count-2 {
guard array[i] <= array[i+1] else { return false }
}
return true
}
With these two functions in hand, we can now focus on implementing our test.
As mentioned before, there are several tests that should be included to thoroughly test a function like sort
. In this post, however, we will focus on using a data generator to create a set of random inputs to test our code.
Our goal is to achieve client code that looks like this:
func testSort() {
for _ in 0..<100 {
// Given
let size = Int(arc4_random(10))
let input = Array(IntGenerator(range: 0..<100).prefix(size))
// When
let sortedInput = sort(input)
// Then
XCTAssertTrue(isSorted(sortedInput), "\(sortedInput) is not sorted")
}
}
We start off by selecting a random size
for the input
array. Then we use the IntGenerator
type to create an array with random integers in the range 0..<100
. In the when
section, we invoke the sort
function passing our randomly generated input to it. Towards the end of the test, we make use of the isSorted
utility function to verify that the sortedInput
has been properly sorted. To make our test more exhaustive, we wrap all the test code inside of a for-loop
, which will cause the test to run a 100 times, using inputs of different sizes and values each time.
Implementation
Now let’s look at the implementation of our data generator: IntGenerator
.
struct IntGenerator: Sequence, IteratorProtocol { }
We start off by declaring conformance to Sequence
and to the IteratorProtocol
. Doing this allows us to treat the generator using the familiar Sequence
API, which includes functions like map
, filter
, forEach
, prefix
, contains
, etc. Strictly speaking, we could achieve this result by simply conforming to Sequence
, but in my opinion conforming also to the IteratorProtocol
leads to a tidier implementation.
Conforming to these two protocols requires us to implement two methods:
Sequence :: makeIterator() -> Self.Iterator
IteratorProtocol :: next() -> Element?
Fortunately, the standard library has a default implementation (1) of the makeIterator
method for types that also conform to the IteratorProtocol
, leaving us with only one required method to implement: next() -> Element?
. According to the documentation, the implementation of this method should be such that invoking it consecutively would return the next element in the sequence and return nil
once the sequence is complete.
Before jumping into the implementation, let’s spend a moment looking at the signature that the next() -> Element?
method will take in our data generator.
struct IntGenerator: Sequence, IteratorProtocol {
mutating func next() -> Int? {
// ...
}
}
The signature specifies Int?
as the return value of the next()
method. This is in line with our goal of producing a sequence of random integers. The return type has to be optional so that we can use nil
to indicate the end of the sequence. This behavior, however, depends entirely on our implementation, allowing for the creation of both finite and infinite sequences. A finite sequence would be a sequence that eventually returns nil
on it’s next()
method; an infinite sequence, on the other hand, would be a sequence that always returns a value.
Another aspect to highlight in the signature of the next()
method is that is declared as mutating
. This is required because our generator is a struct
and because is very common for the next() -> Element?
method to modify the internal state of the type (2).
Now, let’s see a possible implementation of the next()
method for our generator.
struct IntGenerator: Sequence, IteratorProtocol {
mutating func next() -> Int? {
let range: Range<UInt32> = 0..<100
let min: UInt32 = range.lowerBound
let max: UInt32 = range.upperBound
let distance: UInt32 = max - min
let value = arc4random_uniform(distance) + min
return Int(value)
}
}
We define a range with bounds 0..<100
and then use the arc4random_uniform()
API to obtain a random UInt32
value in the desired range. While this implementation will work fine, there is a couple of details that can be tweaked. For example, we would like to give clients the possibility of deciding the range of the generated inputs.
struct IntGenerator: Sequence, IteratorProtocol {
private let range: Range<UInt32>
init(range: Range<UInt32>) {
self.range = range
}
mutating func next() -> Int? {
let min: UInt32 = range.lowerBound
let max: UInt32 = range.upperBound
let distance: UInt32 = max - min
let value = arc4random_uniform(distance) + min
return Int(value)
}
}
Notice how the min
, max
and distance
variables will always have the same values? We can take advantage of this situation and turn them into constant attributes that are derived during the construction of the generator. Also, it seems that for generating the next random number we only need the min
and distance
values, so we might as well keep only these two attributes around.
struct IntGenerator: Sequence, IteratorProtocol {
private let min: UInt32
private let distance: UInt32
init(range: Range<UInt32>) {
self.distance = range.upperBound - range.lowerBound
self.min = range.lowerBound
}
mutating func next() -> Int? {
let value = arc4random_uniform(distance) + min
return Int(value)
}
}
Usage and Limitations
The previous implementation looks much tidier and efficient than our initial approach, and still allows us to write the client code we were aiming for. Specifically:
// ...
let size = Int(arc4_random(10))
let input = Array(IntGenerator(range: 0..<100).prefix(size))
// ...
We can very easily provide different ranges (e.g: 0..<50
, 0...<1_000
, etc), and decide how many values we need in the given interval thanks to the prefix
API. Also, since IntGenerator
conforms to the Sequence
protocol, we get access to other handy APIs like filter
, map
, flatMap
and contains
.
An important caveat of the current implementation, however, is that if we are not careful in the usage we might end up with an infinite execution loop.
Consider the following example:
// This line will be executed infinitely
let infiniteSquares = Array(IntGenerator(range: 0..<100).map { $0 * $0 })
// This line will finish and return the expected value
let finiteSquares = Array(IntGenerator(range: 0..<100).prefix(10).map { $0 *0 })
The problem in this case stems from our implementation of the next()
method. In theory the implementation of this method should return nil
once the sequence is finished. Our implementation, however, always returns a value; thus creating an infinite sequence.
Looking closer at the problem, though, we could say that the real issue lays on eagerly computing all the values in the sequence before passing it to the Array
constructor. From this perspective the solution also seems clearer: figure out a way of lazily computing only the values in the collection that will be actually used. A common way of approaching this task is through the usage of streams, which we will explore in a future post.
- Default Implementation of Sequence type for Iterators [return]
- To understand why, imagine that we are implementing the
next
method in a collection like a Linked List. In this case thenext
method would return the next element in the list, andnil
once the list has been exhausted. A straight forward way of implementing this functionality would be to have anoptional
attribute inside our iterator which would be updated every time thenext
method is called. This update will entail a mutation of the state of our data structure, and will thus require themutating
keyword in case our iterator is astruct
. [return]