Reduce all the things

Mon, 30 Nov 2015 #

Reduce all the things

This post is also available in 🇨🇳Chinese Thanks to SwiftGG

Even before Swift was released, iOS / Cocoa developers could use third party frameworks like ObjectiveSugar or ReactiveCocoa in order to gain functional programming constructs like map, flatMap or filter. With Swift, they have become first class language citizens. There are many advantages to using them over a standard for loop. They typically express your intent better, they require less code, and they can be chained together in order to build up complex logic in a clear way.

In this post, I'd like to show another very cool functional addition to Swift which can sometimes be a better solution than map / filter constructs: reduce.

1 A simple Problem

Consider this problem: You're getting a list of persons from a JSON endpoint. You'd like to know the average age from all people living in California. The parsed data looks like this:

let persons: [[String: AnyObject]] = [["name": "Carl Saxon", "city": "New York, NY", "age": 44],
 ["name": "Travis Downing", "city": "El Segundo, CA", "age": 34],
 ["name": "Liz Parker", "city": "San Francisco, CA", "age": 32],
 ["name": "John Newden", "city": "New Jersey, NY", "age": 21],
 ["name": "Hector Simons", "city": "San Diego, CA", "age": 37],
 ["name": "Brian Neo", "age": 27]]

Note the last entry, which omits a city for the person in question. Those cases have to be silently ignored.

The expected result in the example would be 3 persons , as we have three persons from California. Let's try to implement this in Swift in terms of flatMap and filter. The flatMap is used instead of map as flatMap automatically ignores empty optionals. So flatMap([0, nil, 1, 2, nil]) results in [0, 1, 2]. This eases the handling of persons without a proper city.

func infoFromState(state state: String, persons: [[String: AnyObject]]) 
     -> Int {
    return persons.flatMap( { $0["city"]?.componentsSeparatedByString(", ").last })
	   .filter({$0 == state})
	   .count
}
infoFromState(state: "CA", persons: persons)
//#+RESULTS:
//: 3

That's simple enough.

However, now consider the following complication: You'd like to know how many of those persons live in California, and you'd like to know their average age. If we try upgrading the above example, we soon realize that his is a slightly harder problem. There are various solutions, but they all seem to not fit well with functional constructs. A loop-based approach feels much simpler.

When we think about why this does fit, we realize it is because the shape of the data changes. map, flatMap, and filter all keep the shape of the data similar. Array goes in, array goes out. The amount and the contents of the array may change, but the array-shape stays. The problem above, however, requires us to change the shape to a struct or tuple with an Integer average and an Integer sum.

These are the kind of problems where you can apply reduce.

2 Reduce

Reduce is sort of a generalized version of map, flatMap, or filter. The basic idea is to reduce a sequence into a different shape utilizing an accumulator that can keep incremental state. To do this, we hand the function a combinator closure/function/method which is called once for each item in the sequence. This may sound complicated but becomes really easy with a couple of examples.

It is a method on SequenceType and looks like this (simplified):

func reduce<T>(initial: T, combine: (T, Self.Generator.Element) -> T) -> T

So we have an initial value, and we have a closure which expects us to return the same type as the initial value. The final value of the operation is also of the same type as the initial value.

If we take a very simple reduce operation - adding up a list of integers, the evaluation will look like this:

func combinator(accumulator: Int, current: Int) -> Int {
   return accumulator + current
}
[1, 2, 3].reduce(0, combine: combinator)
// The following steps will be performed
combinator(0, 1) { return 0 + 1 } = 1
combinator(1, 2) { return 1 + 2 } = 3
combinator(3, 3) { return 3 + 3 } = 6
= 6

The combinator function will be called once for each item in the list [1, 2, 3]. The state will be kept in the accumulator variable which is just an Integer.

Let's start re-implementing some of our other, trusted, functional programming friends. In order to keep things simple for now, all these functions will operate on Int or Optional<Int>; i.e. we will ignore generics in here. Also, keep in mind that the implementations below exist to explain the behaviour of reduce. The native Swift implementations are usually much faster compared to the reduce versions below¹. Reduce shines in a different set of problems, which will be explained further below.

2.1 Map

func rmap(elements: [Int], transform: (Int) -> Int) -> [Int] {
    return elements.reduce([Int](), combine: { (var acc: [Int], obj: Int) -> [Int] in
       acc.append(transform(obj))
       return acc
    })
}
print(rmap([1, 2, 3, 4], transform: { $0 * 2}))
// [2, 4, 6, 8]

This is a good example to understand the basics of reduce.

First, we're calling reduce on a sequence of elements elements.reduce....
Next, We're giving it the accumulator, i.e. an empty Int array, which will form or return type / result [Int]()
After that, we're handing in the combinator which takes two parameters. The first is the accumulator which we just provided acc: [Int], the second is the current object from our sequence obj: Int.
The actual code in the combinator is simple. We simply transform the obj and append it onto the accumulator. We then return the accumulator.

This looks like much more code than just calling map. That's indeed true, but the version above is extra detailed in order to better explain how reduce works. We can simplify it.

func rmap(elements: [Int], transform: (Int) -> Int) -> [Int] {
    return elements.reduce([Int](), combine: {$0 + [transform($1)]})
}
print(rmap([1, 2, 3, 4], transform: { $0 * 2}))
// [2, 4, 6, 8]

This still works fine. What happened here? We're using the convenient fact that in Swift, the + operator also works for two sequences. So [0, 1, 2] + [transform(4)] takes the left sequence and adds the right sequence, consisting out of the transformed element, to it.

It should be noted that, as of right now, [1, 2, 3] + [4] is slower than [1, 2, 3].append(4). If you operate on huge lists, instead of using collection + collection, you should have a mutable accumulator and mutate it in place:

func rmap(elements: [Int], transform: (Int) -> Int) -> [Int] {
    return elements.reduce([Int](), combine: { (var ac: [Int], b: Int) -> [Int] in 
	ac.append(transform(b))
	return ac
    })
}

In order to better understand reduce we will now go on and also implement flatMap and filter.

func rflatMap(elements: [Int], transform: (Int) -> Int?) -> [Int] {
    return elements.reduce([Int](), 
       combine: { guard let m = transform($1) else { return $0 } 
		  return $0 + [m]})
}
print(rflatMap([1, 3, 4], transform: { guard $0 != 3 else { return nil }; return $0 * 2}))
// [2, 8]

The main difference is that we're adding a guard to make sure the optional contains a value.

2.2 Filter

func rFilter(elements: [Int], filter: (Int) -> Bool) -> [Int] {
    return elements.reduce([Int](), 
       combine: { guard filter($1) else { return $0 } 
		  return $0 + [$1]})
}
print(rFilter([1, 3, 4, 6], filter: { $0 % 2 == 0}))
// [4, 6]

Again, a simple operation. We're leveraging guard again to make sure our filter condition holds.

Up until now, reduce may feel like a more complicated version of map or filter without any major advantages. However, the combinator does not need to be an array. It can be anything. This makes it easy for us to implement various reduction operations in a very simple way.

3 Reduce Examples

Let's start with a favorite of mine, the sum of a list of numbers:

[0, 1, 2, 3, 4].reduce(0, combine: +)
// 10

+ is a valid combinator function as it will just add the lhs and the rhs and return the result, which is specifically the requirement reduce has.

Another example is building the product of a list of numbers:

[1, 2, 3, 4].reduce(1, combine: *)
// 24

Or what about reversing a list:

[1, 2, 3, 4, 5].reduce([Int](), combine: { [$1] + $0 })
// 5, 4, 3, 2, 1

Finally, something a tad more complicated. We'd like to partition a list based on a division criteria

typealias Acc = (l: [Int], r: [Int])
func partition(lst: [Int], criteria: (Int) -> Bool) -> Acc {
   return lst.reduce((l: [Int](), r: [Int]()), combine: { (ac: Acc, o: Int) -> Acc in 
      if criteria(o) {
	return (l: ac.l + [o], r: ac.r)
      } else {
	return (r: ac.r + [o], l: ac.l)
      }
   })
}
partition([1, 2, 3, 4, 5, 6, 7, 8, 9], criteria: { $0 % 2 == 0 })
//: ([2, 4, 6, 8], [1, 3, 5, 7, 9])

The most interesting thing we're doing above is using a tuple as the accumulator. As you'll find out, once you start trying to incorporate reduce into your daily work-flow, tuples are a great way of quickly combining related data during a reduce operation.

4 Reduce vs. Chaining Performance

Apart from the higher flexibility that reduce offers, it has another advantage: Oftentimes, chaining map and filter induces a performance penalty as Swift has to iterate over your collection multiple times in order to generate the required data. Imagine the following code:

[0, 1, 2, 3, 4].map({ $0 + 3}).filter({ $0 % 2 == 0}).reduce(0, combine: +)

Apart from being nonsensical, it is also wasting CPU cycles. The initial sequence will be iterated over 3 times. First to map it, then to filter it, and finally to sum up the contents. Instead, all of this can just as well be implemented as one reduce call, which greatly improves the performance:

[0, 1, 2, 3, 4].reduce(0, combine: { (ac: Int, r: Int) -> Int in 
   if (r + 3) % 2 == 0 {
     return ac + r + 3
   } else {
     return ac
   }
})

Here's a quick benchmark of running both versions and the for-loop version below over an list with 100.000 items:

var ux = 0
for i in Array(0...100000) {
    if (i + 3) % 2 == 0 {
	ux += (i + 3)
    }
}

For Loop: 0.030 seconds

Reduce: 0.034 seconds

Map/Filter: 0.072 seconds

As you can see, the reduce version' performance is very close to the mutating for loop and more than twice as fast as the chaining operation.

However, in other situations, chained operation can greatly outperform reduce. Consider the following example:

Array(0...100000).map({ $0 + 3}).reverse().prefix(3)
// 0.027 Seconds

Array(0...100000).reduce([], combine: { (var ac: [Int], r: Int) -> [Int] in
    ac.insert(r + 3, atIndex: 0)
    return ac
}).prefix(3)
// 2.927 Seconds

Here, we have a stark performance difference of 0.027s for the chained operation vs. 2.927s for the reduce operation, what's happening here?²

foBrowsing on Reddit points out that reduce's semantics means that the parameters to the closure (if mutated) are copied once for every element in the underlying sequence. In our case, this means that the accumulator parameter ac will be copied for each item in our 0…100000 range. A better, more detailed explanation of this can be found in this Airspeedvelocity blog post.

So, when you're replacing a set of operations with reduce, always keep mind whether reduction is indeed a good use case for the situation in question.

We can now go back to our initial count & average problem and try to solve it with reduce.

5 InfoFromState, take two


  func infoFromState(state state: String, persons: [[String: AnyObject]]) 
      -> (count: Int, age: Float) {

      // The type alias in the function will keep the code cleaner
      typealias Acc = (count: Int, age: Float)

      // reduce into a temporary variable
      let u = persons.reduce((count: 0, age: 0.0)) {
	  (ac: Acc, p) -> Acc in

	  // Retrive the state and the age
	  guard let personState = (p["city"] as? String)?.componentsSeparatedByString(", ").last,
		personAge = p["age"] as? Int

	    // make sure the person is from the correct state
	    where personState == state

	    // if age or state are missing, or personState!=state, leave
	    else { return ac }

	  // Finally, accumulate the acount and the age
	  return (count: ac.count + 1, age: ac.age + Float(personAge))
      }

  // our result is the count and the age divided by count
  return (age: u.age / Float(u.count), count: u.count)
}
print(infoFromState(state: "CA", persons: persons))
// prints: (count: 3, age: 34.3333)

As in earlier examples above, we're once again using a tuple to share state in the accumulator. Apart from that, the code is easy to understand.

We also defined a typealias Acc within the func in order to simplify the type annotations a bit.

6 Summary

This was a short overview of the power behind the reduce method. It is particularly helpful if you end up chaining a lot of functional methods together, or when output shape of your data differs from the input shape. I'll end this post with more reduce examples to give you inspirations for various use cases where reduce can easily be applied:

7 More Examples

The following examples display additional reduce use cases. Keep in mind that they're educational in nature. I.e. they exist to stress the usage of reduce, not to be general solutions for your codebase. Most of the examples can be written in a better and faster way (i.e. by using extensions or generics). There are better implementations of these examples in various Swift libraries such as SwiftSequence or Dollar.swift

7.1 Minimum

Return the minimum entry in a given list. Obviously, [1, 5, 2, 9, 4].minElement() would be a better solution.

[1, 5, 2, 9, 4].reduce(Int.max, combine: min)

7.2 Unique

Return a list with all duplicates removed. The better solution would be to use a Set.

[1, 2, 5, 1, 7].reduce([], combine: { (a: [Int], b: Int) -> [Int] in
if a.contains(b) {
   return a
} else {
   return a + [b]
}
})
// prints: 1, 2, 5, 7

7.3 Group By

Go over a list and return a new list with the previous list' items grouped by a discriminator function. The function in question needs to return a Hashable type so that we can differentiate keys. The order of the items will be preserved while the order of the groups won't necessarily be preserved.

func groupby<T, H: Hashable>(items: [T], f: (T) -> H) -> [H: [T]] {
   return items.reduce([:], combine: { (var ac: [H: [T]], o: T) -> [H: [T]] in 
       let h = f(o)
       if var c = ac[h] {
	   c.append(o)
	   ac.updateValue(c, forKey: h)
       } else {
	   ac.updateValue([o], forKey: h)
       }
       return ac
   })
}
print(groupby([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], f: { $0 % 3 }))
// prints: [2: [2, 5, 8, 11], 0: [3, 6, 9, 12], 1: [1, 4, 7, 10]]
print(groupby(["Carl", "Cozy", "Bethlehem", "Belem", "Brand", "Zara"], f: { $0.characters.first! }))
// prints: ["C" : ["Carl" , "Cozy"] , "B" : ["Bethlehem" , "Belem" , "Brand"] , "Z" : ["Zara"]]

7.4 Interpose

This function returns the given items, with element inserted between every count items. The implementation below makes sure that the elements are only interposed and not appended at the end.

func interpose<T>(items: [T], element: T, count: Int = 1) -> [T] {
   typealias Acc = (ac: [T], cur: Int, cnt: Int)
   return items.reduce((ac: [], cur: 0, cnt: 1), combine: { (a: Acc, o: T) -> Acc in 
       switch a {
	  // the last item will not have any interposition
	  case let (ac, cur, _) where (cur+1) == items.count: return (ac + [o], 0, 0)
	  // interpose
	  case let (ac, cur, c) where c == count:
	     return (ac + [o, element], cur + 1, 1)
	  // next
	  case let (ac, cur, c):
	     return (ac + [o], cur + 1, c + 1)
       }
   }).ac
}
print(interpose([1, 2, 3, 4, 5], element: 9))
// : [1, 9, 2, 9, 3, 9, 4, 9, 5]
print(interpose([1, 2, 3, 4, 5], element: 9, count: 2))
// : [1, 2, 9, 3, 4, 9, 5]

7.5 Interdig

This function allows you to combine two sequences by alternately selecting elements from each.

func interdig<T>(list1: [T], list2: [T]) -> [T] {
   return Zip2Sequence(list1, list2).reduce([], combine: { (ac: [T], o: (T, T)) -> [T] in 
	return ac + [o.0, o.1]
   })
}
print(interdig([1, 3, 5], list2: [2, 4, 6]))
// : [1, 2, 3, 4, 5, 6]

7.6 Chunk

This function returns self, broken up into non-overlapping arrays of length n:

func chunk<T>(list: [T], length: Int) -> [[T]] {
   typealias Acc = (stack: [[T]], cur: [T], cnt: Int)
   let l = list.reduce((stack: [], cur: [], cnt: 0), combine: { (ac: Acc, o: T) -> Acc in
      if ac.cnt == length {
	  return (stack: ac.stack + [ac.cur], cur: [o], cnt: 1)
      } else {
	  return (stack: ac.stack, cur: ac.cur + [o], cnt: ac.cnt + 1)
      }
   })
   return l.stack + [l.cur]
}
print(chunk([1, 2, 3, 4, 5, 6, 7], length: 2))
// : [[1, 2], [3, 4], [5, 6], [7]]

This function uses a more complicated accumulator consisting out of a stack, the current list, and the count.

8 Changes

12/01/2015

Fixed the rFlatMap type signature
Added additional explanations for the code examples
Fixed issue where performance differences were attributed to laziness

Footnotes:

For reasons why, have a look at this blog post

In an earlier version of this post I was falsely assuming Swift's laziness feature to be the culprit of the difference. Thanks to this Reddit thread for pointing out my mistake

If you read this far, you should follow me (@terhechte)
on Twitter

AppVenture has been updated

Mon, 30 Nov 2015 #