Map/Filter/Reduce Algorithm in Java

A look at implementing the Map/Filter/Reduce algorithm in Java using helper classes and the Stream API.

Pre-requisites:

  1. Lambda Expressions in Java
  2. Stream API in Java

The Problem:

Given a list of users, find the average age of the people older than 20.

  • Map: extracts all ages; changes the type of the list, not its size
  • Filter: keeps only the ages > 20; changes the size of the list, not its type
  • Reduce: aggregates the remaining ages into a single value (here, the average)

Helper Classes (Java 7-style collections, written with lambdas)

List<Person> list = .....;
List<Integer> ages = Lists.map(list, element -> element.getAge());
List<Integer> agesGT20 = Lists.filter(ages, element -> element > 20);
int sum = Lists.reduce(agesGT20, (a1, a2) -> a1 + a2);

This uses static methods on a hypothetical factory class Lists (it does not exist in the JDK). We pass the data and the lambdas to the API, which does the computation for us. Easy and efficient. It also parallelizes well, provided that:

  1. The reduction is associative (this is not enforced by the compiler or the JVM).
  2. The reduction has an identity element. Not all reductions have one: for max, using 0 as the identity gives the wrong result when all values are negative. For reductions without an identity, Optional was introduced, a wrapper type that can be empty.
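The second condition can be illustrated with the Stream API's own reduce method. This is a minimal sketch; the class name ReduceIdentityDemo and its method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

class ReduceIdentityDemo {

    // Uses 0 as the "identity" for max. 0 is not neutral for max,
    // so the result is wrong when every value is negative.
    static int maxWithZero(List<Integer> values) {
        return values.stream().reduce(0, Integer::max);
    }

    // No identity is supplied, so the result is an Optional,
    // which is empty when the list is empty.
    static Optional<Integer> maxOptional(List<Integer> values) {
        return values.stream().reduce(Integer::max);
    }

    public static void main(String[] args) {
        List<Integer> negatives = Arrays.asList(-5, -3, -8);
        System.out.println(maxWithZero(negatives));                          // 0 (wrong)
        System.out.println(maxOptional(negatives).get());                    // -3
        System.out.println(maxOptional(Collections.emptyList()).isPresent()); // false
    }
}
```

The Optional forces the caller to decide what an empty input means, instead of silently returning a value that was never in the list.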

Caveats

  1. Each step creates a new list, duplicating the data -> increases memory consumption
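To see where the extra lists come from, here is one possible sketch of the hypothetical Lists class. It is not part of the JDK, and the signatures are assumptions chosen to match the calls shown earlier. In particular, reduce throws on an empty list because no identity is passed in; that choice is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical helper class: "Lists" is not part of the JDK.
final class Lists {

    private Lists() {}

    // Allocates a NEW list of the same size, possibly with a different element type.
    static <T, R> List<R> map(List<T> source, Function<? super T, ? extends R> mapper) {
        List<R> result = new ArrayList<>(source.size());
        for (T element : source) {
            result.add(mapper.apply(element));
        }
        return result;
    }

    // Allocates a NEW list of the same type, possibly with fewer elements.
    static <T> List<T> filter(List<T> source, Predicate<? super T> predicate) {
        List<T> result = new ArrayList<>();
        for (T element : source) {
            if (predicate.test(element)) {
                result.add(element);
            }
        }
        return result;
    }

    // Folds the list from left to right. Assumption: an empty list is an error,
    // since there is no identity element to return.
    static <T> T reduce(List<T> source, BinaryOperator<T> operator) {
        if (source.isEmpty()) {
            throw new IllegalArgumentException("cannot reduce an empty list without an identity");
        }
        T accumulator = source.get(0);
        for (int i = 1; i < source.size(); i++) {
            accumulator = operator.apply(accumulator, source.get(i));
        }
        return accumulator;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ann", "Robert", "Li", "Maria");
        List<Integer> lengths = Lists.map(names, name -> name.length());      // second list
        List<Integer> longOnes = Lists.filter(lengths, length -> length > 3); // third list
        int sum = Lists.reduce(longOnes, (a, b) -> a + b);
        System.out.println(sum); // 11
    }
}
```

Three lists are alive at the same time for a single computation, which is the memory cost the caveat refers to.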

Stream API

A stream is generated from the list and processes its elements one by one.

First Pattern

Prints the ages greater than 20.

list.stream()
    .map(element -> element.getAge())
    .filter(element -> element > 20)
    .forEach(System.out::println);

Three different streams are created by the first three steps (stream, map, filter). This is not a costly operation as the streams do not hold elements -> they simply process them.
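A small experiment makes this visible. The sketch below (class and method names are made up) records when each stage sees an element. Nothing is processed while the pipeline is being built; elements flow through one at a time once forEach starts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class LazyStreamDemo {

    // Returns a log of the order in which the stages touch the elements.
    static List<String> trace(List<Integer> ages) {
        List<String> log = new ArrayList<>();
        Stream<Integer> pipeline = ages.stream()
                .map(age -> { log.add("map " + age); return age; })
                .filter(age -> { log.add("filter " + age); return age > 20; });
        log.add("pipeline built"); // no element has been processed at this point
        pipeline.forEach(age -> log.add("forEach " + age));
        return log;
    }

    public static void main(String[] args) {
        for (String entry : trace(Arrays.asList(18, 25))) {
            System.out.println(entry);
        }
        // pipeline built
        // map 18
        // filter 18
        // map 25
        // filter 25
        // forEach 25
    }
}
```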

Second Pattern

Prints every age after mapping, and also the ages greater than 20.

list.stream()
    .map(element -> element.getAge())
    .peek(System.out::println)
    .filter(element -> element > 20)
    .forEach(System.out::println);

To print the intermediate result after mapping, forEach cannot be used: it is a terminal operation and returns nothing, so the pipeline cannot continue after it. peek also passes the stream elements one by one to the Consumer given to it, but it is an intermediate operation and returns a stream of the same elements, so further steps can follow.
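A runnable variant of this pattern, as a sketch: it starts from a list of ages rather than Person objects, and collects into lists instead of printing so the two views can be compared. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PeekDemo {

    // Adds every age that reaches peek to "seen" and returns the ages that pass the filter.
    static List<Integer> agesAbove20(List<Integer> ages, List<Integer> seen) {
        return ages.stream()
                .peek(seen::add)              // intermediate: observes, then passes the element on
                .filter(age -> age > 20)
                .collect(Collectors.toList()); // terminal: triggers the processing
    }

    public static void main(String[] args) {
        List<Integer> seen = new ArrayList<>();
        List<Integer> result = agesAbove20(Arrays.asList(18, 25, 40), seen);
        System.out.println(seen);   // [18, 25, 40]
        System.out.println(result); // [25, 40]
    }
}
```

The Stream Javadoc describes peek as existing mainly to support debugging, so it is best kept out of production logic.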

Third Pattern

list.stream()
    .map(element -> element.getAge())
    .filter(element -> element > 20)
    .mapToInt(element -> element)
    .average();

mapToInt is needed because average() is defined on IntStream, not on Stream<Integer>. The result is an OptionalDouble, which is empty when no element passes the filter.
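Putting the whole problem together, a self-contained sketch could look like this. The Person class is a hypothetical stand-in (the article only assumes a getAge() accessor), and mapToInt is applied directly to the Person stream, which saves the separate map step.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

class AverageAgeDemo {

    // Hypothetical Person type; only getAge() is assumed by the article.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    // Map to ages, keep those above 20, reduce to the average.
    static OptionalDouble averageAgeAbove20(List<Person> people) {
        return people.stream()
                .mapToInt(Person::getAge)   // IntStream, no boxing
                .filter(age -> age > 20)
                .average();                 // empty if nobody is older than 20
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("Ann", 18),
                new Person("Bob", 25),
                new Person("Cleo", 35));
        OptionalDouble average = averageAgeAbove20(people);
        System.out.println(average.orElse(Double.NaN)); // 30.0
    }
}
```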