Map/Reduce/Filter Algorithm in Java
A look at implementing the Map/Filter/Reduce algorithm in Java using helper classes and the Stream API.
Pre-requisites:
The Problem:
Given a list of people, find the average age of those older than 20.
- Map: extracts every age; changes the element type of the list, not its size
- Filter: keeps only the ages > 20; changes the size of the list, not its type
- Reduce: aggregates the remaining elements into a single value
Java 7 - Helper Classes
List<Person> list = .....;
List<Integer> ages = Lists.map(list, element -> element.getAge());
List<Integer> agesGT20 = Lists.filter(ages, element -> element > 20);
int sum = Lists.reduce(agesGT20, (a1, a2) -> a1 + a2);
This uses static methods on a factory class, Lists (which does not exist yet). We pass the data and the lambdas to the API, which does the computation for us. Easy and efficient.
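Since Lists does not exist, here is a minimal sketch of what such a helper might look like. Everything in it is an assumption made for illustration: the class name follows the snippet above, the method signatures are guessed from how they are called, and the choice to throw on an empty list in reduce is one possible design, not an established one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical helper class: nothing here is part of the JDK.
final class Lists {
    private Lists() {}

    // Map: the result has the same size, but possibly another element type.
    static <T, R> List<R> map(List<T> source, Function<? super T, ? extends R> mapper) {
        List<R> result = new ArrayList<>(source.size());
        for (T element : source) {
            result.add(mapper.apply(element));
        }
        return result;
    }

    // Filter: the result has the same element type, but possibly fewer elements.
    static <T> List<T> filter(List<T> source, Predicate<? super T> predicate) {
        List<T> result = new ArrayList<>();
        for (T element : source) {
            if (predicate.test(element)) {
                result.add(element);
            }
        }
        return result;
    }

    // Reduce: combines all elements into one value.
    // Assumption: with no identity element available, an empty list is an error.
    static <T> T reduce(List<T> source, BinaryOperator<T> operator) {
        if (source.isEmpty()) {
            throw new NoSuchElementException("cannot reduce an empty list without an identity");
        }
        T accumulator = source.get(0);
        for (int i = 1; i < source.size(); i++) {
            accumulator = operator.apply(accumulator, source.get(i));
        }
        return accumulator;
    }
}
```

Note that map and filter each allocate a new list, which is where the extra memory use of this approach comes from.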
This style parallelizes well, provided that:
- the reduction is associative (this is not enforced by the compiler or the JVM)
- the reduction has an identity element. Not all reductions do: using 0 as the identity for max fails when every value is negative. For reductions without an identity element, Optional was introduced: a wrapper type that can be empty.
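The identity problem can be illustrated with the two overloads of Stream.reduce. This is a small sketch; the class and method names (MaxReduction, maxWithZero, max) are made up for the example.

```java
import java.util.List;
import java.util.Optional;

// Illustrative names only; the reduce overloads themselves are part of java.util.stream.Stream.
final class MaxReduction {
    private MaxReduction() {}

    // Treats 0 as the identity of max, which is wrong:
    // for all-negative input the result is 0, a value that is not in the list.
    static int maxWithZero(List<Integer> values) {
        return values.stream().reduce(0, Integer::max);
    }

    // No identity is supplied, so the result is wrapped in an Optional
    // that is empty when the list has no elements.
    static Optional<Integer> max(List<Integer> values) {
        return values.stream().reduce(Integer::max);
    }
}
```

For a list containing -5 and -3, maxWithZero returns 0 while max returns an Optional holding -3. For an empty list, max returns an empty Optional instead of inventing a value.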
Caveats
- Each step creates a new list, which increases memory consumption
Stream API
Calling stream() on the list creates a stream that consumes its elements.
First Pattern
Prints the ages greater than 20.
list.stream()
.map(element -> element.getAge())
.filter(element -> element > 20)
.forEach(System.out::println);
Three different streams are created by the first three steps (stream(), map() and filter()). This is not costly, because streams do not hold elements; they only process them.
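One way to see that the intermediate streams hold nothing is to count how often the mapper runs. In this sketch (the class name LazyPipeline and its method are invented for the example), building the pipeline calls the mapper zero times; the work only happens once a terminal operation is invoked.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative class; only the Stream calls are real JDK API.
final class LazyPipeline {
    private LazyPipeline() {}

    // Returns {mapper calls before the terminal operation, mapper calls after it}.
    static int[] mapperCalls(List<Integer> ages) {
        AtomicInteger calls = new AtomicInteger();

        // Building the pipeline creates stream objects but processes no element.
        Stream<Integer> pipeline = ages.stream()
                .map(age -> { calls.incrementAndGet(); return age; })
                .filter(age -> age > 20);
        int before = calls.get();

        // The terminal operation pulls every element through map and filter.
        pipeline.collect(Collectors.toList());
        int after = calls.get();

        return new int[] {before, after};
    }
}
```

For a list of three ages, the method returns 0 calls before the terminal operation and 3 after it.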
Second Pattern
Prints each age after mapping, and prints the ages greater than 20 a second time.
list.stream()
.map(element -> element.getAge())
.peek(System.out::println)
.filter(element -> element > 20)
.forEach(System.out::println);
To print the intermediate result after mapping, forEach cannot be used, because it is a terminal operation and does not return anything. peek passes each element to the given Consumer in the same way, but it also returns a stream of the same elements, so the pipeline can continue.
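To make the behaviour of peek observable without reading console output, the sketch below records the peeked elements in a list instead of printing them. The class name PeekExample, the method name and the seen parameter are assumptions introduced for the example.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative class; peek, filter and collect are the real Stream operations.
final class PeekExample {
    private PeekExample() {}

    // Adds every age to 'seen' as it passes through peek,
    // then returns only the ages greater than 20.
    static List<Integer> agesOver20(List<Integer> ages, List<Integer> seen) {
        return ages.stream()
                .peek(seen::add)             // observes each element, pipeline continues
                .filter(age -> age > 20)
                .collect(Collectors.toList());
    }
}
```

With the ages 18, 25 and 35, seen ends up holding all three values, while the returned list holds only 25 and 35.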
Third Pattern
list.stream()
.map(element -> element.getAge())
.filter(element -> element > 20)
.mapToInt(element -> element)
.average();
mapToInt is needed because average() is defined on IntStream, not on Stream<Integer>. The result is an OptionalDouble, which is empty when no element passes the filter.
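Putting the pieces together, the original problem can be solved in a single pipeline. This is a sketch under assumptions: the Person class below is a minimal stand-in (the real class is not shown in this article), and AverageAge is an invented name. It maps straight to an IntStream, which avoids boxing the ages.

```java
import java.util.List;
import java.util.OptionalDouble;

// Minimal stand-in for the Person class used above (assumed shape).
final class Person {
    private final String name;
    private final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    String getName() { return name; }
    int getAge() { return age; }
}

// Illustrative class name.
final class AverageAge {
    private AverageAge() {}

    // Map to ages, keep those above 20, then reduce to the average.
    // The OptionalDouble is empty when nobody is older than 20.
    static OptionalDouble averageAgeOver20(List<Person> people) {
        return people.stream()
                .mapToInt(Person::getAge)
                .filter(age -> age > 20)
                .average();
    }
}
```

For people aged 18, 25 and 35, the result holds 30.0. Callers decide what an empty result means, for example with orElse(0.0).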