Advanced Functional Programming for OO/Java Developers, Part 2: Functional Data Structures & More Advanced Continuations

This is the second part of a series of posts introducing functional programming principles to OO developers. In the first part, we looked at two design patterns that are less common in the OO world and more common in the functional world, namely the visitor design pattern and the continuation passing style design pattern. These patterns, especially the latter one, are crucial for functional programming, but can be very useful outside that domain. For example, developers dabbling in large data amounts will have heard of stream, reactive and asynchronous programming, all of which benefit greatly from understanding some functional principles, and even in development of “standard business applications” we saw a tangible improvement by replacing what an OO developer would intuitively write with a CPS/visitor implementation, improving the execution time from O(n²) to O(n).

In this second bit, we’ll first take what looks like a detour into functional data types, that is data types that are immutable yet still support sophisticated operations, and then use a developed data type to implement functionality on top of the running example of expressions from part 1 that would be very hard to do without continuations.

Functional Data Structures

OO developers will be intimately familiar with abstract data types, at least indirectly via collections libraries. Abstract data types implement a given set of operations without forcing a developer to worry about their inner workings. Classical examples of abstract data types are lists (add element, remove element, traverse all elements) and maps (associate element with key, remove association for key, look up element by key). Abstract data types can be implemented in different ways. For example, Java provides implementations of lists using arrays and linked lists. They support the same base operations and optionally some extra operations, but the same operation may have different performance characteristics for different implementations (for example, an ArrayList provides constant-time random lookup of elements, while a LinkedList provides constant time insertion at both the beginning and end of the list).

Most OO developers will be aware of the dichotomy between basic types and object types (or reference types). If a pass a basic type (like an integer) to a function, the function cannot change the value of the variable at the caller, but if I pass in a reference to an object type the callee can do that (for example by adding an element to a list). The same distinction is in place for final variables: you cannot change a final integer variable but you can add an element to a list contained in a final variable; this distinction is especially important when dealing with anonymous functions for reasons that are equal parts technical, historical, and dumb.

This dichotomy is entirely foreign to a functional developer, because data types cannot be changed. It is not possible to “add an element to a list,” but rather you can “create a new list with an element added.” In (purely) functional languages all types are immutable. Immutable types are known, albeit not necessarily very consciously, by OO developers as well. For example, in Java the type String is immutable. It is not possible to append data to a string, but it is possible to construct a new string with data appended. In fact, the toString example from part 1 is an example where using an immutable type (String) leads to worse performance than using a mutable type (the StringBuilder (or older StringBuffer) are Java’s mutable strings).

Immutable data types have some disadvantages and some advantages, so it is important to know which to use. The biggest disadvantage of immutable data types is that it can be harder to make them work efficiently. For example, adding a short string to a very long string can be done in time proportional to the short string if we allow modifying internal state, but must copy both strings if done in an immutable manner ((This is not entirely true; it is possible to implement string concatenation in an immutable manner without copying by introducing some complexity.)). The biggest advantage is that they cannot be changed. This makes it completely safe to share them between different threads because it is never possible for one thread to make modifications that interfere with the operation of others. This makes usage so much simpler: developers do not have to worry about explicit synchronization, replay errors, whether it is necessary to copy a data structure, etc. All of this means that unless there is a very strong reason to use a mutable type, immutable types should be preferred, also in languages where it is not required, so understanding what they are and when to use them is useful for an OO developer, and the functional domain has plenty of lessons to teach, because they typically only come with functional data structures out of necessity.

Here we’ll look at how to make a functional map data structure. The map will have functional operations and not rely on mutable structures like arrays for implementation. This gives the implementation some interesting performance characteristics. Let’s dive straight into the implementation:

	static class FunctionMap {
		public static <K, V> Function<K, V> empty() {
			return k -> {
				throw new IllegalArgumentException();
			};
		}

		public static <K, V> Function<K, V> put(Function<K, V> map, K key, V value) {
			return k -> Objects.equals(key, k) ? value : map.apply(k);
		}
	}

An no, I kid you not, this is all that is needed to implement a very simple map data structure. We can create new empty maps using the empty operation, and we can put an element into an existing map using the put operation. We do not create out own type, but just leverage the built-in Java Function interface. Worth noting is that both operations are static methods and that the put operation explicitly takes a map as both input and output; we never modify a map, we always consume one map and produce another one.

Here is a simple test of the newly created map:

		Function<String, Integer> map = FunctionMap.empty();

		assertThrows(IllegalArgumentException.class, () -> map.apply("hello"));

		Function<String, Integer> map1 = FunctionMap.put(map, "hello", 7);
		assertEquals(7, map1.apply("hello"));

		Function<String, Integer> map2 = FunctionMap.put(map1, "world", 5);
		assertEquals(7, map2.apply("hello"));
		assertEquals(5, map2.apply("world"));

		Function<String, Integer> map3 = FunctionMap.put(map2, "hello", 3);
		assertEquals(3, map3.apply("hello"));
		assertEquals(5, map3.apply("world"));
		assertEquals(7, map2.apply("hello"));
		assertEquals(5, map2.apply("world"));

We first create an instance of an empty map (l. 1) and check we get an error when looking up a non-existing key (which is all keys for an empty map, l. 3). We then add a value to the map (l. 5), producing a new derived map, map1, and check the value really exists and has the expected value (l. 6). We add another value to map1 (l. 8), yielding map2, and check the two inserted values (ll. 9-10). Basically, so far our map works like a map even without a significantly more complex usage than a standard Java Map.

But that’s when it becomes interesting. We then override a value in map2 (l. 12), producing map3. We then check the value has been updated (ll. 13-14). But interestingly, we can also still check the old values in map2 still remain (ll. 15-16). Adding values to map2 indeed did not change map2 even though map3 clearly reflects the changes. So, if we had passed map2 off to another thread for processing, altering it locally would not break anything.

Exercise: Try explaining in details how this new map works.

Hint 1

Try describing the object structure of the constructed function objects. Remember that Java implicitly creates fields for values from the surroundings that are used in the function.

Hint 2

The empty operation uses no external values. The put operation uses three values from the environment: the key, the element value, and the list to insert into.

Hint 3

Don’t forget that maps are just functions, which are just objects…

Answer

A map becomes a simple singly linked lists of function objects. Each function object has a key + an element value, and a next pointer to the rest of the map.

Looking up a value consists of comparing the given key value with the stored key value; if they are the same, the stored value is returned. Otherwise, the rest of the map is consulted in the same way. An empty map is one that always raises an exception: it contains no values, so it will never contain the requested key.

In fact, once we start really using the map, it is no longer just a single linked list, but an entire tree of maps. We could construct a map4 by putting value “hello”, 11 into map2, and then “hello” would map to three different values in three different maps: map2: 7, map3: 3, map4: 11. Both map3 and map4 would refer to map2, which is again not a problem as adding elements to either will never change it.

This implementation gives our map some interesting performance characteristics. A normal map can put an element in (amortized) constant time (HashMap) or logarithmic time (TreeMap), look up an element in constant time (HashMap) or logarithmic time (TreeMap), and make an independent copy of the map in linear time. Our map instead allows putting an element in constant time, looking up an element in linear time ((In some case even worse if we add/remove a lot of elements.)), but it allows us to make an independent copy in constant time. We can copy our map simply by passing a copy of the reference; as it is immutable, we never need to copy it. The fact that lookup is slow can be mitigated somewhat using a more sophisticated implementation, but it is also less important in some cases; the example below will make use of this map as it is an ideal fit for that usage: we’ll make copies of the map more often than we’ll look up elements.

Exercise: Try extending the map with a remove operation.

Answer

		public static <K, V> Function<K, V> remove(Function<K, V> map, K key) {
			return k -> {
				if (Objects.equals(key, k)) {
					throw new IllegalArgumentException();
				} else {
					return map.apply(k);
				}
			};
		}

Exercise: Try making a functional map with constant time add operation, constant time remove operation, and constant time copy operation.

Hint 1

For this one, you’ll need to create your own interface with two operations (add + remove).

Hint 2

Consider the dynamic structure of the functional map and consider if this could be useful for implementing lists.

Hint 3

Your add operation will be quite straightforward: take a list and an element, and return a new list. Your remove operation will be a bit more involved: it will take a list as an argument, but return both a new list and the removed element.

Hint 4

Consider the operation of a stack, but flipped around. Now implement that without using an array for storage, but instead a linked list. And instead of a list of objects, it is a list of functions.

Hint 5

You could make it so it passes the below tests:

		FunctionalList<Integer> empty = FunctionList.empty();

		FunctionalList<Integer> list = FunctionList.add(empty, 3);
		assertEquals(3, FunctionList.remove(list).getB());

		FunctionalList<Integer> list1 = FunctionList.add(empty, 5);
		FunctionalList<Integer> list2 = FunctionList.add(list1, 7);
		Pair<FunctionalList<Integer>, Integer> result = FunctionList.remove(list2);
		assertEquals(7, result.getB());
		assertEquals(5, FunctionList.remove(result.getA()).getB());
		assertEquals(5, FunctionList.remove(list1).getB());
		
		FunctionalList<Integer> list3 = FunctionList.add(list1, 11);
		assertEquals(11, FunctionList.remove(list3).getB());
		assertEquals(7, FunctionList.remove(list2).getB());
		assertEquals(5, FunctionList.remove(list1).getB());

Answer

One implementation in plain Java could be:

	private static class Pair<A, B> {
		private final A a;
		private final B b;

		public Pair(A a, B b) {
			this.a = a;
			this.b = b;
		}

		public A getA() {
			return a;
		}

		public B getB() {
			return b;
		}
	}

	private static class FunctionalList<E> extends Pair<FunctionalList<E>, E> {
		public FunctionalList(FunctionalList<E> a, E b) {
			super(a, b);
		}
	}

	private static class FunctionList {
		public static <E> FunctionalList<E> empty() {
			return new FunctionalList<E>(null, null) {
				public FunctionalList<E> getA() {
					throw new IllegalArgumentException();
				}

				public E getB() {
					throw new IllegalArgumentException();
				}
			};
		}

		public static <E> FunctionalList<E> add(FunctionalList<E> list, E element) {
			return new FunctionalList<E>(list, element);
		}

		public static <E> Pair<FunctionalList<E>, E> remove(FunctionalList<E> list) {
			return new Pair<FunctionalList<E>, E>(list.getA(), list.getB());
		}
	}

Functional lists may seem very simple and mostly an interesting intellectual exercise, but they constitute the very foundation of many/most functional languages (the L in LISP is literally for List).

Continuations and More Advanced Expressions

In part 1, we discussed simple expressions built from constants, addition and multiplication. That’s fine if you’re in third grade, but we want to go a bit further. We want our expressions to also be able to deal with variables. Variables allow us to store a value under a name and retrieve it later. This can be useful for reusing a computed value more than once (e.g., to compute the square of (5 * 7) + 3) or for generating generic computations that can be used with multiple parameters (e.g., computing the area of a rectangle as height * width).

To do that, we need to introduce two new expression types: a Variable and an Assignment:

	class Variable<T> extends FancyExpression<T> {
		String name;

		public Variable(String name) {
			this.name = name;
		}

		@Override
		public <V> V visit(ExpressionVisitor<T, V> visitor) {
			return visitor.visit(this, name);
		}
	}

	class Assignment<T> extends FancyExpression<T> {
		String name;
		Expression<T> value, next;

		public Assignment(String name, Expression<T> value, Expression<T> next) {
			this.name = name;
			this.value = value;
			this.next = next;
		}

		@Override
		public <V> V visit(ExpressionVisitor<T, V> visitor) {
			return visitor.visit(this, name, value.visit(visitor), next.visit(visitor));
		}
	}

As before, these classes contain no functionality except implementation of the Visitor design pattern. We need a bit of glue to make this interact seamlessly with the classes we developed last time. In particular, we need an extended visitor that can also handle the two new expressions:

	interface ExpressionVisitor<T, V> extends SimpleExpressionVisitor<T, V> {
		V visit(Variable<T> variable, String name);

		V visit(Assignment<T> assignment, String name, V value, V next);
	}

We have just extended the visitor interface from last time, so it is possible to use both the simpler and more sophisticated version, depending on whether we need the new variable functionality. We also need a bit of glue to tie the expression types together:

	abstract class FancyExpression<T> extends Expression<T> {
		public abstract <V> V visit(ExpressionVisitor<T, V> visitor);

		@Override
		public <V> V visit(SimpleExpressionVisitor<T, V> visitor) {
			if (visitor instanceof ExpressionVisitor<?, ?>) {
				return visit((ExpressionVisitor<T, V>) visitor);
			}
			throw new IllegalArgumentException();
		}
	}

The FancyExpression is not the most elegant, but it allows us to extend the expression hierarchy and visitor without having to alter the original classes. This is useful if we do not own or have the source code to the original classes and illustrates how it is possible to extend visitor hierarchies.

Now, we want to be able to evaluate our new fancy expressions, essentially implement the IntegerAlgebra from part 1, but also taking variables into account. For simple expressions we did that directly, just like we did a direct (and inefficient) toString, but to support variables, we need to use continuations. Let’s first look at the code as a whole, and then break it down:

	class FancyIntegerAlgebra
			implements ExpressionVisitor<Integer, Function<Function<String, Integer>, Integer>> {

		@Override
		public Function<Function<String, Integer>, Integer> visit(Constant<Integer> constant, Integer value) {
			System.out.println("visit constant");
			return env -> {
				System.out.println("evaluate constant");
				return value;
			};
		}

		@Override
		public Function<Function<String, Integer>, Integer> visit(Plus<Integer> plus,
				Function<Function<String, Integer>, Integer> left, Function<Function<String, Integer>, Integer> right) {
			System.out.println("visit plus");
			return env -> {
				System.out.println("evaluate plus");
				return left.apply(env) + right.apply(env);
			};
		}

		@Override
		public Function<Function<String, Integer>, Integer> visit(Times<Integer> times,
				Function<Function<String, Integer>, Integer> left, Function<Function<String, Integer>, Integer> right) {
			System.out.println("visit times");
			return env -> {
				System.out.println("evaluate times");
				return left.apply(env) * right.apply(env);
			};
		}

		@Override
		public Function<Function<String, Integer>, Integer> visit(Variable<Integer> variable, String name) {
			System.out.println("visit variable");
			return env -> {
				System.out.println("evaluate variable");
				return env.apply(name);
			};
		}

		@Override
		public Function<Function<String, Integer>, Integer> visit(Assignment<Integer> assignment, String name,
				Function<Function<String, Integer>, Integer> value, Function<Function<String, Integer>, Integer> next) {
			System.out.println("visit assignment");
			return env -> {
				System.out.println("evaluate assignment");
				return next.apply(FunctionMap.put(env, name, value.apply(env)));
			};
		}
	}

This can seem daunting, but a large part of the code should not be too surprising if you have followed along so far. Our visitor implements the 5 required methods for the extended interface. Instead of returning just integers, we instead return a continuation returning an integer (we’ll get to the parameter to the continuation later). With that in mind, the first three method should be relatively straightforward: we just return the stored constant value, the sum, and the product but wrapped in a continuation. Furthermore, we add a bunch logging that will come in handy in part 3.

To evaluate a variable, we need to talk about the parameter to the continuation. It is a function from string to integers, but really it is an instance of the functional map we developed in the previous section. This map is called an environment and is assumed to contain values for all variables. With this in mind, evaluating a variable simply comprises looking up the variable name in the environment and returning the value.

With this in place, assigning values to variables becomes nearly trivial. Conceptually, we just need to store the value in the environment and evaluate the expression with the new environment. This is where the continuation becomes necessary. We need to perform an operation both before, in the middle of, and after calling recursively. The one-liner above belies the complexity of what really happens here, so let’s split it up and consider each part in isolation:

return next.apply(
     FunctionMap.put(env, name, 
          value.apply(env)
     )
);

Remember that expressions are evaluated inside out in Java, so first to be executed is line 3. Here, we use the continuation for the expression we want to store. Since this is a continuation, we need to call it with an environment to get the actual value. We call it with the original environment we ourselves are given. After computing the value to store, we store it in a functional map (ll. 2 + 4); we can think of this as the new environment where we have stored the result of our evaluation. Finally, we then evaluate the actual expression using this new environment.

What we have implemented is not assignment in the sense that a Java developer is familiar with. Rather, this is an implementation of a scoped assignment or value binding. This type of assignment will be familiar to XSLT developers as the xsl:variable or to SML/Swift developers as the let assignments. In Java this most closely resembles a final variable of an immutable type.

Advanced Aside

More advanced readers will recognize this as a lambda abstraction with an immediate function application. We do not do beta reduction, but evaluate everything down to values before function application.

It is also worth noticing that we explicitly use the term expression. More advanced users of OO languages will be familiar with the distinction between statements and expressions, such as the difference between the if statement (if <expression> then <statement1> else <statement2>) and the if expression (<expression>?<expression1>:<expression2>). (Purely) functional languages do not have such a distinction; since statements have no side effects, they are always equivalent to expressions. Our expressions are purely functional, and therefore the only type of assignment that makes sense is a scoped one: there will never be a “next statement” for which the assignment could have any meaning, so we must explicitly create a scope in which the variable makes sense.

We’ll get to using the new evaluation in just a second, but before that we note that while FancyIntegerAlgebra reimplements all visitor methods, this is not always necessary. For example, we can relatively easily implement toString by building on what we made last time:

	class FancyExtendedToString<T> extends FancyToString<T>
			implements ExpressionVisitor<T, Function<StringBuilder, StringBuilder>> {
		@Override
		public Function<StringBuilder, StringBuilder> visit(Variable<T> variable, String name) {
			return sb -> {
				sb.append(name);
				return sb;
			};
		}

		@Override
		public Function<StringBuilder, StringBuilder> visit(Assignment<T> assignment, String name,
				Function<StringBuilder, StringBuilder> value, Function<StringBuilder, StringBuilder> next) {
			return sb -> {
				sb.append('(');
				sb.append(name);
				sb.append(" := ");
				sb = value.apply(sb);
				sb.append("; ");
				sb = next.apply(sb);
				sb.append(')');
				return sb;
			};
		}
	}

Here we have only implemented the two new methods for the extended expression. This is possible because the original implementation already used a continuation. The actual implementation of the extended version should be straightforward at this point, so we’ll not go into further details, but it is worth noticing how we can relatively easily extend an existing object hierarchy, both with new classes and with new functionality even without access to the original implementations. This shows some of the power of the visitor design pattern. If we had not used a visitor here, we would have to alter (or adapt/extend somehow) the original classes if functionality was added as class methods, or we would have to copy the implementation of an efficient toString implementation if it was implemented outside the domain classes.

Now, let’s take a look at using our new expression evaluation:

		// For simple expressions this works as before
		Expression<Integer> integerExpression = new Plus<>(new Times<>(new Constant<>(5), new Constant<>(7)),
				new Constant<>(3));
		System.out
				.println(integerExpression.visit(new FancyExtendedToString<>()).apply(new StringBuilder()).toString());
		System.out.println(integerExpression.visit(new FancyIntegerAlgebra()).apply(FunctionMap.empty()));

		// We can now create variables
		Expression<Integer> variableExpression = new Assignment<>("a", new Constant<>(5),
				new Plus<>(new Variable<>("a"), new Constant<>(7)));
		System.out
				.println(variableExpression.visit(new FancyExtendedToString<>()).apply(new StringBuilder()).toString());
		System.out.println(variableExpression.visit(new FancyIntegerAlgebra()).apply(FunctionMap.empty()));

		// Variables can be pretty fancy
		Expression<Integer> fancyVariableExpression = new Assignment<>("a",
				new Plus<>(new Times<>(new Constant<>(5), new Constant<>(3)), new Constant<>(11)),
				new Plus<>(new Variable<>("a"), new Constant<>(7)));
		System.out.println(
				fancyVariableExpression.visit(new FancyExtendedToString<>()).apply(new StringBuilder()).toString());
		System.out.println(fancyVariableExpression.visit(new FancyIntegerAlgebra()).apply(FunctionMap.empty()));

Both evaluation and toString work relatively predictably. The only new thing here is that we now must evaluate the continuations we get from using our fancy integer algebra. We do that by passing them an empty environment using our functional map.

Our code will even work if we do very complex things like using multiple variables, having variables dependent of each other, and reusing variable names. Here’s one such complex example:

		// We can have more variables and reuse variables in scopes
		Expression<Integer> crazyVariableExpression = new Assignment<>("b",
				new Assignment<>("a",
						new Plus<>(
								new Times<>(new Constant<>(5),
										new Assignment<>("a", new Plus<>(new Constant<>(3), new Constant<>(17)),
												new Plus<>(new Variable<>("a"), new Constant<>(19)))),
								new Constant<>(11)),
						new Plus<>(new Variable<>("a"), new Constant<>(7))),
				new Plus<>(new Variable<>("b"), new Constant<>(13)));
		System.out.println(
				crazyVariableExpression.visit(new FancyExtendedToString<>()).apply(new StringBuilder()).toString());
		System.out.println(crazyVariableExpression.visit(new FancyIntegerAlgebra()).apply(FunctionMap.empty()));

Exercise: Try evaluating the expression by hand, including what the environment looks like at each point.

Examples such as the above is why we use a functional map. We need to make a copy of the environment each time we make a new scope (in this simple example that’s the same as each time we define a new variable). We need to do that so changing the value in one scope does not have impact on all others, whether the two scopes are nested, side-by side or any other relationship.

As a teaser for the next part, try considering the logging of the following tests:

		System.out.println("Starting simple evaluation");
		Integer simpleEvaluation = integerExpression.visit(new IntegerAlgebra());
		System.out.println("Simple evaluation done");
		System.out.println("Simple result: " + simpleEvaluation);

		System.out.println("Starting fancy evaluation");
		Function<Function<String, Integer>, Integer> fancyEvaluation = integerExpression
				.visit(new FancyIntegerAlgebra());
		System.out.println("Fancy evaluation done");
		System.out.println("Fancy result: " + fancyEvaluation.apply(FunctionMap.empty()));
		System.out.println("Fancy result again: " + fancyEvaluation.apply(FunctionMap.empty()));

Exercise: Predict or explain the difference in logging between the different examples.

Exercise: Does this have an impact of what we can do with the different implementations and the speed of execution of them?

Conclusion

In this part, we have investigated one new part and one part in further detail. We have seen how it is possible to create functional (immutable) data structures. These can have different performance profiles from “normal” data structures which may be useful in some cases, and they have huge advantages in simplicity of use, in particular in complex programs with multiple threads or where big collections are used by different parts of the application. We have also extended our running expression example with variables, a feat that is only possible to do in a simple way thaks to continuations. We have also seen that proper use of visitors allow extending data structures both with new classes and new functionality even though we may not be able to alter or even have access to the original classes.

In the next bit, we’ll take a look at how this is related to JIT compilation and see how it can provide us with a better insight into how asynchronous frameworks work.

Michael

Time person of the year 2006, Nobel Peace Prize winner 2012.

westergaard.eu/

Functional Data Structures

Continuations and More Advanced Expressions

Conclusion

Leave a Reply Cancel reply