Road to Haskeller #20

Laziness

We have been blindly accepting that Haskell uses lazy evaluation, but how does that work exactly? It turns out that what's underneath Haskell's lazy evaluation is something called thunks. A thunk is a calculation that has not been evaluated yet and is stored in memory until it is needed. To understand thunks, let's look at an example.

f a b = if a `mod` 2 == 0 then a else b
 
res = f (1+1) (2+1) -- (1+1) and (2+1) are thunks yet to be evaluated

Let's look at the above example of f. We see that a has to be evaluated to determine which condition is met. Thus, a is evaluated.

f 2 (2+1) = if 2 `mod` 2 == 0 then 2 else (2+1)

Then, we see that the first condition becomes true, and we return 2 without computing (2+1). This is great because we could avoid unnecessary computation.

Disadvantage of Laziness

Unfortunately, there is no silver bullet in this world, and laziness is not always better. Let's look at an example with foldl to see what I mean.

foldl (+) 0 [1,2,3]

Let's lazily compute the above.

foldl (+) 0 [1,2,3]
-- = foldl (+) (0+1) [2,3]
-- = foldl (+) ((0+1)+2) [3]
-- = foldl (+) (((0+1)+2)+3) []
-- = ((0+1)+2)+3
-- = (1+2)+3
-- = 3+3
-- = 6

Since we do not add values immediately, we pile up a big thunk (((0+1)+2)+3) and evaluate it afterward, resulting in more steps and more memory usage. The size of the array was small, so it should not cause any practical issues, but it will start becoming problematic as we increase the size of the array. This is where strictness is useful. The foldl' function enforces strict evaluation, leading to fewer steps and less memory usage.

foldl' (+) 0 [1,2,3]
-- = foldl (+) (1) [2,3]
-- = foldl (+) (3) [3]
-- = foldl (+) (6) []
-- = 6

Strictness

Behind the foldl' and other strict evaluations in Haskell, there is a function seq defined as follows:

seq :: a -> b -> b
seq a b = b

At first glance, it looks like a function that just discards a. However, this function is defined at the compiler level such that a has to be evaluated before returning b. Using seq, you can define foldl' like this:

foldl' :: (a -> b -> a) -> a -> [b] -> a
foldl' _ z [] = z
foldl' f z (x:xs) =
  let z' = f z x in
  z' `seq` foldl' f z' xs

By using seq, z' is strictly evaluated before we move on to the next recursive call. There are some other useful functions predefined in Haskell using seq.

($) :: (a -> b) -> a -> b -- Lazy
($!) :: (a -> b) -> a -> b -- Strict!
 
f $ x = f x
f $! x = x `seq` f x

The above $! is useful in many areas but is especially useful in an IO action.

action = do
  --- actions
  return $! f x

By using $!, you can prevent returning a potentially huge thunk.

Think carefully

Always remember that there is no silver bullet in this world. Although strictness can help in some situations like foldl', it can also negatively affect the performance of algorithms. In fact, using strict evaluation can mess up the compiler, so it is advised to avoid using strictness unless it is unavoidable. If you are using a compiler instead of an interpreter, most of the strictness problems can be solved automatically by the compiler, including the problem with foldl, so you don't have to worry about it too much.

Resources

Philipp, Hagenlocher. 2020. Haskell for Imperative Programmers #26 - Strictness, Thunks & seq. YouTube.