Friday, June 4, 2010

A note on efficiency in F#, part I: small things first

In this blog, I often talk about efficiency aside data mining as efficiency really matters in this field. Previously, we’ve talked about how to use PInvoke and C++/CLI to use C++ code to boost the performance. In this post, I’d like to start a series on F# itself. As a functional programming language, F# programs are kind of high-level, thus harder to reason its performance than imperative programs. Not the big-O thing, it is the constant. This series contains little tips about performance.

Note that these tips are just facts to let you know. There is no performance guidance that one should follow. There is always a treadoff, high level functional constructs usually are slower than their pure imperative equivalents. However, they are safer and have better abstraction. It usually depends when you really need to do some optimization. E.g. if you could guarantee that your program runs less than 0.5 second, then optimizing it would usually become less meaningful, unless you have another program calling this program a million times.

Also remember that

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil" (by Donald Knuth)

1. Stream IO and Seq module

Think about that we need to process a large text file, say, 1TB. Then it is infeasible to load the file into memory. The approach is to treat the text file as a stream and scan this stream to get useful information.

  1. open System.IO
  2. let readLinesSeq = File.ReadLines

.Net 4.0 provides ReadLines, the type of its return value is seq<string>.

In lower versions of .Net, one needs to implement this function:

  1. let readLines filePath = seq {
  2. use sr = new StreamReader (filePath)
  3. while not sr.EndOfStream do
  4. yield sr.ReadLine ()
  5. }

Once we have this function, we can process it as as a sequence.

While prototyping a model, we may make mistakes, and we also know that scanning the whole file would cost a lot of time. If we find an error after scanning the whole scanning, that would waste a lot of time. So we would like to take the first few lines of the file as a small test set for the debugging purpose. By treating a file as a sequence, we could simply use

  1. let lines100 = "filename" |> readLines |> Seq.take 100

to get a small test set.

Here is an quite old post in 2006. At that time, there seemed to be no Seq module. The idea is the same.

2. Loop, Seq module and the tail-recursive version of “while”

One of my first F# programs is a prime number test function. I first used an ugly for loop with mutable variables:

  1. let isPrime1 n =
  2. let m = int(sqrt (float n))
  3. let mutable flag = true
  4. for i=2 to m do
  5. if n % i = 0 then
  6. flag <- false
  7. flag

You can see that I clearly didn’t know how to early stop/break the loop when flag is assigned to false. Because I could not find break, so I crazily use exception to break out the loop:-)

  1. exception Break
  2. let isPrime1' n =
  3. let m = int(sqrt (float n))
  4. let mutable flag = true
  5. try
  6. for i=2 to m do
  7. if n % i = 0 then
  8. flag <- false
  9. raise Break
  10. with
  11. _ -> ()
  12. flag

And based on my FP experience on Scheme, I immediately wrote a tail-recursive function, which behaves like a “while” loop:

  1. let isPrime2 n =
  2. let m = int(sqrt (float n))
  3. let rec loop div =
  4. if div > m then true
  5. elif n % div = 0 then false
  6. else loop (div+1)
  7. loop 2

I can also explicitly use a while loop with a mutable flag:

  1. let isPrime3 n =
  2. let m = int(sqrt (float n))
  3. let mutable flag = true
  4. let mutable div = 2
  5. while flag && div <= m do
  6. if n % div = 0 then
  7. flag <- false
  8. div <- div + 1
  9. flag

And I finally find the elegant way using Seq:

  1. let isPrime n = { (float(n)))} |> Seq.forall (fun i->n%i<>0)

However, the last version would be slower than others except for the first one. Because a sequence is actually an object implements IEnumerable interface. If the object currently points to 10, to get 11, it needs to call a Next function to get it. There is definitely some overhead here.

So pay attention when you use Seq to do the job of a for loop or while loop when the loop is executed for a huge number of times.

3. List and Array

F# lists are single linked lists. Thus, Lists use more memory as it needs to keep a next pointer in every node. Lists are also slow. Let’s do a test:

  1. let n = 10000000
  2. let a = List.init n (fun i-> i)
  3. let b = Array.init n (fun i-> i)

creating list a uses about 2.2 seconds and about 200MB memory, while creating array b only uses 0.12 second and about 40MB memory, which equals our estimate n * 4 bytes/int ~= 40MB.

4. Access lists, arrays and sequences

We know that the three modules List, Array and Seq have similar functions. E.g. all of them have the function map, which applies a function to the existing list, array or sequence to generate a new one. However, the three maps are implemented differently, with array most efficiently, list second, the most general seq the last:

  1. // form F# source code: local.fs
  2. // note: the actuall implementation for list is in local.fs, not in list.fs
  3. let map f x =
  4. match x with
  5. | [] -> []
  6. | [h] -> [f h]
  7. | (h::t) ->
  8. let cons = freshConsNoTail (f h)
  9. mapToFreshConsTail cons f t
  10. cons
  11. // form F# source code: array.fs
  12. let map (f: 'T -> 'U) (array : 'T[]) : 'U[]=
  13. checkNonNull "array" array
  14. let inputLength = array.Length
  15. let result = Microsoft.FSharp.Primitives.Basics.Array.zeroCreateUnchecked inputLength
  16. Parallel.For(0, inputLength, fun i ->
  17. result.[i] <- f array.[i]) |> ignore
  18. result
  19. // form F# source code: seq.fs
  20. let map f (e : IEnumerator<_>) : IEnumerator<_>=
  21. upcast
  22. { new MapEnumerator<_>() with
  23. member this.DoMoveNext (curr : byref<_>) =
  24. if e.MoveNext() then
  25. curr <- (f e.Current)
  26. true
  27. else
  28. false
  29. member this.Dispose() = e.Dispose()
  30. }

We can see that the is actually implemented using ParallelFor. The most general implementation is based one the IEnumerable interface in .Net, which is relatively inefficient compared to a for loop.

However, by using or, we create another array or list. While only creates an auxiliary sequence, which costs nearly no memory!

So usually, I use to map arrays or lists to keep memory usage small. When performance really matters, I use array and for loop, i.e. I write imperative code. In my experience, writing imperative code in F# sometimes still introduces some runtime errors, while this is rare in pure functional code.

When I only need to iterate an array, I will directly use Array.iter since it is efficient, and use no extra memory.

All in all: performance costs. Do some quick estimation before you write imperative code. Also know the low-level implementation of your daily-use libraries is critical for you to estimate the performance.

5. Access F# matrices and vectors

The Vector<’T> type is implemented using 1-D array and the Matrix<’T> type is implemented using 2-D array for dense case and row-sparse for sparse case. Details are in a matrix series post.

Let m be a matrix, we could also use m.[i,j] to access the element, which is actually a function call to the the internal array. Thus there is some overhead. Some profiling is reported in a previous post.

The way to get rid of this kind of overhead is to program on the internal arrays directly. As in this answer in Stackoverflow.

1 comment: