F# and Data Mining: Reflection on Martin Odersky’s Scala Course at Coursera

Some F# news first:

Our team, F# and Composite Networks, took 4th place in the Stack Overflow closed-question prediction competition. Also check out our beautiful visualization, “mapping the world’s programmers”; its location parser is written in F# and is open source. I will write more about this Stack Overflow project in future posts, but today is about Scala…

--
Today I finished the last assignment of the Scala course at Coursera (the link to my code repository has been removed due to course restrictions). I must say this course is very well designed! As I was already familiar with F#, working through the course was easy. Still, a few things were new to me, or were things I had only a vague understanding of before. So after this course I really feel that I have learned something, and that is why I’d like to write this reflection: to share my fresh impressions.

Before this course, my only experience with Scala was an email judge program, which periodically checks an email account, receives new submissions from students, and auto-scores them. I used Scala because the Java platform has a nice email library with example code for Gmail, and also because I like to play with new languages. The real-time scoreboard for that course is still online; it was really cool to watch the leaderboard change as students made submissions. The source code is also in the repository. The program was written in one day (with copy and paste from a few snippets found online). Scala is friendly to new users: on day one I was already productive in it.

In the following, I summarize the examples used in the course, the Scala language concepts that were new to me, and the homework, and then give a short comparison of Scala with F# and C#.

1. Examples

The lectures, though mostly on simple material, are well designed and clearly explained. Several classic examples are presented in Scala:

A. Newton’s method for calculating square roots. It is used as an example for higher-order functions and for streams/lazy evaluation, and later to illustrate the concept of a fixed point, though Odersky did not go on to fixed-point combinators in the lambda calculus.
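A minimal Scala sketch of the fixed-point formulation, written in the spirit of the lecture rather than copied from it. The names `fixedPoint` and `averageDamp`, and the tolerance `1e-9`, are my own choices, and `sqrt` assumes a positive argument.

```scala
object FixedPointSqrt {
  val tolerance = 1e-9

  // Relative-error test; assumes guesses stay nonzero (true for x > 0).
  def isCloseEnough(a: Double, b: Double): Boolean =
    math.abs((a - b) / a) < tolerance

  // Iterate f from firstGuess until two successive values are close enough.
  def fixedPoint(f: Double => Double)(firstGuess: Double): Double = {
    @annotation.tailrec
    def iterate(guess: Double): Double = {
      val next = f(guess)
      if (isCloseEnough(guess, next)) next else iterate(next)
    }
    iterate(firstGuess)
  }

  // Average damping: y => (y + f(y)) / 2, which stops the iteration from oscillating.
  def averageDamp(f: Double => Double)(y: Double): Double = (y + f(y)) / 2

  // sqrt(x) is a fixed point of y => x / y; damping turns this into Newton's step.
  def sqrt(x: Double): Double = fixedPoint(averageDamp(y => x / y))(1.0)
}
```

Without `averageDamp`, the iteration `y => x / y` would bounce between two values forever; the damped version is exactly Newton’s update `(y + x / y) / 2`.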

B. A rational number class. Numbers are fundamental types in any programming language, but they are usually built in, with their implementations hidden in the compiler or the standard library. Implementing rational numbers is a great exercise when learning any language: starting from a number type, one can go as deep as classes/objects, comparison, type classes, passing operation dictionaries, etc.
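Below is a minimal sketch of such a class (my own, not the course’s exact code). It keeps values in lowest terms with a positive denominator, and gets `<`, `<=`, etc. for free by extending the standard `Ordered` trait.

```scala
class Rational(n: Int, d: Int) extends Ordered[Rational] {
  require(d != 0, "denominator must be nonzero")

  private def gcd(a: Int, b: Int): Int = if (b == 0) a.abs else gcd(b, a % b)
  private val g = gcd(n, d)

  // Normalized form: lowest terms, denominator always positive.
  val numer: Int = (if (d < 0) -n else n) / g
  val denom: Int = d.abs / g

  def +(that: Rational): Rational =
    new Rational(numer * that.denom + that.numer * denom, denom * that.denom)

  def *(that: Rational): Rational =
    new Rational(numer * that.numer, denom * that.denom)

  // a/b < c/d iff a*d < c*b when b, d > 0; Long avoids overflow in the products.
  def compare(that: Rational): Int =
    java.lang.Long.compare(numer.toLong * that.denom, that.numer.toLong * denom)

  override def equals(other: Any): Boolean = other match {
    case r: Rational => numer == r.numer && denom == r.denom
    case _           => false
  }
  override def hashCode: Int = (numer, denom).##
  override def toString: String = s"$numer/$denom"
}
```

Usage: `new Rational(1, 2) + new Rational(1, 3)` gives `5/6`, and `new Rational(2, -4)` prints as `-1/2`.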

C. Implementing functions as objects. The general idea is simple; what is not simple is variance.

abstract class Function1[-S,+T] { 
  def apply(x: S): T 
}

There are rules and design patterns for deciding when a type parameter should be covariant, contravariant, or invariant. Scala checks variance much more strictly than Java: variance is declared on the type definition and verified by the compiler, whereas Java generics are invariant and rely on use-site wildcards. This adds complexity to the Scala type system and makes it harder to learn. On the other hand, a more sophisticated type system gives library designers more power. I will come back to this later.
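To make the variance annotations concrete, here is a small sketch; `Fn1`, `Animal`, and `Cat` are hypothetical names introduced for illustration. A function that accepts any `Animal` and returns a `String` can be passed where a function from `Cat` to `AnyRef` is expected.

```scala
object VarianceDemo {
  // Same shape as Function1 above, renamed Fn1 to avoid clashing with scala.Function1.
  abstract class Fn1[-S, +T] {
    def apply(x: S): T
  }

  class Animal { def name: String = "animal" }
  class Cat extends Animal { override def name: String = "cat" }

  // Handles any Animal and produces a String.
  val describe: Fn1[Animal, String] = new Fn1[Animal, String] {
    def apply(x: Animal): String = "This is a " + x.name
  }

  // Expects something that can handle a Cat and return some AnyRef.
  def runOnCat(f: Fn1[Cat, AnyRef]): AnyRef = f(new Cat)

  // Accepted: contravariant S lets Fn1[Animal, _] stand in for Fn1[Cat, _],
  // and covariant T lets Fn1[_, String] stand in for Fn1[_, AnyRef].
  val result: AnyRef = runOnCat(describe)

  // Rejected by the compiler if uncommented, because the covariant T
  // would appear in a parameter (contravariant) position:
  // abstract class Box[+T] { def set(x: T): Unit }
}
```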

D. Using a list and a binary search tree to implement IntSet. Scala does not have ML-style algebraic datatypes (case classes are its closest equivalent), so the implementation follows Scala’s OO style.
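A sketch of the binary-search-tree version in this OO style follows (reconstructed from memory, so details may differ from the lecture). `contains` and `incl` are selected by dynamic dispatch on the receiver’s class (`Empty` or `NonEmpty`), where an ML program would pattern match on a datatype. The `union` shown is simple but not efficient.

```scala
object IntSets {
  abstract class IntSet {
    def incl(x: Int): IntSet
    def contains(x: Int): Boolean
    def union(other: IntSet): IntSet
  }

  // The empty tree: a singleton object.
  object Empty extends IntSet {
    def incl(x: Int): IntSet = new NonEmpty(x, Empty, Empty)
    def contains(x: Int): Boolean = false
    def union(other: IntSet): IntSet = other
    override def toString = "."
  }

  // An immutable node: incl returns a new tree that shares unchanged subtrees.
  class NonEmpty(elem: Int, left: IntSet, right: IntSet) extends IntSet {
    def incl(x: Int): IntSet =
      if (x < elem) new NonEmpty(elem, left.incl(x), right)
      else if (x > elem) new NonEmpty(elem, left, right.incl(x))
      else this

    def contains(x: Int): Boolean =
      if (x < elem) left.contains(x)
      else if (x > elem) right.contains(x)
      else true

    def union(other: IntSet): IntSet =
      left.union(right).union(other).incl(elem)

    override def toString = "{" + left + elem + right + "}"
  }
}
```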

E. The expression problem. This is a classic problem for discussing functional versus OO design. The best discussion I have found is Ralf Lämmel’s C9 (Channel 9) lectures.
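A compact illustration of the two decompositions at the heart of the expression problem (hypothetical names, not from the lectures): with case classes, adding an operation is easy but adding a case touches every function; with classes and methods, it is the other way around.

```scala
object ExpressionProblem {
  // Functional decomposition: a closed set of cases, operations as functions.
  sealed trait Expr
  case class Num(n: Int) extends Expr
  case class Sum(l: Expr, r: Expr) extends Expr

  // A new operation is just another function...
  def eval(e: Expr): Int = e match {
    case Num(n)    => n
    case Sum(l, r) => eval(l) + eval(r)
  }
  def show(e: Expr): String = e match {
    case Num(n)    => n.toString
    case Sum(l, r) => "(" + show(l) + " + " + show(r) + ")"
  }
  // ...but a new case (say, Prod) forces an edit to every function above.

  // OO decomposition: a new case is just another class, but a new
  // operation (say, show) means editing every class.
  trait OExpr { def eval: Int }
  class ONum(n: Int) extends OExpr { def eval: Int = n }
  class OSum(l: OExpr, r: OExpr) extends OExpr { def eval: Int = l.eval + r.eval }
}
```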

F. Fibonacci sequence and prime number sequence.
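A sketch of both sequences as lazy, infinite lists. The course (in 2012) used `Stream`; this version assumes Scala 2.13 or later, where `LazyList` replaces `Stream`. Strictly speaking, the “sieve” below is trial division on a lazy list rather than a true sieve of Eratosthenes, but it is the classic formulation.

```scala
object LazySequences {
  // Infinite Fibonacci sequence; each element is computed only when demanded.
  def fibsFrom(a: BigInt, b: BigInt): LazyList[BigInt] = a #:: fibsFrom(b, a + b)
  val fibs: LazyList[BigInt] = fibsFrom(0, 1)

  // All integers from n upwards.
  def from(n: Int): LazyList[Int] = n #:: from(n + 1)

  // Keep the head, then drop its multiples from the (lazy) rest.
  def sieve(s: LazyList[Int]): LazyList[Int] =
    s.head #:: sieve(s.tail.filter(_ % s.head != 0))

  val primes: LazyList[Int] = sieve(from(2))
}
```

Usage: `LazySequences.primes.take(5).toList` evaluates just enough of the infinite list to return the first five primes.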

G. Classic search problems: n-queens for depth-first search (DFS) and water pouring for breadth-first search (BFS).
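For the water-pouring problem, a breadth-first search over glass states might look like the sketch below. This is my own compact version, not the lecture’s class-based design: states are `Vector[Int]` contents, moves are labelled with strings, and `solve` returns a shortest list of moves that leaves `target` units in some glass.

```scala
object WaterPouring {
  // State: current amount of water in each glass.
  type State = Vector[Int]

  // Every state reachable in one move, paired with a label for the move.
  def moves(capacity: Vector[Int], s: State): Seq[(String, State)] = {
    val glasses = capacity.indices
    val empties = for (g <- glasses) yield (s"Empty($g)", s.updated(g, 0))
    val fills = for (g <- glasses) yield (s"Fill($g)", s.updated(g, capacity(g)))
    val pours = for {
      from <- glasses
      to <- glasses
      if from != to
    } yield {
      val amount = math.min(s(from), capacity(to) - s(to))
      (s"Pour($from,$to)", s.updated(from, s(from) - amount).updated(to, s(to) + amount))
    }
    empties ++ fills ++ pours
  }

  // Breadth-first search, one level at a time, so the first path found is a
  // shortest one. Returns None if the target amount is unreachable.
  def solve(capacity: Vector[Int], target: Int): Option[List[String]] = {
    val start: State = capacity.map(_ => 0)

    @annotation.tailrec
    def bfs(frontier: List[(State, List[String])], explored: Set[State]): Option[List[String]] =
      frontier.find { case (s, _) => s.contains(target) } match {
        case Some((_, path)) => Some(path.reverse)
        case None if frontier.isEmpty => None
        case None =>
          val next = for {
            (s, path) <- frontier
            (label, s2) <- moves(capacity, s)
            if !explored(s2)
          } yield (s2, label :: path)
          // Keep a single path per newly discovered state.
          val fresh = next.groupBy(_._1).values.map(_.head).toList
          bfs(fresh, explored ++ fresh.map(_._1))
      }

    bfs(List((start, Nil)), Set(start))
  }
}
```

The `explored` set is what keeps BFS from revisiting states; since the state space is finite, an unreachable target ends with an empty frontier.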

Almost every example is implemented in several ways, to show different Scala language features, from simple to more advanced ones. For example, the Fibonacci sequence first appears as a simple recursive function and later as a Stream, to show lazy evaluation (computing on demand).

The implementations of the examples are quite OO, and traits are used heavily throughout. In fact, you cannot avoid OO in Scala, because every value is an object.
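As a small illustration of traits (the well-known stackable-modification pattern, not an example from the course; class and trait names are hypothetical), each trait below overrides `put` and delegates via `super`, so the order in which traits are mixed in determines the result.

```scala
object TraitsDemo {
  abstract class IntQueue {
    def get(): Int
    def put(x: Int): Unit
  }

  class BasicIntQueue extends IntQueue {
    private val buf = scala.collection.mutable.ArrayBuffer.empty[Int]
    def get(): Int = buf.remove(0)
    def put(x: Int): Unit = { buf += x; () }
  }

  // Each trait transforms the argument, then hands it to the next trait in
  // the linearization (ending at BasicIntQueue.put).
  trait Doubling extends IntQueue {
    abstract override def put(x: Int): Unit = super.put(2 * x)
  }
  trait Incrementing extends IntQueue {
    abstract override def put(x: Int): Unit = super.put(x + 1)
  }
}
```

Usage: in `new BasicIntQueue with Incrementing with Doubling`, the rightmost trait runs first, so `put(10)` stores 21; swapping the two traits stores 22.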

2. Concepts

Besides the examples, I learned more about the following concepts:

A. Dynamic dispatch vs. higher-order functions. This was left as an exercise at the end of one lecture. Odersky did not discuss it further, but it is quite interesting to think about. What I can think of is that methods like .toString, .hashCode and .compare, which are attached to objects and found by dispatch, could instead be implemented as higher-order functions that take the operation as a parameter.

Say that, in an OO design, we have a Rational class that implements an ordering interface:

class Rational(val n: Int, val d: Int) extends Ordered[Rational] {  // assumes d > 0
  def compare(that: Rational): Int = n * that.d - that.n * d }

In a functional design, we would have (immutable) data and functions defined on it:

module Rational =
  type t = int * int
  // compares a1/b1 with a2/b2, assuming positive denominators
  let compare ((a1, b1): t) ((a2, b2): t) = a1 * b2 - a2 * b1

When implementing a sort function, the OO approach relies on dynamic method dispatch, while the functional design uses higher-order functions:

let sort (compare: Rational.t -> Rational.t -> int) (xs: Rational.t list) = List.sortWith compare xs

The problem also reminds me of the dictionary-passing style used to emulate type classes.
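One way to see the connection is a sketch in Scala: pass the comparison “dictionary” explicitly, or mark it `implicit` so that the compiler passes it, which is the usual Scala encoding of type classes. `Ord`, `isort`, `isortImplicit`, and `rationalOrd` are hypothetical names; rationals are plain `(numerator, denominator)` pairs, as in the functional module above, with positive denominators assumed.

```scala
object DictionaryPassing {
  // A hand-rolled "dictionary" of comparison operations for T.
  trait Ord[T] { def compare(a: T, b: T): Int }

  // Insertion sort that receives the dictionary as an ordinary argument.
  def isort[T](xs: List[T])(ord: Ord[T]): List[T] = {
    def insert(x: T, sorted: List[T]): List[T] = sorted match {
      case y :: ys if ord.compare(x, y) > 0 => y :: insert(x, ys)
      case _                                => x :: sorted
    }
    xs.foldRight(List.empty[T])(insert)
  }

  // Same sort, but the compiler supplies the dictionary: Scala's type-class pattern.
  def isortImplicit[T](xs: List[T])(implicit ord: Ord[T]): List[T] = isort(xs)(ord)

  // Rationals as (numerator, denominator) pairs with positive denominators.
  implicit val rationalOrd: Ord[(Int, Int)] = new Ord[(Int, Int)] {
    def compare(a: (Int, Int), b: (Int, Int)): Int = a._1 * b._2 - b._1 * a._2
  }
}
```

With `import DictionaryPassing._` in scope, `isortImplicit(List((2, 3), (1, 2), (1, 3)))` finds `rationalOrd` automatically and returns the pairs ordered as 1/3, 1/2, 2/3.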

B. Subtyping vs. generics, and variance (Lectures 4.4 and 4.5). I used to be confused about covariance and contravariance in C# and Java (the two handle them quite differently). Variance becomes an issue when OO-style inheritance meets FP-style generics, and Scala’s language design offers a very good solution. After the lectures and some further reading, my understanding is clearer. But what I really need is to read through the design of Scala’s standard library to get more real-world examples and design lessons.
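A sketch of a covariant immutable list shows the key trick (the names `MyList`, `Cons`, and `MyNil` are my own): because `MyList[+T]` is covariant, `T` cannot appear as a method parameter type, so `prepend` uses a lower bound `U >: T` and returns a list of the wider type.

```scala
object CovariantList {
  sealed abstract class MyList[+T] {
    def isEmpty: Boolean
    def head: T
    def tail: MyList[T]
    // `def prepend(elem: T)` would be rejected: T in a contravariant position.
    // The lower bound lets a MyList[String] accept an Int and become a MyList[Any].
    def prepend[U >: T](elem: U): MyList[U] = new Cons[U](elem, this)
  }

  final class Cons[T](val head: T, val tail: MyList[T]) extends MyList[T] {
    def isEmpty: Boolean = false
  }

  // One empty list serves all element types, since Nothing is a subtype of every type.
  case object MyNil extends MyList[Nothing] {
    def isEmpty: Boolean = true
    def head: Nothing = throw new NoSuchElementException("MyNil.head")
    def tail: Nothing = throw new NoSuchElementException("MyNil.tail")
  }
}
```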

Variance and implicits are very powerful abstraction tools, though the latter seem ad hoc.

C. By value, by name, and lazy. The lecture on Streams is remarkably clear, and it carefully distinguishes three evaluation strategies: by value, by name, and lazy.
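A small sketch that counts how often an argument is evaluated under each strategy; `expensive` and the counter are hypothetical helpers for illustration, and a `lazy val` stands in for the lazy (by-need) case.

```scala
object EvaluationStrategies {
  private var evaluations = 0
  def count: Int = evaluations
  def reset(): Unit = evaluations = 0

  // Stand-in for an expensive computation; records each evaluation.
  def expensive(): Int = { evaluations += 1; 42 }

  // By value: the argument is evaluated once, before the call.
  def byValue(x: Int): Int = x + x

  // By name: the argument expression is re-evaluated at every use.
  def byName(x: => Int): Int = x + x

  // Lazy: evaluated at most once, on first use, and never if unused.
  def byNeed(use: Boolean, x: => Int): Int = {
    lazy val cached = x
    if (use) cached + cached else 0
  }
}
```

Calling each function with `expensive()` evaluates it once (by value), twice (by name), and once or zero times (lazy), which mirrors how a Stream’s tail is computed at most once and only on demand.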

D. For comprehensions are translated into the higher-order functions map, flatMap and filter (more precisely, withFilter). They are easier to read than explicit chains of higher-order functions; Python, a language with readability at its heart, likewise encourages list comprehensions over map/lambda. This is particularly true for nested comprehensions, e.g. the n-queens example from the lecture:

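def queens(n: Int): Set[List[Int]] = {
  // queens holds one column index per placed row, most recent row first
  def isSafe(col: Int, queens: List[Int]): Boolean = {
    val row = queens.length
    val queensWithRow = (row - 1 to 0 by -1).zip(queens)
    queensWithRow.forall {
      case (r, c) => col != c && math.abs(col - c) != row - r
    }
  }
  def placeQueens(k: Int): Set[List[Int]] =
    if (k == 0) Set(List())
    else
      for {
        queens <- placeQueens(k - 1)
        col <- 0 until n
        if isSafe(col, queens)
      } yield col :: queens
  placeQueens(n)
}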
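The for comprehension above is syntactic sugar. A hand-desugared version is sketched below; it is roughly what the compiler produces (`flatMap` for the outer generator, `withFilter` for the guard, `map` for the last generator). It includes its own `isSafe`, my own compact formulation, so that it is self-contained.

```scala
object QueensDesugared {
  // queens.head is the column of the most recently placed row; the queen at
  // list index i sits i + 1 rows above the row being filled.
  def isSafe(col: Int, queens: List[Int]): Boolean =
    queens.zipWithIndex.forall { case (c, i) =>
      c != col && math.abs(c - col) != i + 1
    }

  // Hand-desugared form of the for comprehension in placeQueens.
  def placeQueens(n: Int, k: Int): Set[List[Int]] =
    if (k == 0) Set(List())
    else
      placeQueens(n, k - 1).flatMap { queens =>
        (0 until n).withFilter(col => isSafe(col, queens)).map(col => col :: queens)
      }

  def queens(n: Int): Set[List[Int]] = placeQueens(n, n)
}
```

Usage: `QueensDesugared.queens(8).size` gives the 92 solutions of the classic eight-queens puzzle.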

3. Homework

The homework is well designed; as Odersky revealed in a message to the mailing list, it is the result of several TAs’ hard work. There are six projects. The skeleton of each project is provided, and each has several holes to fill in. The primary exercise in these projects is thinking in types: every function comes with a fixed type signature, and the implementation I was asked to fill in was usually a one-liner. Finding the one line of code that fits the type signature was the main problem on my mind. After seeing six type-oriented designs, I think the next time I design a Scala project myself, a natural OO design will come more easily. (If you look at my Scala email judge, you will immediately notice that I used Scala in an ML style…)

The problems themselves are also interesting. For example, representing a set as a function seems quite neat (though impractical for efficiency reasons), and using breadth-first search to solve the Bloxorz game is quite fun.
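A minimal sketch of the set-as-function idea, using hypothetical names rather than the assignment’s skeleton: a set is just its characteristic function `Int => Boolean`, so set operations become boolean combinations and infinite sets cost nothing, but listing or iterating over the elements is not directly possible.

```scala
object FunctionalSets {
  // A set is represented by its characteristic function.
  type FSet = Int => Boolean

  def singleton(elem: Int): FSet = x => x == elem
  def union(s: FSet, t: FSet): FSet = x => s(x) || t(x)
  def intersect(s: FSet, t: FSet): FSet = x => s(x) && t(x)
  def diff(s: FSet, t: FSet): FSet = x => s(x) && !t(x)

  // An infinite set is as cheap as a finite one.
  val evens: FSet = x => x % 2 == 0
}
```

Each membership test walks through every closure that built the set, which is one reason the representation is elegant but slow.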

The homework also teaches test-driven development: each project comes with a test suite that is usually incomplete and asks the student to add more test cases.

4. Scala vs F#/C#

Qua work of art, the work of art cannot be interpreted; there is nothing to interpret; we can only criticize it according to standards, in comparison to other works of art. – T. S. Eliot, Hamlet and His Problems, 1919.

This section goes beyond the course, but it is always interesting to compare languages: through comparison, we learn.

My impression is that Scala is closer to C# than to any functional language. Scala is Java redesigned: its core is OO, and functional programming concepts and other features are added on top of that OO core. C# is in the same situation. As for language features, it seems to me that only traits are unique to Scala, and that Scala’s treatment of variance is more sound. C# either has the other features or can emulate them with a little extra syntax. The fact that Scala’s design is more complicated than C#’s does not make it a better C#. I have seen discussions online arguing that Scala is too complicated, and that some early enthusiasts went back to Java after their projects became bigger and harder to manage.

On the functional side, Scala programs can be very declarative, e.g. when using the immutable collections; however, this is a different style of declarativeness from that of the ML family (F#/OCaml, and to some extent Haskell). In ML languages, functions are everywhere: simple functions, operators, data constructors, functors, and even type constructors all follow one consistent applicative style. In Scala, by contrast, from the very beginning you think of an object, and then of what you can do with that object. This difference becomes more evident once you begin to design a non-trivial project.

If the Java platform had a soundly implemented ML, I would prefer it to Scala. However, there is no such ML there. If one wants cross-platform support, conciseness and functional features, the performance of a statically typed language, a large set of libraries, etc., then Scala seems to be a good choice among very few. Recently, the joint EPFL and Stanford work on parallel-computing DSLs in Scala also looks quite interesting and useful. There are also lightweight distributed frameworks written in Scala, e.g. Spark from Berkeley.

Summary

It is always entertaining to learn a new programming language, and I learned quite a lot by following this course. Thanks to Prof. Martin Odersky and his TAs. It would be great if there were a new, more advanced course next year. I would be interested in:

1. Design and implementation of the Scala collection library. Very few people write collection libraries, but many programmers need to design APIs for others to use, and the Scala collection library is a gold mine for learning modern API design patterns.

2. Concurrent and parallel programming in Scala, using actors and other DSLs.

I will still use F# primarily for my data mining projects, and when I need a Java library or serious cross-platform support, I can rely on Scala.

7 comments:

Unknown, December 26, 2012 at 5:35 PM
a nice comparison of scala and ocaml: http://blog.enfranchisedmind.com/2009/05/scala-not-functional/
Anonymous, May 2, 2013 at 12:00 PM
The bitbucket repository is no longer available.
Anonymous, March 25, 2014 at 11:05 PM
There is a ML for the JVM: http://ocamljava.x9c.fr/preview/
However it's experimental.
Anonymous, June 3, 2014 at 2:01 AM
None of this is clear. This sounds like someone got paid to post this article sucking up to the creator of SCALA.

Tuesday, November 13, 2012
