I am always looking for a scripting language that could replace Python for data preprocessing tasks. I love Python for everything except its performance (think about implementing a dynamic programming algorithm with two nested, busy loops). There are many choices: F#, OCaml, and Scala (succinct and typed!), Go (at least typed...), etc. I also checked out Clojure several years ago. At that time, though, I knew little Lisp, and I felt that Clojure bundled too many abstractions on top of a dynamic type system, which would hurt performance. But recently I happened to see a Clojure code snippet with type annotations (type hints, in Clojure’s terminology)!
With type hints and other tricks, Clojure can be as fast as Java [1] (i.e., in the same speed range as F# and OCaml). Of course, type-hinted and highly imperative Clojure programs are even harder to read than their Java equivalents. But at least Clojure provides a way to reach this goal within Clojure itself. Once we find the performance bottleneck, we don’t need any other tool: we just add type hints, and if that does not work, we rewrite the hot spot more carefully in Clojure (e.g., by sticking to unboxed primitive values and Java primitive arrays). The good thing is that we never need to leave Clojure.
Because I want to use Clojure as a quick scripting language, I don’t really want to create a project and set up a Makefile kind of thing; I just want a single file that contains the whole program. If we want to speed up Python, we can use Cython, or write C and call the compiled dynamic library from Python. This is just too much for the purpose of scripting. And the final product would be hard to deploy: think about somebody who has no Cython installed, or about transferring binaries to a new machine. Just a nightmare. For Clojure, the compiler and the library are in the same jar (e.g. clojure-1.6.0.jar), so deploying a script is really easy.
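To make the type-hint idea concrete, here is a minimal sketch that is not taken from any program in this post. The function name sum-ints is hypothetical; ^ints and ^long are the standard Clojure hints for a Java int[] argument and a primitive long return value. With the hint on xs, alength and aget can compile to direct array access instead of reflective calls, and the loop counters stay unboxed.

```clojure
(defn sum-ints
  "Sums a Java int array using a primitive loop."
  ^long [^ints xs]
  (let [n (alength xs)]
    ;; i and acc are primitive longs, so nothing is boxed inside the loop
    (loop [i 0 acc 0]
      (if (< i n)
        (recur (inc i) (+ acc (aget xs i)))
        acc))))

(println (sum-ints (int-array [1 2 3 4])))
```

Removing the ^ints hint should still give the same result, only through slower reflective array access.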
Set up Emacs with a Clojure Interpreter
No Leiningen, no CIDER, no Maven! Just download clojure.jar and two .el files, and you are done. Actually, you only need Java installed to compile and run Clojure, because the Clojure compiler is itself written in Java. We really need to give a thumbs up to Rich Hickey for creating a production-quality language with such a small amount of source code (less than 3MB for Clojure 1.6). That’s amazing!
Since I intend to use Clojure for data processing (or, in a fancier phrase, data science), an interpreter is a must. I need to program and test on the go, the same workflow I follow in F#, R, and Matlab. Unlike F# or Scala, the ideal IDE for Clojure is no IDE. Clojure programs need no dot magic, while in F# and Scala it is quite nice to type “.” and then see the functions available under a module or object. So I have no plan to install CIDER. I use clojure-mode and inf-clojure, each of which is distributed as a single .el file. Alternatively, you can install these two packages with M-x list-packages (available in Emacs 24 and later). I would suggest installing auto-complete mode and its dictionary for Clojure, which contains a long list of common Clojure keywords. But this is optional: once we get familiar with Clojure’s functions, we need no auto-completion for keywords (Emacs’s M-/ is good enough). For example, I never use the auto-complete minor mode when writing R scripts under ESS mode.
Put these into .emacs:
(setq inferior-lisp-program "C:\\Progra~1\\Java\\jdk1.7.0\\bin\\java -cp d:\\home\\clojure-1.7.0-alpha5\\clojure-1.7.0-alpha5.jar clojure.main") ; clojure REPL command line
(add-hook 'clojure-mode-hook #'inf-clojure-minor-mode)
(setq inf-clojure-buffer "*inferior-lisp*")
To start the interpreter, use M-x run-lisp. The shortcuts are the same as in the elisp modes (e.g. the *scratch* buffer). The two most useful ones:
C-x C-e: send the last expression to the interpreter. An expression is everything between a pair of matching parentheses, so we can use this shortcut to send a multi-line function definition or a single-line printf to the interpreter.
C-c C-r: send the selected region to the interpreter.
Run Clojure scripts from command line
The REPL is great for playing with data and making sure every function works as expected. In the end, we usually put all the functions into a script file that can run as a task. If we only use the Java standard library and the Clojure core library, we can simply type the following command:
java -cp clojure.jar clojure.main script.clj
We can create a .bat file, a Linux alias, or whatever else to make the command shorter. Something like:
alias runclj="java -cp /path/to/clojure.jar clojure.main"
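As a rough illustration of such a single-file script, the sketch below sums the numbers passed on the command line. The file name sum_args.clj and the function total are hypothetical; *command-line-args* is the core var that clojure.main binds to the arguments following the script path.

```clojure
;; sum_args.clj (hypothetical file name)
(defn total
  "Parses each argument as a long and returns the sum."
  [args]
  (reduce + (map #(Long/parseLong %) args)))

;; *command-line-args* is nil when no arguments are given, so this prints 0 then
(println (total *command-line-args*))
```

With the alias above, it would be run as runclj sum_args.clj 1 2 3.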
However, when our script depends on some libraries, managing the dependencies can be tricky. I searched online, and it seems that everyone recommends Leiningen for loading a library or managing a script that spans multiple files. For certain reasons, I have to avoid using Leiningen.
I found a simple solution: the good old load-file command from the Lisp world (in Emacs, I sometimes use load-file to reload my .emacs without restarting Emacs).
And it seems that the Clojure community has learnt from JavaScript practice: if a library is small enough, distribute it as a single file! I checked several Clojure libraries that I am interested in: data.csv, clojure-csv, and data.json. All of them are distributed as one or two .clj files!
The following script first loads two libraries, numeric-tower and json, and then runs a quick test:
(load-file "d:\\home\\numeric_tower.clj")
(load-file "d:\\home\\json.clj")
(load-file "d:\\home\\json_compat_0_1.clj")

(ns example
  (:require [clojure.math.numeric-tower :as math])
  (:require [clojure.data.json :as json]))

; use numeric-tower library
(defn- sqr
  "Uses the numeric tower expt to square a number"
  [x]
  (math/expt x 2))
(println (sqr 100))

; use json library
(println (json/read-str "{\"a\":1,\"b\":2}"))
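Notice that the last line of json.clj is: (load "json_compat_0_1"). load is similar to load-file, but it resolves its argument against the classpath (where it can pick up either a .clj source file or a compiled .class file) rather than taking a file system path. Since json_compat_0_1.clj is not on our classpath, we need to comment out this line and load json_compat_0_1.clj explicitly in our script.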
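The load-file-then-require pattern can be tried without downloading anything. The sketch below writes a tiny one-file "library" to a temporary file first; the namespace tiny.lib and the function twice are made up for illustration, and only load-file, require, spit, and java.io.File/createTempFile are real APIs.

```clojure
;; Write a tiny, hypothetical one-file library to a temporary file.
(def lib-file (java.io.File/createTempFile "tiny_lib" ".clj"))
(spit lib-file "(ns tiny.lib)\n(defn twice [x] (* 2 x))\n")

;; load-file evaluates every form in the file, which creates the tiny.lib namespace.
(load-file (.getPath lib-file))

;; The namespace is already loaded, so require only has to set up the alias.
(require '[tiny.lib :as tiny])

(println (tiny/twice 21))
```

The order matters: the require form has to be evaluated after load-file, otherwise Clojure searches the classpath for tiny/lib.clj and fails.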
Since I usually deal with large amounts of data, I don’t care about compilation and JVM startup time, which is negligible compared to the total time spent on number crunching.
My first Clojure programs
First, I’d like to thank Prof. Dan Grossman for his brilliant course on Coursera. I have never believed in books like Seven Languages in Seven Weeks. Repeatedly writing three-line programs in different languages only gives you the illusion that you’ve learned them. Prof. Grossman’s course is truly “three languages in one semester”. One of the three languages is Racket/Scheme (the other two being ML and Ruby). I did the two Racket assignments (Assignments 4 and 5). Assignment 5 is about implementing a simple interpreter in Racket. That’s about all my training in Lisp.
I solved some Hacker Rank problems in Clojure. Below are my solutions to three of them. Two are dynamic programming problems. I focus on dynamic programming because loops and array operations are slow in Python, and I wanted to test their speed in Clojure.
Problem: two strings
Find if there is a substring that appears in both A and B.
https://www.hackerrank.com/challenges/two-strings
code:
(use '[clojure.set :only (intersection)])

(def n (Integer/parseInt (read-line)))

(defn toset [line]
  (-> line
      char-array
      set))

(defn compare1 [line1 line2]
  (if (= 0 (count (intersection (toset line1) (toset line2))))
    (println "NO")
    (println "YES")))

(loop [i 0]
  (when (< i n)
    (let [line1 (read-line)
          line2 (read-line)]
      (compare1 line1 line2)
      (recur (inc i)))))
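The solution above relies on one observation: a single character is already a substring, so two strings share a substring exactly when they share at least one character. A more compact sketch of the same check, written as a pure function so it can be tried at the REPL (the name share-char? is hypothetical):

```clojure
(defn share-char?
  "True when strings a and b have at least one character in common."
  [a b]
  ;; (set a) acts as a predicate: it returns the character if it is in the set
  (boolean (some (set a) b)))

(println (if (share-char? "hello" "world") "YES" "NO"))
(println (if (share-char? "hi" "world") "YES" "NO"))
```

The first call finds the shared characters l and o; the second pair has none in common.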
Problem: Candies
A very common interview question that tests dynamic programming and its memoization technique.
https://www.hackerrank.com/challenges/candies
code:
(def T (Integer. (read-line)))
(def v
  (->> (range T)
       (map (fn [_] (read-line)))
       (map #(Integer. %))
       (into [])))
(def inf 2000000)
; f caches dp results; inf marks an entry that has not been computed yet
(def f (int-array T inf))
(defn dp
  [i]
  (if (< (aget f i) inf)
    (aget f i)
    (cond
      ; guard: a single child has no neighbour to compare against
      (= T 1) (do (aset f i 1) 1)
      (= i 0) (let [right (if (> (v i) (v (inc i))) (inc (dp (inc i))) 1)]
                (aset f i right)
                right)
      (= i (dec T)) (let [left (if (> (v i) (v (dec i))) (inc (dp (dec i))) 1)]
                      (aset f i left)
                      left)
      :else (let [right (if (<= (v i) (v (inc i))) 1 (inc (dp (inc i))))
                  left (if (<= (v i) (v (dec i))) 1 (inc (dp (dec i))))
                  dp-i (max right left)]
              (aset f i dp-i)
              dp-i))))
(println (apply + (map dp (range T))))
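The recursive dp above goes as deep as the longest strictly increasing or decreasing run of ratings, which may exhaust the JVM stack on large inputs. As an alternative sketch (a different technique from the memoized recursion above, namely the usual two-pass greedy), the same totals can be computed iteratively with type-hinted int arrays. The function name min-candies is hypothetical.

```clojure
(defn min-candies
  "Minimum total candies so that every child rated higher than a neighbour gets more."
  [ratings]
  (let [^ints r (int-array ratings)
        n (alength r)
        ^ints c (int-array n 1)]          ; everybody starts with one candy
    ;; left-to-right pass: enforce the rule against the left neighbour
    (loop [i 1]
      (when (< i n)
        (when (> (aget r i) (aget r (dec i)))
          (aset c i (inc (aget c (dec i)))))
        (recur (inc i))))
    ;; right-to-left pass: enforce the rule against the right neighbour
    (loop [i (- n 2)]
      (when (>= i 0)
        (when (> (aget r i) (aget r (inc i)))
          (aset c i (max (aget c i) (inc (aget c (inc i))))))
        (recur (dec i))))
    (reduce + c)))

(println (min-candies [1 2 2]))
```

For the ratings 1 2 2 the candies end up as 1 2 1, so the call prints 4.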
Problem: Stock Maximize
Given a sequence of stock prices, find the maximum profit.
https://www.hackerrank.com/challenges/stockmax
(set! *warn-on-reflection* true)
(require 'clojure.string) ; line->ints calls clojure.string/split
(defn line->ints
  [line]
  (->> (clojure.string/split line #" ")
       (map #(Integer/parseInt %))
       (into [])))
(defn find-sell-points
  "Find decreasing local-maximums"
  [v]
  (let [n (count v)
        stack (int-array n)]
    (aset stack 0 1) ; the first sell point at 1
    ; in-while and add-to are flags that emulate a nested while loop with recur
    (loop [i 2 j 0 in-while false add-to false]
      (if (< i n)
        (let [v-top (v (aget stack j))
              v-i (v i)]
          (if in-while
            (if (>= v-i v-top) ; in-while
              (if (= 0 j) (recur i j false true) (recur i (dec j) true false))
              (recur i (inc j) false true))
            (if add-to
              (do (aset stack j i) (recur (inc i) j false false))
              (if (>= v-i v-top)
                (recur i j true false)
                (do (aset stack (inc j) i) (recur (inc i) (inc j) false false))))))
        [stack j]))))
(defn simulate-trade
  "Sell at pos"
  [v pos]
  (loop [i 0
         j 0
         sum 0]
    (if (< i (count pos))
      (if (<= j (pos i))
        (recur i (inc j) (+ sum (max 0 (- (v (pos i)) (v j)))))
        (recur (inc i) (inc (pos i)) sum))
      sum)))

(let [T (Integer/parseInt (read-line))]
  (loop [test 0]
    (when (< test T)
      (let [_ (read-line)
            x (read-line)
            v (line->ints x)
            [pos j] (find-sell-points v)
            res (simulate-trade v (into [] (take (inc j) pos)))]
        (println res))
      (recur (inc test)))))
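The sell points found above are the prices that no later price exceeds. A shorter way to get the same profit, shown here as an alternative sketch and not as the solution above, is to scan the prices from right to left while tracking the highest price seen so far: every lower price to its left is a share bought at that price and sold at the tracked maximum. The function name max-profit is hypothetical.

```clojure
(defn max-profit
  "Best total profit when one share may be bought per day and any number sold later."
  [prices]
  (let [^longs p (long-array prices)]
    ;; i walks backwards; best is the highest price at or after position i
    (loop [i (dec (alength p)) best 0 profit 0]
      (if (neg? i)
        profit
        (let [x (aget p i)]
          (if (> x best)
            (recur (dec i) x profit)                       ; new sell point
            (recur (dec i) best (+ profit (- best x))))))))) ; buy here, sell at best

(println (max-profit [1 2 100]))
```

For the prices 1 2 100 both earlier shares are sold at 100, so the call prints 197.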
I may write a longer post on learning Clojure basics by solving Hacker Rank problems. My inspiration comes from the following page:
http://erl.nfshost.com/static/euler.uberdoc.html
which shows how to use Clojure to solve the first 33 Project Euler problems. The page layout is really nice: the explanation is on the left, and the code is neatly aligned on the right. For each problem there is also a Docs section, which lists the Clojure core functions newly introduced in the solution. After following all 33 solutions, we will have seen usage examples for many of the functions listed on the Clojure cheat sheet.
Good readings on Clojure
I borrowed two Clojure books from a university library, but most of my reading has been online material, which I list below:
http://erl.nfshost.com/static/euler.uberdoc.html
Adam Bard’s Clojure blog
The Clojure Style Guide
Clojure Cheatsheet
Introducing HipHip (Array): Fast and flexible numerical computation in Clojure
Notes:
[1] I have seen a few questions on Stack Overflow and elsewhere about the performance of Clojure. Here is a list of them:
http://stackoverflow.com/questions/29474457/performance-of-vector-and-array-in-clojure
http://stackoverflow.com/questions/14115980/clojure-performance-really-bad-on-simple-loop-versus-java
http://stackoverflow.com/questions/14949705/clojure-performance-for-expensive-algorithms