Friday, February 11, 2011

Logistic Regression in F# using Microsoft Solver Foundation

Logistic regression is a workhorse in data mining. Like decision trees and SVMs, it is a very standard classifier. If you have labeled data, logistic regression is definitely one of the classifiers you should try.

In this post, I’d like to show how to implement logistic regression in F# using Microsoft Solver Foundation. Because we use the optimizer in Microsoft Solver Foundation, the core part of the logistic regression takes only 20–30 lines of code.

If you are only interested in F# and L-BFGS, and not logistic regression, you can jump straight to the second section: L-BFGS solver in Microsoft Solver Foundation.

Background for logistic regression

The logistic regression model predicts the probability that a data instance x is labeled as y (we use labels y in {-1, +1}):

P(y | x; w, b) = 1 / (1 + exp( -y (w·x + b) ))        (1)

A data instance is a vector, with each dimension holding the value of some feature. For example, the Iris flower data set we use in this post has 4 features: Sepal Length, Sepal Width, Petal Length and Petal Width. Each instance has a label y indicating its category. The model above only handles binary labels; logistic regression also has a multi-class version, see [Mitchell]. In this post we focus on the binary version; the multi-class version should be straightforward once we know how to implement the binary one.
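
To make this concrete, here is a tiny sketch (illustrative values only, not code from the post) of what one labeled instance looks like:

// one Iris instance: 4 feature values plus a +1/-1 label
let sampleFeatures = [| 5.1; 3.5; 1.4; 0.2 |]   // Sepal Length, Sepal Width, Petal Length, Petal Width
let sampleLabel = 1                             // the binary category label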

The model has two parameters, the weight vector w and the offset b, which are estimated/learned on the training data set X = {(x_i, y_i), i = 1..N}. In the derivations below we drop the offset b, because we can always append a constant-1 feature to the feature vector, x’ = [x 1], and hide b inside the extended weight vector w’ = [w b]. (Sometimes a model without the offset performs equally well.) Because equation (1) is a probability, we can solve for the model parameters by maximizing the joint probability estimated on the data set X.
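
As a quick sanity check of this trick (my notation, not from the original figures): with x’ = [x 1] and w’ = [w b],

w’·x’ = w·x + b·1 = w·x + b

so the model on the extended vectors is exactly the original model with an offset.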

If we assume that all instances are independent and identically distributed (i.i.d.), then the regularized joint probability is

P(X; w) = exp( -(λ/2) ||w||^2 ) · Π_{i=1..N} P(y_i | x_i; w)

where exp( -(λ/2) ||w||^2 ) is the regularization term for avoiding over-fitting on the training data.

The log version of this likelihood is

L(w) = log P(X; w) = -(λ/2) ||w||^2 + Σ_{i=1..N} log P(y_i | x_i; w)
     = -(λ/2) ||w||^2 - Σ_{i=1..N} log( 1 + exp( -y_i w·x_i ) )

We want to find the model parameter w that maximizes this likelihood. L(w) is a concave function with respect to w, so it has a single global maximum.

What is left is how to optimize this function. BFGS is a quasi-Newton method that only needs the first-order gradient, while a Newton-family method needs second-order derivatives explicitly. Microsoft Solver Foundation ships a limited-memory version of BFGS (L-BFGS). (L-BFGS is more popular than plain BFGS; sometimes people say BFGS when they mean L-BFGS.) L-BFGS has a memory parameter m, and the space complexity of the algorithm is O(md), where d is the dimension of the variable space. Say you want to optimize a function with 1,000,000 free variables and set m to 10; then you need only a little more than 1,000,000 * 10 * 8 / 1024^2 ≈ 76 MB.
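
As a back-of-the-envelope check of that arithmetic (a small sketch I added, not part of the solver API):

// rough L-BFGS memory estimate: m stored vectors of dimension d, 8 bytes per float;
// constant-factor overhead (the solver keeps pairs of update vectors) is ignored
let lbfgsMemoryMB (m:int) (d:int) = float m * float d * 8.0 / (1024.0 * 1024.0)
lbfgsMemoryMB 10 1000000   // ≈ 76.3 MB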

The input for an L-BFGS optimizer is: 1) the function evaluator L(w) and 2) the gradient evaluator ∇L(w):

∇L(w) = -λ w + Σ_{i=1..N} ( 1 - P(y_i | x_i; w) ) y_i x_i

The output is the best w* = argmax_w L(w), i.e., the point that gives the maximum function value.
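
A quick check of where the per-instance gradient term comes from (my own derivation step, consistent with the evaluator code below): since log P(y | x; w) = -log( 1 + exp( -y w·x ) ),

∂/∂w log P(y | x; w) = exp( -y w·x ) / ( 1 + exp( -y w·x ) ) · y x = ( 1 - P(y | x; w) ) y x

which is exactly the coef * x update used in logregGradient later in this post.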

L-BFGS solver in Microsoft Solver Foundation 

Microsoft Solver Foundation already implements an L-BFGS solver. I suggest reading pages 24-27 & 72-75 of MSF-SolverProgrammingPrimer.pdf (on my machine: C:\Program Files\Microsoft Solver Foundation\3.0.1.10599\Documents) for a detailed reference. Because the sample code for optimizing the Rosenbrock function is in C#, I translated it to F# here. Note how F# functions are wrapped into the delegate types System.Func<> and System.Action<>.

By the way, I find that using the Microsoft.Solver.Foundation.dll shipped with Sho is more convenient, as it packages everything into one DLL.

#r @"C:\Program Files (x86)\Sho 2.0 for .NET 4\packages\Optimizer\Microsoft.Solver.Foundation.dll"
openMicrosoft.SolverFoundation.Common
openMicrosoft.SolverFoundation.Solvers
openMicrosoft.SolverFoundation.Services


letsolverParams = newCompactQuasiNewtonSolverParams()
letsolver = newCompactQuasiNewtonSolver()
//add variables
let_, vidVaribaleX = solver.AddVariable(null)
let_, vidVaribaleY = solver.AddVariable(null)

//add a row and set it as the goal
let_, vidRow = solver.AddRow(null)
solver.AddGoal(vidRow, 0, true)

letOriginalRosenbrockFunction =
    letf (model:INonlinearModel)  (rowVid:int)  (values:ValuesByIndex)  (newValues:bool) =
        Math.Pow(1. - values.[1], 2.) + 100. * (Math.Pow(values.[2] - (values.[1] * values.[1]), 2.))  
    newSystem.Func<INonlinearModel, int, ValuesByIndex, bool, float> (f)


letOriginalRosenbrockGradient =
    letf (model:INonlinearModel) (rowVid:int) (values:ValuesByIndex) (newValues:bool) (gradient:ValuesByIndex) =
        gradient.[1] <- -2. * (1. - values.[1]) - 400. * values.[1] * (values.[2] - (values.[1] * values.[1]))
        gradient.[2] <- 200. * (values.[2] - (values.[1] * values.[1]))
    newSystem.Action<INonlinearModel, int, ValuesByIndex, bool, ValuesByIndex> (f)


solver.FunctionEvaluator <- OriginalRosenbrockFunction
solver.GradientEvaluator <- OriginalRosenbrockGradient
solver.Solve(solverParams);

Console.WriteLine(solver.ToString());

and the output, which converges to the known minimum of the Rosenbrock function at (1, 1):

> 
Minimize problem
Dimensions = 2
Variable indexes:
1, 2
Starting point = 0, 0
Solution quality is: LocalOptima
Number of iterations performed: 21
Number of evaluation calls: 27
Finishing point =0.999999997671584, 0.999999995285991
Finishing value =5.74844710320033E-18

val it : unit = ()
>
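
Besides printing solver.ToString(), the finishing point can also be read back variable by variable. A minimal sketch, using the same GetValue/ToDouble calls that appear in the logistic regression code later:

// read the optimized point back out of the solver
let xOpt = solver.GetValue(vidVariableX).ToDouble()
let yOpt = solver.GetValue(vidVariableY).ToDouble()
printfn "optimum found at (%f, %f)" xOpt yOpt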


 



Implementation of logistic regression



We first get a sample data set online:



module Net =
    open System.Net

    let fetchUrlSimple (url:string) =
        let req = WebRequest.Create(url)
        let response = req.GetResponse()
        use stream = response.GetResponseStream()
        use streamreader = new System.IO.StreamReader(stream)
        streamreader.ReadToEnd()

type dataset =
    { features: float array array;  // (instance = float array) array
      mutable labels: int array;    // +1/-1 labels
    }

let iris =
    let page = Net.fetchUrlSimple @"http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
    let lines = page.Split([|'\n'|], StringSplitOptions.RemoveEmptyEntries)
    // map the three Iris species to 2 / -1 / 1; the '2' class is filtered out below
    let getIrisId (s:string) =
        if s = "Iris-setosa" then 2
        elif s = "Iris-versicolor" then -1
        else 1

    let instances =
        lines
        |> Array.map (fun line ->
            let s = line.Split ','
            let features = [| "1.0"; s.[0]; s.[1]; s.[2]; s.[3] |] |> Array.map float // add a dummy 1.0 into the features
            let label = getIrisId s.[4]
            features, label)
        |> Array.filter (fun (_, label) -> label <> 2)

    let F, L = Array.unzip instances
    { features = F;
      labels = L;
    }
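
A quick sanity check that the download and parsing worked (not in the original post): with the Iris-setosa class filtered out we expect 100 instances of dimension 5, i.e. 4 features plus the dummy 1.0.

let nPos = iris.labels |> Array.filter (fun l -> l = 1) |> Array.length
let nNeg = iris.labels |> Array.filter (fun l -> l = -1) |> Array.length
printfn "instances = %d, dimension = %d" iris.features.Length iris.features.[0].Length
printfn "positive = %d, negative = %d" nPos nNeg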


The logistic regression implementation:



module LogReg =
    // regularization strength (λ in the formulas above)
    let mutable lambda = 0.1

    // dot product between a feature vector and the solver's current point
    // (solver variable indexes are 1-based, hence the i+1)
    let dotProduct (x:float array) (g:ValuesByIndex) =
        let mutable dot = 0.0
        for i=0 to x.Length-1 do
            dot <- dot + x.[i] * g.[i+1]
        dot

    // P(y | x; w) = 1 / (1 + exp(-y * (w dot x))), with w read from the solver's ValuesByIndex
    let sigmoid (x:float array) (y:int) (g:ValuesByIndex) =
        let mutable dot = 0.0
        for i=0 to x.Length-1 do
            dot <- dot + x.[i] * g.[i+1]
        let z = (float y) * dot
        (* code to deal with extreme numbers, but with some issues with L(w) calculation
        if z > 30.0 then
            1.0
        elif z < -30.0 then
            0.0
        else
        *)
        1.0 / (1.0 + exp (- z))

    // same sigmoid, but with w given as a plain float array (used after training)
    let sigmoid2 (x:float array) (y:int) (g:float array) =
        let mutable dot = 0.0
        for i=0 to x.Length-1 do
            dot <- dot + x.[i] * g.[i]
        1.0 / (1.0 + exp (- (float y) * dot))

    // function evaluator: L(w) = -(lambda/2) * ||w||^2 + sum_i log P(y_i | x_i; w)
    let logregValue (ds:dataset) =
        let dim = ds.features.[0].Length
        let f (model:INonlinearModel) (rowVid:int) (values:ValuesByIndex) (newValues:bool) =
            let mutable L = 0.0

            // regularization term
            for i=1 to dim do
                L <- L + values.[i] * values.[i]
            L <- - (L * lambda / 2.0)

            // log-likelihood term
            for i=0 to ds.features.Length-1 do
                //L <- L - log (1.0 + exp (- (float ds.labels.[i]) * dotProduct ds.features.[i] values ))
                L <- L + log (sigmoid ds.features.[i] ds.labels.[i] values)

            // printfn "L = %.10f" L
            L
        new System.Func<INonlinearModel, int, ValuesByIndex, bool, float> (f)

    // gradient evaluator: dL/dw_j = -lambda * w_j + sum_i (1 - P(y_i | x_i; w)) * y_i * x_ij
    let logregGradient (ds:dataset) =
        // printfn "gradient calculated"
        let dim = ds.features.[0].Length
        let f (model:INonlinearModel) (rowVid:int) (values:ValuesByIndex) (newValues:bool) (gradient:ValuesByIndex) =
            for j=1 to dim do
                gradient.[j] <- -lambda * values.[j]

            for i=0 to ds.features.Length-1 do
                let coef = (1.0 - sigmoid ds.features.[i] ds.labels.[i] values) * (float ds.labels.[i])
                for j=1 to dim do
                    gradient.[j] <- gradient.[j] + coef * ds.features.[i].[j-1]
        new System.Action<INonlinearModel, int, ValuesByIndex, bool, ValuesByIndex> (f)

    let makeSolver (ds:dataset) =

        // set the solver parameters
        let solverParams = new CompactQuasiNewtonSolverParams()
        //solverParams.IterationLimit <- 11
        //solverParams.Tolerance <- 1e-4
        // the memory parameter m, default 17
        solverParams.IterationsToRemember <- 10

        // solver
        let solver = new CompactQuasiNewtonSolver()

        let dim = ds.features.[0].Length
        // add variables
        for i=1 to dim do
            solver.AddVariable(null) |> ignore

        // add a row and set it as the goal (false = maximize the likelihood)
        let _, vidRow = solver.AddRow(null)
        solver.AddGoal(vidRow, 0, false) |> ignore

        // set function evaluator and gradient evaluator
        solver.FunctionEvaluator <- logregValue(ds)
        solver.GradientEvaluator <- logregGradient(ds)

        // solve!
        solver.Solve(solverParams) |> ignore

        // get the optimized point (w) out of the solver
        let w = Array.create dim 0.0
        for i=1 to dim do
            w.[i-1] <- solver.GetValue(i).ToDouble()
        w


Try our logistic regression model on the Iris dataset:



 



let sol = LogReg.makeSolver(iris)
let nWrongs = ref 0
for i=0 to iris.features.Length-1 do
    let prob = LogReg.sigmoid2 iris.features.[i] iris.labels.[i] sol
    printfn "label = %A sig value = %A %s" iris.labels.[i] prob (if prob > 0.5 then "correct" else (incr nWrongs; "wrong"))
printfn "number of mistakes on the training set: %d" !nWrongs


 



References



Tom Mitchell, Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression, new book chapter. This chapter gives a good introduction to logistic regression. For a more theoretical comparison, see Ng & Jordan, On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes.

Thomas P. Minka, A Comparison of Numerical Optimizers for Logistic Regression. It is well written; the first page alone gives a concise introduction to logistic regression. The rest of the paper has a lot of discussion of optimization methods. Tom also has a Matlab package.

Galen Andrew and Jianfeng Gao, Scalable Training of L1-Regularized Log-Linear Models. In Proceedings of the 24th International Conference on Machine Learning (ICML 2007), pp. 33-40, 2007. L1-regularized logistic regression with an implementation in C++.

libLBFGS (L-BFGS in C): http://www.chokkan.org/software/liblbfgs/



Note



I quite like this post because it combines so many good things together: F#, logistic regression, optimization and playing with sample datasets.

8 comments:

  1. Interesting post, thank you. I have meant to try out Solver Foundation for a while, and I think that moment is coming closer now.

  2. Thanks.
    What would you recommend to show a graph?

  3. @ Art,

    Microsoft Solver Foundation does not have any plot functionality. You can go to Sho library for the plot.

  4. Very nice post, I wish use F# for data minning as soon.

  5. This comment has been removed by the author.

  6. @Chris,

    Great to see the OCaml version!!
