Logistic regression is a workhorse in data mining. Like decision trees and SVMs, it is a very standard classifier. If you have labeled data, logistic regression is definitely one of the classifiers you should try.

In this post, I’d like to show how to implement logistic regression using Microsoft Solver Foundation in F#. Because we use the optimizer in Microsoft Solver Foundation, the core part of the logistic regression contains only 20–30 lines of code.

If you are only interested in F# and L-BFGS, but not logistic regression, just read the second section: *L-BFGS solver in Microsoft Solver Foundation*.

## Background for logistic regression

Logistic regression models the probability that a data instance **x** is labeled as *y*:

$$P(y \mid \mathbf{x}; \mathbf{w}, b) = \frac{1}{1 + \exp\left(-y\,(\mathbf{w}^T \mathbf{x} + b)\right)}, \qquad y \in \{-1, +1\} \tag{1}$$

A data instance is a vector with each dimension representing the value of some feature. For example, the Iris flower data set we are going to use in this post has 4 features: Sepal Length, Sepal Width, Petal Length and Petal Width. Each instance has a label *y* indicating its category. The above model only handles binary labels. Logistic regression also has a multi-class version; see [Mitchell]. In this post we focus on the binary version; the multi-class version should be straightforward once we know how to implement the binary one.

The model contains two parameters: the weight vector **w** and the offset *b*, which should be estimated/learned from the training data set. In the derivations below, we will drop the offset *b*, because we can always append a constant feature 1 to the feature vector, $\mathbf{x}' = [\mathbf{x}\;\; 1]$, and hide *b* in the extended weight vector $\mathbf{w}' = [\mathbf{w}\;\; b]$. Sometimes a model without the offset parameter performs equally well. Because equation (1) measures a probability, we can solve for the model parameters by optimizing the joint probability estimated on the dataset *X*.

If we assume that all the instances are independent and identically distributed (i.i.d.), then the regularized joint probability is

$$P(X \mid \mathbf{w}) = \exp\!\left(-\frac{\lambda}{2}\,\lVert\mathbf{w}\rVert^2\right) \prod_{i=1}^{n} P(y_i \mid \mathbf{x}_i; \mathbf{w})$$

where $\exp(-\frac{\lambda}{2}\lVert\mathbf{w}\rVert^2)$ is the regularization term for avoiding over-fitting on the training data.

The log version of this likelihood is

$$L(\mathbf{w}) = \log P(X \mid \mathbf{w}) = -\frac{\lambda}{2}\,\lVert\mathbf{w}\rVert^2 + \sum_{i=1}^{n} \log \frac{1}{1 + \exp(-y_i\,\mathbf{w}^T \mathbf{x}_i)}$$

We want to find the model parameter **w** that maximizes this likelihood. $L(\mathbf{w})$ is a concave function with respect to **w**, so it has a global maximum.

What is left is how to optimize this function. BFGS is a quasi-Newton method that needs only first-order gradients, while Newton-family methods need second-order derivatives explicitly. Microsoft Solver Foundation has a limited-memory version of BFGS (L-BFGS). (L-BFGS is more popular than BFGS; sometimes people say BFGS when they mean L-BFGS.) There is a memory parameter *m* in L-BFGS; the space complexity of the algorithm is *O(md)*, where *d* is the dimension of the variable space. Say you want to optimize a function that has 1,000,000 free variables and set *m* to 10; then you need only a little more than 1M × 10 × 8 / 1024² ≈ 76M bytes.
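To make that arithmetic concrete, here is a throwaway F# calculation. This is my own sketch, not part of the post's original code; `lbfgsMemoryBytes` is a made-up helper name:

```fsharp
// rough L-BFGS history size: m vectors of dimension d, stored as 8-byte floats
let lbfgsMemoryBytes (d: int64) (m: int64) = d * m * 8L

// 1,000,000 variables with m = 10 -> about 76 MB
printfn "%d MB" (lbfgsMemoryBytes 1000000L 10L / (1024L * 1024L))
```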

The input for an L-BFGS optimizer is: 1) the function evaluator $L(\mathbf{w})$ and 2) the gradient evaluator

$$\nabla L(\mathbf{w}) = -\lambda\,\mathbf{w} + \sum_{i=1}^{n} \bigl(1 - P(y_i \mid \mathbf{x}_i; \mathbf{w})\bigr)\, y_i\, \mathbf{x}_i$$

The output is the best $\mathbf{w}^*$ that gives the maximum function value.
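Before wiring such a pair of evaluators into any solver, it is worth checking the analytic gradient against a finite-difference approximation, since sign errors in hand-derived gradients are common. The sketch below is my own addition, not part of Solver Foundation; `numericalGradient` is a made-up helper:

```fsharp
// central-difference approximation of the gradient of f at w;
// compare its entries against the output of the analytic gradient evaluator
let numericalGradient (f: float[] -> float) (w: float[]) =
    let eps = 1e-6
    Array.init w.Length (fun j ->
        let wp = Array.copy w
        let wm = Array.copy w
        wp.[j] <- wp.[j] + eps
        wm.[j] <- wm.[j] - eps
        (f wp - f wm) / (2.0 * eps))
```

If the analytic and numerical gradients disagree by much more than `eps`, the derivation or its implementation has a bug.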

## L-BFGS solver in Microsoft Solver Foundation

Microsoft Solver Foundation already implements an L-BFGS solver. The reader is encouraged to read pages 24-27 and 72-75 of MSF-SolverProgrammingPrimer.pdf (on my machine: C:\Program Files\Microsoft Solver Foundation\3.0.1.10599\Documents) for detailed reference. Because the sample code for optimizing the Rosenbrock function is in C#, I have translated it to F# here. Note how the code translates between the delegate types **System.Func<>** / **System.Action<>** and F# functions.

By the way, I find that using the Microsoft.Solver.Foundation.dll shipped with Sho is more convenient, as it packages everything into one DLL.

#r @"C:\Program Files (x86)\Sho 2.0 for .NET 4\packages\Optimizer\Microsoft.Solver.Foundation.dll"

openMicrosoft.SolverFoundation.Common

openMicrosoft.SolverFoundation.Solvers

openMicrosoft.SolverFoundation.Services

letsolverParams = newCompactQuasiNewtonSolverParams()

letsolver = newCompactQuasiNewtonSolver()

//add variables

let_, vidVaribaleX = solver.AddVariable(null)

let_, vidVaribaleY = solver.AddVariable(null)

//add a row and set it as the goal

let_, vidRow = solver.AddRow(null)

solver.AddGoal(vidRow, 0, true)

letOriginalRosenbrockFunction =

letf (model:INonlinearModel) (rowVid:int) (values:ValuesByIndex) (newValues:bool) =

Math.Pow(1. - values.[1], 2.) + 100. * (Math.Pow(values.[2] - (values.[1] * values.[1]), 2.))

newSystem.Func<INonlinearModel, int, ValuesByIndex, bool, float> (f)

letOriginalRosenbrockGradient =

letf (model:INonlinearModel) (rowVid:int) (values:ValuesByIndex) (newValues:bool) (gradient:ValuesByIndex) =

gradient.[1] <- -2. * (1. - values.[1]) - 400. * values.[1] * (values.[2] - (values.[1] * values.[1]))

gradient.[2] <- 200. * (values.[2] - (values.[1] * values.[1]))

newSystem.Action<INonlinearModel, int, ValuesByIndex, bool, ValuesByIndex> (f)

solver.FunctionEvaluator <- OriginalRosenbrockFunction

solver.GradientEvaluator <- OriginalRosenbrockGradient

solver.Solve(solverParams);

Console.WriteLine(solver.ToString());

and the output:

```
Minimize problem
Dimensions = 2
Variable indexes: 1, 2
Starting point = 0, 0
Solution quality is: LocalOptima
Number of iterations performed: 21
Number of evaluation calls: 27
Finishing point = 0.999999997671584, 0.999999995285991
Finishing value = 5.74844710320033E-18
val it : unit = ()
```

The Rosenbrock function has its global minimum at (1, 1) with value 0, so the solver has essentially found it.

## Implementation for logistic regression

We first get a sample data set online:

```fsharp
module Net =
    open System.Net
    // download a web page as a string
    let fetchUrlSimple (url:string) =
        let req = WebRequest.Create(url)
        let response = req.GetResponse()
        use stream = response.GetResponseStream()
        use streamreader = new System.IO.StreamReader(stream)
        streamreader.ReadToEnd()

// a dataset: an array of instances (float arrays) plus their labels
type dataset =
    { features: float array array  // (instance = float array) array
      mutable labels: int array }  // labels in {-1, +1}

let iris =
    let page = Net.fetchUrlSimple @"http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
    let lines = page.Split([|'\n'|], StringSplitOptions.RemoveEmptyEntries)
    // map class names to labels; setosa (2) is filtered out below
    let getIrisId (s:string) =
        if s = "Iris-setosa" then 2
        elif s = "Iris-versicolor" then -1
        else 1
    let instances =
        lines
        |> Array.map (fun line ->
            let s = line.Split ','
            let features = [| "1.0"; s.[0]; s.[1]; s.[2]; s.[3] |] |> Array.map float // add a dummy 1.0 into the features
            let label = getIrisId s.[4]
            features, label)
        |> Array.filter (fun (_, label) -> label <> 2)
    let F, L = Array.unzip instances
    { features = F; labels = L }
```
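A quick sanity check on the parsed data never hurts. The expected numbers below come from the Iris data set itself: 150 instances, 50 per class, so 100 remain after filtering out Iris-setosa, each with 5 features including the dummy 1.0:

```fsharp
// expected: instances = 100, dimension = 5
printfn "instances = %d, dimension = %d" iris.features.Length iris.features.[0].Length
```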

The logistic regression implementation:

```fsharp
module LogReg =
    // regularization coefficient lambda
    let mutable lambda = 0.1

    // dot product of an instance x with the solver's variable values (vids are 1-based)
    let dotProduct (x:float array) (g:ValuesByIndex) =
        let mutable dot = 0.0
        for i=0 to x.Length-1 do
            dot <- dot + x.[i] * g.[i+1]
        dot

    // P(y | x; w), with w read from the solver's 1-based variable values
    let sigmoid (x:float array) (y:int) (g:ValuesByIndex) =
        let z = (float y) * dotProduct x g
        (* code to deal with extreme numbers, but with some issues in the L(w) calculation:
        if z > 30.0 then 1.0
        elif z < -30.0 then 0.0
        else
        *)
        1.0 / (1.0 + exp (- z))

    // same sigmoid, but with w as a plain 0-based array (used after solving)
    let sigmoid2 (x:float array) (y:int) (g:float array) =
        let mutable dot = 0.0
        for i=0 to x.Length-1 do
            dot <- dot + x.[i] * g.[i]
        1.0 / (1.0 + exp (- (float y) * dot))

    // function evaluator: the regularized log-likelihood L(w)
    let logregValue (ds:dataset) =
        let dim = ds.features.[0].Length
        let f (model:INonlinearModel) (rowVid:int) (values:ValuesByIndex) (newValues:bool) =
            // regularization term: -(lambda/2) * ||w||^2
            let mutable L = 0.0
            for i=1 to dim do
                L <- L + values.[i] * values.[i]
            L <- - (L * lambda / 2.0)
            // data term: sum over instances of log P(y_i | x_i; w)
            for i=0 to ds.features.Length-1 do
                L <- L + log (sigmoid ds.features.[i] ds.labels.[i] values)
            L
        new System.Func<INonlinearModel, int, ValuesByIndex, bool, float> (f)

    // gradient evaluator: dL/dw = -lambda*w + sum_i (1 - P(y_i|x_i;w)) * y_i * x_i
    let logregGradient (ds:dataset) =
        let dim = ds.features.[0].Length
        let f (model:INonlinearModel) (rowVid:int) (values:ValuesByIndex) (newValues:bool) (gradient:ValuesByIndex) =
            for j=1 to dim do
                gradient.[j] <- -lambda * values.[j]
            for i=0 to ds.features.Length-1 do
                let coef = (1.0 - sigmoid ds.features.[i] ds.labels.[i] values) * (float ds.labels.[i])
                for j=1 to dim do
                    gradient.[j] <- gradient.[j] + coef * ds.features.[i].[j-1]
        new System.Action<INonlinearModel, int, ValuesByIndex, bool, ValuesByIndex> (f)

    let makeSolver (ds:dataset) =
        // set the solver parameters
        let solverParams = new CompactQuasiNewtonSolverParams()
        //solverParams.IterationLimit <- 11
        //solverParams.Tolerance <- 1e-4
        // the memory parameter m, default 17
        solverParams.IterationsToRemember <- 10
        // solver
        let solver = new CompactQuasiNewtonSolver()
        let dim = ds.features.[0].Length
        // add one variable per weight dimension
        for i=1 to dim do
            solver.AddVariable(null) |> ignore
        // add a row and set it as the goal (false = maximize)
        let _, vidRow = solver.AddRow(null)
        solver.AddGoal(vidRow, 0, false) |> ignore
        // set the function evaluator and gradient evaluator
        solver.FunctionEvaluator <- logregValue(ds)
        solver.GradientEvaluator <- logregGradient(ds)
        // solve!
        solver.Solve(solverParams) |> ignore
        // read the optimized point (w) out of the solver
        let w = Array.create dim 0.0
        for i=1 to dim do
            w.[i-1] <- solver.GetValue(i).ToDouble()
        w
```
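The commented-out clamping in `sigmoid` avoids overflow in `exp`, but, as noted there, returning a hard 0.0 breaks the `log (sigmoid ...)` term in *L(w)*. A numerically safer alternative is to compute log σ(z) directly. This is my own sketch, not part of the original implementation:

```fsharp
// stable log(1 / (1 + exp(-z))) = -log(1 + exp(-z)),
// rewritten for z < 0 so exp never sees a large positive argument
let logSigmoid (z: float) =
    if z >= 0.0 then - log (1.0 + exp (-z))
    else z - log (1.0 + exp z)
```

With this, the data term in `logregValue` could be accumulated as `L <- L + logSigmoid ((float ds.labels.[i]) * dotProduct ds.features.[i] values)`.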

Try our logistic regression model on the Iris dataset:

```fsharp
let sol = LogReg.makeSolver(iris)

// count training errors: a prediction is correct when P(y_i | x_i; w) > 0.5
let nWrongs = ref 0
for i=0 to iris.features.Length-1 do
    let prob = LogReg.sigmoid2 iris.features.[i] iris.labels.[i] sol
    printfn "label = %A sigmoid value = %A prediction: %A"
        iris.labels.[i] prob
        (if prob > 0.5 then "correct" else (incr nWrongs; "wrong"))
printfn "number of wrong predictions = %d" !nWrongs
```

## References

Tom Mitchell, *Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression*, new book chapter. This chapter gives a good introduction to logistic regression. For a more theoretical comparison, see Ng & Jordan, *On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes*.

Thomas P. Minka, *A Comparison of Numerical Optimizers for Logistic Regression*. It is well written; the first page already gives a concise introduction to logistic regression, and the rest of the paper has extensive discussion of optimization methods. Tom also has a Matlab package.

Galen Andrew and Jianfeng Gao. Scalable training of L1-regularized log-linear models. In *Proceedings of the 24th International Conference on Machine Learning (ICML 2007)*, pp. 33-40, 2007.

L1-regularized logistic regression with an implementation in C++.

libLBFGS (L-BFGS in C): http://www.chokkan.org/software/liblbfgs/

## Note

I quite like this post because it combines so many good things together: F#, logistic regression, optimization and playing with sample datasets.

A few follow-ups from the comments:

- *What would you recommend to show a graph?* Microsoft Solver Foundation does not have any plotting functionality; you can use the Sho library for plots.
- Chris contributed an OCaml implementation of this method: http://math.umons.ac.be/anum/fr/software/OCaml/Logistic_Regression/