Monday, July 14, 2014

R Notes: Functions

R's semantics is modeled after Scheme. Scheme is a functional language and R is functional too. I am writing about the functions in R and many R's strange usages are just syntax sugars of special function calls.

What is rownames(x) <- c('A','B','C')?

y <- c(1, 2, 3, 4, 5, 6)
x <- matrix(y, nrow = 3, ncol = 2)
rownames(x) <- c("A", "B", "C")
x


##   [,1] [,2]
## A 1 4
## B 2 5
## C 3 6


How can we assign c('A','B','C') to the return value of rownames(x), yet the assignment effect is done on x?! This is because the statement is a syntax sugar for the following function call:



x <- `rownames<-`(x, c("A1", "B1", "C1"))


First, rownames<- is indeed a function! Because it has special characters, to apply it we need put " around the function name. Second, "rownames<-"(x, c('A1','B1','C1')) is a pure function call. It returns a new copy of x and does not change the row names of x. To take the assignment effect, we must assign the return value back to x explicitly.



The technical term for functionname<- is replacement function. The general rule:



f(x) <- y is transformed as x <- "f<-(x,y)".



Besides rownames, there are many other functions that have a twin replacement function, for example, diag and length.



A special case is index operator "[". Its replacement function is "[<-":



y <- c(1, 2, 3)
"["(y,1)


## [1] 1


`[<-`(y, 2, 100)


## [1]   1 100   3


More examples at R language defination 3.4.4.



Operators are functions



Operators in R are indeed function calls.



"+"(c(1,2), 2)  # same as c(1,2) + 2


## [1] 3 4


Because operators are functions, we can define new operators using function syntax:



# strict vector add
"%++%" <- function(x, y) {
n <- length(x)
m <- length(y)
if (n != m) {
stop("length does not match")
} else {
x + y
}
}


Self-defined operators become more powerful when used with S3 objects (see below).



S3 functions



There are two object systems in R, S3 and S4. S3 objects are lists, so its fields are accessed by the same operator to access list fields $. S4 objects are intended to be safer than S4 objects. Its fields are accessed by @. Many packages, especially those with a long history, such as lars, rpart and randomForest, use S3 objects. S3 objects are easier to use and understand.



Knowing how to implement an S3 object is very useful when we need to check the source code of a specific R package. And when we want to release a small piece of code for others to use, S3 objects provide a simple interface.



The easiest way to understand S3 objects is the following analogy:





















R ANSI C
S3 object/R list object C struct
S3 functions functions on struct


struct MyVec {
int *A;
int n;
};

int safe_get(struct MyVec vec, int i) {
if (i<0 || i>=vec.n) {
fprintf(stderr, "index error");
exit(1);
}
return vec.A[i];
}



In R, the S3 object is implemented as:



vec <- list()
vec$A <- c(1, 2, 3)
vec$n <- length(vec$A)
class(vec) <- "MyVec"


In the S3 object system, the method names cannot be set freely as those in C. They must follow a pattern: “functionname.classname”. Here my class name is MyVec, so all the methods names must end with .MyVec.



"[.MyVec" <- function(vec, i) {
if (i <= 0 || i > vec$n) {
stop("index error")
}
vec$A[i]
}


Let's implement the replacement function too:



"[<-.MyVec" <- function(vec, i, value) {
if (i <= 0 || i > vec$n)
stop("index error")
vec$A[i] <- value
vec
}


Let's play with MyVec objects:



vec[3]


## [1] 3


vec[2] <- 100
vec[30]


## Error: index error


We can also add other member functions for MyVec such as summary.MyVec, print.MyVec, plot.Vec, etc. To use these functions, we don't have to specify the full function names, we can just use summary, print, and plot. R will inspect the S3 class type (in our case, it is MyVec) and find the corresponding functions automatically.



Default parameters and ...



Many functions in R have a long list of parameters. For example plot function. It would becomes tedious and even impossible for the end user to assign the values for every parameter. So to have a clean interface, R supports default parameters. A simple example below:



add2 <- function(a, b = 10) {
a + b
}
add2(5)


## [1] 15


What I want to emphasize in this section is ..., which is called variable number of arguments. And it is universal in R to implement good function interfaces. If you read the documents of R's general functions such as summary and plot, most of their interfaces include ....



Consider the following case: I am implementing a statistical function garchFit, say GARCH model calibration, I used a optimizer optim which has a lot of parameters. Now I need to think about the API of my GARCH calibration function because I want others to use it as well. Shall I expose parameters of optim in garchFit's parameters? Yes, since I want to give the users of my function some freedom in optimizing. But as we know a single procedure in optim such as l-bfgs would have many parameters. On one side, I want to give the user the option to specify these parameters, on the the side, if I expose all of them in my garchFit, the parameter list would go too long. ... comes to the rescue! See the following example:



f1 <- function(a = 1, ...) {
a * f2(...)
}
f2 <- function(b = 5, ...) {
b * f3(...)
}
f3 <- function(c = 10) {
c
}
f1()


## [1] 50


f1(a = 1, b = 2)


## [1] 20


f1(c = 3)


## [1] 15


A simple user of f1 would only need to study its exposed parameter a, while advanced users have options to specify parameters in f2 and f3 when calling f1.



Global variables



Global variables are readly accessible from any R functions. However, to change the value of a global variable, we need a special assignment operator <<-. Python has similar rules. See the following example:



a <- 100
foo <- function() {
b <- a # get a's value
a <- 10 # change a's value fails, (actually done: creates a local variable a, and assign 10)
c(a, b)
}
foo()


## [1]  10 100


a


## [1] 100


boo <- function() {
a <<- 10
}
boo()
a


## [1] 10


Here our scopes have two layers “global” and top-layer functions (foo and boo). When there are more layers, i.e., nested functions, <<- operator finds the variable with the same name in the closet layer for assignment. But it is generally very bad practice to have same variable names across different function layers (except for variables like i, j, x). assign is more general, check ?assign for document.



Variable scope



I think this is the most tricky part of R programming for C programmers. Because block syntax {...} does not introduce a new scope in R while C/C++/Java/C#/etc all introduce a new scope! In R, only a function introduce a new scope.



Please see my previous post: Subtle Variable Scoping in R



quote, subsititude, eval, etc.



Many language-level features of R such as debugging function trace is implemented in R itself, rather than by interpreter hack because R supports meta-programming. I will write another post for these special functions.

5 comments:

  1. "S4 objects are intended to be safer than S4 objects"
    Do you mean "safer than S3 objects"?

    Nice post!

    ReplyDelete
  2. We at COEPD glad to announce that we have introduced Dot Net Technologies Internship Programs (Self sponsored) for professionals who want to have hands on experience. This program is available in COEPD Hyderabad premises which is accompanied by IT Companies. It is intelligently dedicated to our firm participants predominantly acknowledging and appreciating the fact that they are on the path of making a career in Dot Net Technologies discipline. We assume Object-Oriented Programming concepts and teaches C#.NET, ADO.NET which helps the interns to build database-driven Web applications and Web Sites successfully. This internship is designed to gain theoretical knowledge and also hands-on practice and practical know-how to master the nitty-gritty of the Dot Net developer profession. More than a training institute, COEPD today stands differentiated as a mission to help you "Build your dream career" - COEPD way.

    http://www.coepd.com/DotNet-Internship.html

    ReplyDelete
  3. Nice Information
    "Sanjary Academy provides excellent training for Piping design course. Best Piping Design Training Institute in Hyderabad,
    Telangana. We have offer professional Engineering Course like Piping Design Course,QA / QC Course,document Controller
    course,pressure Vessel Design Course, Welding Inspector Course, Quality Management Course, #Safety officer course."
    Piping Design Course in India­
    Piping Design Course in Hyderabad
    Piping Design Course in Hyderabad
    QA / QC Course
    QA / QC Course in india
    QA / QC Course in Hyderabad
    Document Controller course
    Pressure Vessel Design Course
    Welding Inspector Course
    Quality Management Course
    Quality Management Course in india
    Safety officer course

    ReplyDelete
  4. Thanks for sharing information
    Yaaron Studios is one of the rapidly growing editing studios in Hyderabad. We are the best Video Editing services in Hyderabad. We provides best graphic works like logo reveals, corporate presentation Etc. And also we gives the best Outdoor/Indoor shoots and Ad Making services.
    Best video editing services in Hyderabad,ameerpet
    Best Graphic Designing services in Hyderabad,ameerpet­
    Best Ad Making services in Hyderabad,ameerpet­

    ReplyDelete