Saturday, July 12, 2014

R Notes: vectors


R is different from the C family of languages. It has C-like syntax but Lisp-like semantics. Programmers from the C/C++/Java world often find many R idioms ad hoc and feel they must memorize special cases. This is because they approach R from a C perspective. R is a very elegant language once we unlearn some C concepts and learn R's rules. I am writing several notes to explain important rules of the R language. This is the first one.


The atomicity of R vectors

The atomic data structure in R is the vector. This is very different from any C-family language. In C/C++, built-in types such as int and char are the atomic data structures, while a C array (a contiguous block of memory) is clearly not the simplest type. In R, the vector is the most basic data structure. There is no scalar data structure in R: you cannot have a scalar int as in C's int x = 10; what looks like a scalar is a vector of length 1.

The atomicity of R vectors is described in many documents. R learners often overlook it because many of them come from C, where an array is a composite data structure. Many seemingly special cases in the R language all come from the atomicity of vectors, and I will try to cover them coherently.


x <- 10  # equivalent to x <- c(10)
x # or equivalent to print(x)

## [1] 10

y <- c(10, 20)
y

## [1] 10 20

What does [1] mean in the output? It means that the output is a vector and that the first value printed on that line is the element at index 1. x is a vector of length 1, so it prints as [1] 10, while y is a vector of length 2, so it prints as [1] 10 20. For a longer vector, each output line starts with the index of its first element to help the reader:

z <- 1:25
z

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25
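The claim that R has no scalars can be checked directly. The short sketch below only uses base R functions (is.vector, length, identical) and reuses the x from the example above:

```r
# A "scalar" literal is already a vector of length 1.
x <- 10

is_vec   <- is.vector(x)         # TRUE
n        <- length(x)            # 1
same     <- identical(x, c(10))  # TRUE: wrapping a single value in c() changes nothing
first_el <- x[1]                 # 10: indexing works because x is a vector
```
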

Vectors with different types

Although vectors in R are atomic, they come in different types: integer, double (floating-point), complex, character and logical vectors. Integer and double vectors are both numeric vectors. Above, we have seen numeric vectors: 1:25 is an integer vector, while literals such as 10 are stored as doubles. Let's see more types of vectors below:

x <- c(1, 2.1)
class(x)

## [1] "numeric"

y <- c("a", "bb", "123")
class(y)

## [1] "character"

z <- complex(real = 1, imaginary = 1)
class(z)

## [1] "complex"
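To see the difference between integer and double vectors, which both report as numeric, typeof can be used. This is a small sketch using only base R; the variable names are chosen just for illustration:

```r
# typeof() reports the storage type of a vector.
t_seq     <- typeof(1:3)        # "integer"
t_mixed   <- typeof(c(1, 2.1))  # "double"
t_literal <- typeof(10)         # "double": a plain numeric literal is a double
t_suffix  <- typeof(10L)        # "integer": the L suffix requests an integer
t_logical <- typeof(TRUE)       # "logical"

# Both integer and double vectors count as numeric.
both_numeric <- is.numeric(1:3) && is.numeric(c(1, 2.1))  # TRUE
```
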

Notice that in R, a string (in R's terms, the character type) is an element type just like the integer, double and logical types. It is not a vector of chars: R does not differentiate between a single character and a sequence of characters. R has a set of special functions such as paste and strsplit for string processing, but a string is not a composite type; each string is a single element of a character vector.
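A quick way to convince yourself that a string is one element and not a vector of characters is to compare length and nchar. The sketch below uses only base R functions; s and chars are illustrative names:

```r
s <- "hello"

n_elements <- length(s)   # 1: one element, not five characters
n_chars    <- nchar(s)    # 5: number of characters inside that element

# To work on individual characters, the string must be split explicitly.
chars  <- strsplit(s, "")[[1]]         # "h" "e" "l" "l" "o"
n_split <- length(chars)               # 5
joined <- paste(chars, collapse = "")  # "hello"
```
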

matrix and array

A matrix is a vector augmented with extra attributes, and these attributes make it an R class of its own. Its core data structure is still a vector. See the example below:

y <- c(1, 2, 3, 4, 5, 6)
x <- matrix(y, nrow = 3, ncol = 2)
class(x)  # R >= 4.0 prints "matrix" "array"

## [1] "matrix"

rownames(x) <- c("A", "B", "C")
colnames(x) <- c("V1", "V2")
attributes(x)

## $dim
## [1] 3 2
## $dimnames
## $dimnames[[1]]
## [1] "A" "B" "C"
## $dimnames[[2]]
## [1] "V1" "V2"


x

##   V1 V2
## A  1  4
## B  2  5
## C  3  6

as.vector(x)

## [1] 1 2 3 4 5 6

In R, arrays are used less frequently. A 2D array is just a matrix. To find out more, see ?array. We can say that an array/matrix is a vector (augmented with dim and other attributes), but we cannot say that a vector is an array. In OOP terminology, array/matrix is a subtype of vector.
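The "vector plus dim" view can be demonstrated by attaching and removing the dim attribute by hand. This is a minimal sketch in base R; v is just an illustrative name:

```r
v <- 1:6
dim(v) <- c(2, 3)            # attaching a dim attribute turns the vector into a matrix

is_mat    <- is.matrix(v)    # TRUE
by_rowcol <- v[2, 3]         # 6: matrix-style indexing
by_pos    <- v[6]            # 6: plain vector indexing still works (column-major order)

dim(v) <- NULL               # removing dim gives back a plain vector
is_mat_after <- is.matrix(v) # FALSE
unchanged    <- identical(v, 1:6)  # TRUE
```
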


Because the fundamental data structure in R is the vector, all the basic operators are defined on vectors. For example, + is vector addition, and adding two vectors of length 1 is just a special case.

When the two vectors have different lengths, the shorter one is repeated (recycled) until it matches the length of the longer one. For example:

x <- c(1, 2, 3, 4, 5)
y <- c(1)
x + y # y is repeated to (1,1,1,1,1)

## [1] 2 3 4 5 6

z <- c(1, 2)
x + z # z is repeated to (1,2,1,2,1), a warning is triggered

## Warning: longer object length is not a multiple of shorter object length

## [1] 2 4 4 6 6
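The recycling that R performs implicitly can also be written out by hand with rep_len, which may make the rule easier to see. A small sketch, reusing x and z from above (z_recycled is an illustrative name):

```r
x <- c(1, 2, 3, 4, 5)
z <- c(1, 2)

# Repeat z until it has the same length as x, truncating the last repetition.
z_recycled <- rep_len(z, length(x))   # 1 2 1 2 1

# Same result as x + z, but without the warning, because the lengths now match.
total <- x + z_recycled               # 2 4 4 6 6
```
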

+, -, *, /, etc. are vector operators. When they are applied to matrices, their semantics are the same as for vectors: a matrix is treated as one long vector formed by concatenating its columns. So do not expect all of them to behave as matrix operators! For example:

x <- c(1, 2)
y <- matrix(1:6, nrow = 2)
x * y

##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    4    8   12

For matrix multiplication, we shall use the dedicated operator:

x %*% y  # 1 x 2 * 2 x 3 = 1 x 3

##      [,1] [,2] [,3]
## [1,]    5   11   17

y %*% x  # error: y is 2 x 3, and a length-2 vector conforms neither as a 2 x 1 column nor as a 1 x 2 row

## Error: non-conformable arguments
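For comparison, here are two products involving the same y that do conform. This is only a sketch to illustrate the shape rules of %*%; the vector c(1, 2, 3) is made up for the example:

```r
x <- c(1, 2)
y <- matrix(1:6, nrow = 2)     # 2 x 3

# Transposing y gives a 3 x 2 matrix, so x can act as a 2 x 1 column.
p1 <- t(y) %*% x               # 3 x 1 result: 5, 11, 17

# A length-3 vector can act as a 3 x 1 column on the right of y.
p2 <- y %*% c(1, 2, 3)         # 2 x 1 result: 22, 28

dim(p1)                        # 3 1
dim(p2)                        # 2 1
```
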

The single-character operators all work element by element on vectors and generate a vector of the same length. So & and | are element-wise logical operators, while && and || are special operators that produce a logical vector of length 1 (usually used in if conditions).

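x <- c(TRUE, TRUE, FALSE)
y <- c(TRUE, FALSE, FALSE)
x & y

## [1]  TRUE FALSE FALSE

x && y  # older R compares only the first elements; R >= 4.3.0 raises an error for length > 1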

## [1] TRUE
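When a whole logical vector has to be reduced to a single TRUE or FALSE for an if condition, all and any state the intent explicitly. The sketch below is one possible way to write it, using the same x and y (pairwise and msg are illustrative names):

```r
x <- c(TRUE, TRUE, FALSE)
y <- c(TRUE, FALSE, FALSE)

pairwise <- x & y            # element-wise: TRUE FALSE FALSE

all_true <- all(pairwise)    # FALSE: not every pair is TRUE
any_true <- any(pairwise)    # TRUE: at least one pair is TRUE

# all()/any() return a length-1 logical, which is what if() expects.
msg <- if (all_true) "every pair is TRUE" else "at least one pair is FALSE"
```
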

math functions

All R math functions take vector inputs and generate vector outputs. For example:


exp(1)

## [1] 2.718

exp(c(1))  # the same call, since 1 is already a vector of length 1

## [1] 2.718

exp(c(1, 2))

## [1] 2.718 7.389

sum(matrix(1:6, nrow = 2))  # matrix is a vector, for row/col sums, use rowSums/colSums

## [1] 21

cumsum(c(1, 2, 3))

## [1] 1 3 6

which.min(c(3, 1, 2))

## [1] 2

sqrt(c(3, 2))

## [1] 1.732 1.414
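The remark about sum on a matrix can be made concrete by putting sum, rowSums and colSums side by side. A small sketch in base R; m is an illustrative name:

```r
m <- matrix(1:6, nrow = 2)   # columns are (1,2), (3,4), (5,6)

total    <- sum(m)           # 21: the matrix is summed as one long vector
row_tot  <- rowSums(m)       # 9 12
col_tot  <- colSums(m)       # 3 7 11

# Element-wise math functions keep the dim attribute of their input.
e <- exp(m)
dim(e)                       # 2 3
```
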


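NA is a valid value: it marks a missing element and still occupies a position in a vector. NULL means empty: it has length 0 and disappears when combined with other values.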


NA

## [1] NA



c(NA, 1)

## [1] NA  1

c(NULL, 1)

## [1] 1
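The difference between the two shows up clearly in the lengths and in how functions treat them. This sketch uses only base R, with illustrative variable names:

```r
len_na   <- length(NA)            # 1: NA is an element
len_null <- length(NULL)          # 0: NULL is empty

len_c_na   <- length(c(NA, 1))    # 2: NA keeps its position
len_c_null <- length(c(NULL, 1))  # 1: NULL contributes nothing

missing_flags <- is.na(c(NA, 1))             # TRUE FALSE
s_with_na     <- sum(c(NA, 1))               # NA: missing values propagate
s_without_na  <- sum(c(NA, 1), na.rm = TRUE) # 1: NA is dropped before summing
```
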



*I find that knitr, integrated with the RStudio IDE, is very helpful for writing tutorials.


  1. Thanks very much! Really clear. I'm looking forward to your next set of notes. :-)
