## Summary notes for Coursera’s “Computing for Data Analysis”

October 22, 2012

**Week 1**

R is a dialect of S.

Atomic data types

– character

– numeric (real), the default for numbers, like 1

– integer, needs the L suffix, like 1L

– complex

– logical

Vector – contains objects of the same type

List – contains objects of different types

Inf – infinity

NaN – not a number (an undefined value such as 0/0)
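A minimal sketch of the types above, with made-up variable names. The coercion of a mixed vector to character is not mentioned in these notes; it is standard R behaviour added here for contrast with lists.

```r
# One value of each atomic type listed above
ch  <- "a"       # character
num <- 1         # numeric (double) by default
int <- 1L        # integer, thanks to the L suffix
cpx <- 1 + 2i    # complex
lgl <- TRUE      # logical

types <- c(class(ch), class(num), class(int), class(cpx), class(lgl))
print(types)

# A vector holds a single type, so mixed input is coerced
mixed <- c(1, "a", TRUE)
print(class(mixed))          # "character"

# A list keeps each element's own type
lst <- list(1, "a", TRUE)
print(sapply(lst, class))    # "numeric" "character" "logical"

# Special numeric values
print(1 / 0)                 # Inf
print(0 / 0)                 # NaN
```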

R’s objects can have attributes (accessed through the **attributes()** function)

1. Names, dimnames

2. Dimensions

3. Class

4. Length

5. Other user defined attributes
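The attribute kinds listed above can be inspected roughly like this. The object names and the "unit" attribute are invented for the illustration.

```r
# A matrix carries dim and dimnames attributes
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
attrs <- attributes(m)
print(names(attrs))      # "dim" "dimnames"
print(attrs$dim)         # 2 3

# Class and length are queried with their own functions
print(class(m))
print(length(m))         # 6

# A user-defined attribute ("unit" is a hypothetical name)
x <- 1:3
attr(x, "unit") <- "cm"
print(attributes(x))
```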

Assignment

x **<-** 5

print(x) or just x

[1] 5 – the [1] means that 5 is the first element of the printed vector

**#** – comment

x <- 1:20 – creates a sequence

Conversion

as.* functions, e.g. x <- 5; as.complex(x)
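A short sketch of assignment, sequences and conversion. The extra as.* calls go beyond the single example in the notes but are ordinary base R functions.

```r
x <- 5            # assignment
print(x)          # [1] 5

s <- 1:20         # integer sequence from 1 to 20
print(length(s))  # 20

# Explicit conversion with as.* functions
z   <- as.complex(x)     # 5+0i
chr <- as.character(s)   # "1" "2" ...
lg  <- as.logical(0:2)   # FALSE TRUE TRUE

# A conversion that makes no sense yields NA (with a warning)
bad <- suppressWarnings(as.numeric("abc"))
print(bad)               # NA
```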

Matrix

m <- matrix(1:6, nrow=2, ncol=3) – filled column by column, top to bottom

Transform a vector into a matrix

m <- 1:10

dim(m) <- c(2, 5)

Build a matrix by cbind, rbind
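The three ways of building a matrix mentioned above, as a small sketch with arbitrary values:

```r
# 1. matrix(): values fill column by column
m <- matrix(1:6, nrow = 2, ncol = 3)
print(m)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# 2. Give a plain vector a dim attribute
v <- 1:10
dim(v) <- c(2, 5)
print(dim(v))            # 2 5

# 3. Bind vectors as columns or as rows
cb <- cbind(1:3, 4:6)    # 3 rows, 2 columns
rb <- rbind(1:3, 4:6)    # 2 rows, 3 columns
print(dim(cb))
print(dim(rb))
```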

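Factors – represent categorical data, like enums

NA and NaN – missing and undefined values; every NaN is also NA, but not the other way round

Data frames – lists of equal-length columns; like a matrix, but the columns can hold different types, and each column can have a name.

R’s objects can have names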
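A sketch of factors, missing values and a data frame, using invented data:

```r
# Factor: categorical data with a fixed set of levels
f <- factor(c("yes", "no", "yes"))
print(levels(f))     # "no" "yes" (sorted alphabetically)
print(table(f))

# NA vs NaN: is.na() is TRUE for both, is.nan() only for NaN
v <- c(1, NA, 0 / 0)
print(is.na(v))      # FALSE TRUE TRUE
print(is.nan(v))     # FALSE FALSE TRUE

# Data frame: named columns of different types, same length
df <- data.frame(id = 1:3, answer = f)
print(names(df))     # "id" "answer"
print(nrow(df))      # 3
```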

Extract subset

**[]** – returns an object of the same class as the original: a list from a list, a vector from a vector (names are kept if they exist)

**[[]]** – returns a single element as it is

**drop = FALSE** keeps the result of matrix subsetting a matrix instead of dropping it to a vector

**is.*** functions check a condition

**!** – negates a logical vector element by element

Vectors and matrices are processed element by element: +, -, *, /

True matrix multiplication **%*%**
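The subsetting operators and the two kinds of multiplication can be compared in a small sketch. The data is made up; is.na() stands in for the is.* family.

```r
lst <- list(a = 1:3, b = "text")
sub1 <- lst["a"]      # still a list (with the name kept)
sub2 <- lst[["a"]]    # the element itself: an integer vector

m <- matrix(1:4, nrow = 2)
row_vec <- m[1, ]                 # dimensions dropped: plain vector
row_mat <- m[1, , drop = FALSE]   # stays a 1 x 2 matrix

# is.* check combined with ! to remove missing values
x <- c(1, NA, 3)
clean <- x[!is.na(x)]             # 1 3

# Element-wise product vs true matrix product
elem <- m * m
prod <- m %*% m
print(elem)
print(prod)
```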

**Reading/Writing data**

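**read.*** family of functions, e.g. read.table(), read.csv()

**write.*** family of functions, e.g. write.table(), write.csv()

str() function – compactly displays the structure of an object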
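A round trip through a temporary CSV file, as one possible illustration of the read.* and write.* functions together with str(). The data frame and the file location are invented.

```r
df <- data.frame(id = 1:3, name = c("a", "b", "c"))

# Write to a temporary file, then read it back
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)
back <- read.csv(path)

# str() gives a compact overview of what was read
str(back)

unlink(path)   # clean up the temporary file
```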

**Week 2**

**Control structures**: if-else if-else, for, while, repeat (infinite loop), break, next (continue), return

if-else is an expression that returns a value, so it can be used like a ternary operator

**…** (dot-dot-dot) in a function’s parameters passes a variable number of arguments on to another function; generic functions use it to pass extra arguments to methods. All parameters after the dot-dot-dot must be named explicitly and in full (no partial matching)

**Symbols are searched** first in the global environment, then in the loaded packages; the order matters. Packages can be loaded automatically at startup or manually (function **library(<package name>)**). A manually loaded package is placed right after the global environment in the search list, and the other packages shift down.

**Free variable** – a variable that is not assigned locally and is not a formal parameter of the function

**Lexical scoping** – a free variable is looked up in the environment in which the function was defined; if it is not found there, the parent environment is searched, then the next one, up to the global environment or the namespace of a package.

In the case of **nested functions**, the defining environment is the body of the enclosing function
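A sketch of the control structures and of dot-dot-dot. The function names my_mean and join are hypothetical helpers written for this note.

```r
# if-else used as an expression (ternary style)
x <- 7
parity <- if (x %% 2 == 0) "even" else "odd"

# for with next (continue) and break
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next   # skip even numbers
  if (i > 7) break        # stop at the first odd number above 7
  total <- total + i      # adds 1, 3, 5, 7
}

# repeat is an infinite loop that must be left with break
n <- 0
repeat {
  n <- n + 1
  if (n >= 3) break
}

# ... forwards extra arguments to another function
my_mean <- function(x, ...) mean(x, ...)
avg <- my_mean(c(1, NA, 3), na.rm = TRUE)

# A parameter after ... has to be given by its full name
join <- function(..., sep = "-") paste(..., sep = sep)
j1 <- join("a", "b")
j2 <- join("a", "b", sep = "+")

print(c(parity, total, n, avg, j1, j2))
```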
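Lexical scoping is easiest to see with a function that returns a function. This is a sketch; make_power, cube, f and g are names chosen for the example.

```r
# n is a free variable inside the inner function;
# it is found in the environment where that function was defined
make_power <- function(n) {
  function(x) x^n
}
cube <- make_power(3)
print(cube(2))                        # 8
print(get("n", environment(cube)))    # 3

# The defining environment wins over the calling environment
y <- 10
g <- function(x) x * y    # y is free: looked up where g was defined
f <- function(x) {
  y <- 2                  # local to f, invisible to g
  g(x)
}
print(f(3))               # 30, not 6
```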

**Debugging**

**invisible()** – returns a value without printing it automatically at the prompt

**traceback()** – prints the call stack after an error

**debug/browser/trace/recover** – step-by-step execution
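A small sketch of invisible(): the value is still returned, only the automatic printing is suppressed. The function quiet is a made-up example; the interactive tools are shown only as comments because they need a live session.

```r
quiet <- function(x) invisible(x * 2)

quiet(21)          # at the prompt this prints nothing
v <- quiet(21)     # the value is returned all the same
print(v)           # 42

# withVisible() reports whether a result would be auto-printed
vis <- withVisible(quiet(21))$visible
print(vis)         # FALSE

# In an interactive session (not run here):
#   debug(quiet)   # step through quiet() on its next call
#   traceback()    # show the call stack of the last error
```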

**Loop functions**

**lapply()** – iterates over a list of objects and calls the specified function for each element

**sapply()** – lapply() + simplification of result – to vector or matrix

**tapply()** – applies a function to groups of a vector defined by a factor

**split()** – splits a vector into groups defined by a factor

**mapply()** – applies a function over several lists or vectors in parallel.
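The functions above side by side, on invented data:

```r
lst <- list(a = 1:4, b = c(10, 20))

l_res <- lapply(lst, mean)   # always a list
s_res <- sapply(lst, mean)   # simplified to a named numeric vector
print(s_res)                 # a = 2.5, b = 15

values <- c(1, 2, 3, 4, 5, 6)
groups <- factor(c("x", "y", "x", "y", "x", "y"))

t_res <- tapply(values, groups, sum)   # x = 9, y = 12
sp    <- split(values, groups)         # $x: 1 3 5   $y: 2 4 6
print(t_res)
print(sp)

# mapply walks several vectors in parallel:
# rep(1, 3), rep(2, 2), rep(3, 1)
m_res <- mapply(rep, 1:3, 3:1)
print(m_res)
```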

**Week 3**

Simulation functions

**rnorm()** (prefix r + distribution name) – generates random numbers

**pnorm()** (prefix p) – cumulative distribution function

**dnorm()** (prefix d) – density

**qnorm()** (prefix q) – quantile function

**sample(vector, number)** – returns a random sample of the given size from the vector; if the size is not specified, it just permutes the vector.
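A sketch of the four normal-distribution functions and sample(). set.seed() is not mentioned in these notes; it is used here only to make the random draws repeatable.

```r
set.seed(1)
draws <- rnorm(5)        # five standard normal random numbers

print(pnorm(0))          # 0.5: half of the mass lies below the mean
print(dnorm(0))          # density at the mean, 1 / sqrt(2 * pi)
print(qnorm(0.5))        # 0: the median

# Resetting the seed reproduces the same draws
set.seed(1)
again <- rnorm(5)
print(identical(draws, again))   # TRUE

pick <- sample(1:10, 3)  # three values, without replacement
perm <- sample(1:10)     # no size given: a random permutation
print(pick)
print(perm)
```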

**Base** graphics model

**plot(), hist()** – plotting functions

**par()** sets global graphics parameters

**lines(), points()** – add lines or points to an existing plot

**Lattice** graphics model
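A base graphics sketch that draws into a temporary PDF file, so it also runs without a screen device. The data and the output location are arbitrary choices for the example.

```r
out <- tempfile(fileext = ".pdf")
pdf(out)                              # open a file device

old <- par(mfrow = c(1, 2))           # global setting: two panels side by side

x <- seq(0, 2 * pi, length.out = 50)
plot(x, sin(x), type = "l", main = "plot()")
lines(x, cos(x), lty = 2)             # add a second curve
points(x[c(1, 25, 50)], sin(x[c(1, 25, 50)]), pch = 19)   # add markers

hist(rnorm(100), main = "hist()")

par(old)                              # restore the previous settings
dev.off()                             # close the device and write the file
```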

**Week 4**

**Color plotting**

grDevices package: colorRamp(), colorRampPalette()
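Both functions return a function rather than colors. A minimal sketch with a red-to-blue ramp chosen arbitrarily:

```r
# colorRamp(): maps numbers in [0, 1] to RGB values
ramp <- colorRamp(c("red", "blue"))
print(ramp(0))      # the first color: 255 0 0
print(ramp(1))      # the last color:  0 0 255

# colorRampPalette(): returns n interpolated colors as hex strings
pal <- colorRampPalette(c("red", "blue"))
cols <- pal(3)
print(cols)         # starts at "#FF0000" and ends at "#0000FF"
```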

**RegExp**

1. A literal word is the simplest RegExp

2. ^ $ – start and end of line

3. [0-9a-zA-Z] – character classes; a leading ^ inside the brackets, as in [^0-9a-zA-Z], negates the class

4. . (dot) matches any single character

5. | – the pipe represents an alternative

6. () – grouping

7. expression? – means the expression is optional

8. * – any number, even zero, + – at least one

9. {m,n} – interval quantifier: at least m, not more than n; {m} – exactly m times; {m,} – at least m

10. \1, \2 – backreferences to the text matched by the first, second parenthesized group

11. (.*) – greedy, (.*?) – not greedy
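The rules above can be tried out with grepl() and sub(). The test strings are invented, and perl = TRUE is passed in the greedy versus non-greedy comparison as a cautious choice so that the lazy quantifier is certainly understood.

```r
x <- c("cat", "concatenate", "dog", "cot", "ct")

r_literal <- grepl("cat", x)            # TRUE TRUE FALSE FALSE FALSE
r_anchor  <- grepl("^cat$", x)          # TRUE FALSE FALSE FALSE FALSE
r_dot     <- grepl("c.t", x)            # TRUE TRUE FALSE TRUE FALSE
r_alt     <- grepl("^(cat|dog)$", x)    # TRUE FALSE TRUE FALSE FALSE
r_opt     <- grepl("^ca?t$", x)         # TRUE FALSE FALSE FALSE TRUE
r_count   <- grepl("^[a-z]{3}$", x)     # TRUE FALSE TRUE TRUE FALSE

# Backreference: the same character twice in a row
r_back <- grepl("(.)\\1", c("hello", "world"))   # TRUE FALSE

# Greedy vs non-greedy matching
html   <- "<b>bold</b>"
greedy <- sub("<.*>", "", html, perl = TRUE)     # ""
lazy   <- sub("<.*?>", "", html, perl = TRUE)    # "bold</b>"
print(c(greedy, lazy))
```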

**RegExp in R**

**grep()** returns the indices of the strings that match the regexp, or (with value=TRUE) the matching strings themselves

**grepl()** returns a logical vector indicating which strings matched

**regexpr()** reports the position and the length of the match, but only for the first match in each string.

**gregexpr()** reports the position and the length of all matches within each string.

**regmatches()** takes the vector of strings and the result of **regexpr()** or **gregexpr()** and returns the substrings that match the pattern

**sub(), gsub()** – substitute the substring matched by the regexp: the first match only, or all matches.

**regexec()** works like **regexpr()**, but also provides the positions of the parenthesized subexpressions.
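A sketch of these functions on a few made-up strings, all searching for runs of digits:

```r
x <- c("apple 12", "banana", "cherry 7 and 42")

idx  <- grep("[0-9]+", x)                 # 1 3
vals <- grep("[0-9]+", x, value = TRUE)   # the matching strings
hit  <- grepl("[0-9]+", x)                # TRUE FALSE TRUE

# First match per string: position, or -1 when there is none
first <- regexpr("[0-9]+", x)
first_txt <- regmatches(x, first)         # "12" "7"

# All matches per string
all_m   <- gregexpr("[0-9]+", x)
all_txt <- regmatches(x, all_m)           # third element: "7" "42"

# Replace the first match vs every match
s1 <- sub("a", "A", "banana")             # "bAnana"
s2 <- gsub("a", "A", "banana")            # "bAnAnA"

# regexec also reports the parenthesized groups
parts <- regexec("([a-z]+) ([0-9]+)", "apple 12")
grp   <- regmatches("apple 12", parts)[[1]]   # "apple 12" "apple" "12"
print(grp)
```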

**Classes and methods**

Classes, methods, generics

**getS3method(), getMethod()** show the code of an S3 or S4 method

New (S4) classes are created with the **setClass()** function

Class data elements are called **slots**

**setMethod()** defines methods for a new class

**showClass()** prints a class’s description
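A minimal S4 sketch tying these functions together. The class Point and the generic dist_origin are hypothetical names invented for this note; setGeneric() and new() do not appear in the notes above but are needed to make the example complete.

```r
library(methods)

# A class with two numeric slots
setClass("Point", representation(x = "numeric", y = "numeric"))

p <- new("Point", x = 3, y = 4)
print(p@x)               # slots are accessed with @

# A new generic and its method for the Point class
setGeneric("dist_origin", function(obj) standardGeneric("dist_origin"))
setMethod("dist_origin", "Point",
          function(obj) sqrt(obj@x^2 + obj@y^2))

d <- dist_origin(p)
print(d)                 # 5

showClass("Point")                   # class description with its slots
getMethod("dist_origin", "Point")    # code of the S4 method

# For S3, e.g. the default method of mean()
s3 <- getS3method("mean", "default")
```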