Summary notes for Coursera’s “Computing for Data Analysis”

Week 1

R is a dialect of S.

Atomic data types
– character
– numeric (real), the default for numbers, like 1
– integer, needs the L suffix, like 1L
– complex
– logical

Vector – contains objects of the same type
List – contains objects of different types
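
A quick sketch of the types above; class() reports the type, and the variable names v and l are only illustrative:

```r
# Atomic types: class() reports the type of each value
class("a")      # "character"
class(1)        # "numeric"
class(1L)       # "integer"
class(1 + 2i)   # "complex"
class(TRUE)     # "logical"

# A vector holds one type, so mixed values are coerced to a common type
v <- c(1, "a", TRUE)
class(v)        # "character"

# A list keeps each element's own type
l <- list(1, "a", TRUE)
sapply(l, class)  # "numeric" "character" "logical"
```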

Inf – infinity
NaN – not a number, an undefined result such as 0/0

R objects can have attributes (accessed through the attributes() function)
1. Names, dimnames
2. Dimensions
3. Class
4. Length
5. Other user defined attributes
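
A minimal sketch of reading and setting attributes; the attribute name "note" is made up for the example:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
attributes(m)                           # $dim: 2 3

# attr() sets or reads a single attribute; "note" is a user-defined one
attr(m, "note") <- "my own attribute"
names(attributes(m))                    # "dim" "note"
```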

Assignment
x <- 5
print(x) or just x
[1] 5 – [1] is the index of the first element printed on that line; x is a vector of length one

# – comment

x <- 1:20 – creates a sequence

Conversion
as.* functions, e.g. x <- 5; as.complex(x)
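
A few conversions as a sketch (x and y are illustrative names); a conversion that makes no sense gives NA with a warning, which is suppressed here to keep the output clean:

```r
x <- 0:3
as.numeric(x)     # 0 1 2 3
as.logical(x)     # FALSE TRUE TRUE TRUE
as.character(x)   # "0" "1" "2" "3"
as.complex(5)     # 5+0i

# "b" cannot be read as a number, so the result is NA
y <- suppressWarnings(as.numeric(c("1", "b")))
y                 # 1 NA
```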

Matrix
m <- matrix(1:6, nrow = 2, ncol = 3) – filled column by column, top to bottom

Transform a vector into a matrix
m <- 1:10
dim(m) <- c(2, 5)

Build a matrix from vectors with cbind() (as columns) or rbind() (as rows)
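
The three ways of building a matrix, sketched with small illustrative vectors:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
m[2, 1]   # 2, because values fill column by column
m[1, 2]   # 3

v <- 1:10
dim(v) <- c(2, 5)   # v is now a 2 x 5 matrix

x <- 1:3
y <- 10:12
cbind(x, y)   # 3 x 2, x and y become columns
rbind(x, y)   # 2 x 3, x and y become rows
```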

Factors – represent categorical data, like enums
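
A small sketch of a factor and its levels; the data are made up:

```r
f <- factor(c("yes", "no", "yes", "yes"))
levels(f)       # "no" "yes" (alphabetical by default)
table(f)        # no: 1, yes: 3
as.integer(f)   # underlying integer codes: 2 1 2 2

# The levels argument sets the order explicitly
g <- factor(c("yes", "no", "yes", "yes"), levels = c("yes", "no"))
levels(g)       # "yes" "no"
```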

NA (missing value) and NaN (undefined numeric result); a NaN is also NA, but not the other way round
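
The difference can be checked with is.na() and is.nan(); a short sketch:

```r
x <- c(1, NA, 0/0)   # 0/0 evaluates to NaN
is.na(x)             # FALSE TRUE TRUE  (NaN counts as NA)
is.nan(x)            # FALSE FALSE TRUE (NA is not NaN)
```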

Data frames – lists of equal-length columns; like a matrix, but each column can hold a different type and can have a name.
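
A minimal data frame, as a sketch; the column names are invented, and stringsAsFactors = FALSE is set so the result is the same across R versions:

```r
df <- data.frame(id = 1:3,
                 name = c("a", "b", "c"),
                 ok = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)
nrow(df)            # 3
ncol(df)            # 3
names(df)           # "id" "name" "ok"
sapply(df, class)   # id: "integer", name: "character", ok: "logical"
```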

R objects can have names (set and read with the names() function)

Extract subset
[] – returns a subset of the same type as the original: a list from a list, a vector from a vector (with names if they exist)
[[]] – returns a single element as it is

drop = FALSE keeps the result a matrix instead of dropping it to a vector
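
A sketch of the subsetting operators on a small list and matrix (the names l and m are illustrative):

```r
l <- list(a = 1:3, b = "text")
l["a"]      # a list with one element named a
l[["a"]]    # the integer vector 1 2 3
l$a         # same as l[["a"]]

m <- matrix(1:6, nrow = 2, ncol = 3)
m[1, 2]                 # 3, a plain vector of length one
m[1, 2, drop = FALSE]   # a 1 x 1 matrix
m[1, , drop = FALSE]    # a 1 x 3 matrix (first row)
```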

is.* functions check a condition

! – negates each element of a logical vector

Vectors and matrices are processed element by element: +, -, *, /
True matrix multiplication – %*%
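
To see the difference between * and %*%, a sketch with two small matrices:

```r
a <- matrix(1:4, 2, 2)          # rows: (1, 3) and (2, 4)
b <- matrix(rep(10, 4), 2, 2)   # every entry is 10

a * b     # element-wise product, rows: (10, 30) and (20, 40)
a %*% b   # matrix product, rows: (40, 40) and (60, 60)

!c(TRUE, FALSE)   # FALSE TRUE
```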

Reading/Writing data

read.* family of functions
write.* family of functions

str() function – compactly displays the structure of an object
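
A round trip through a CSV file, as a sketch; the temporary file path and the column names are only illustrative:

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

path <- tempfile(fileext = ".csv")       # temporary file for the example
write.csv(df, path, row.names = FALSE)   # write without the row-name column
back <- read.csv(path)                   # read it back into a data frame

str(back)                                # shows 3 obs. of 2 variables
```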

Week 2

Control structures: if-else if-else, for, while, repeat (infinite loop), break, next (continue), return

if-else is an expression that returns a value, so it works like a ternary operator
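
A short sketch of if-else as an expression and of next/break inside a loop:

```r
x <- 5
y <- if (x > 3) 10 else 0   # if-else returns a value
y                           # 10

total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next     # skip even numbers
  if (i > 7) break          # leave the loop
  total <- total + i        # adds 1, 3, 5, 7
}
total                       # 16
```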

The ... (dot-dot-dot) parameter passes arguments on to another function; generic functions use it to hand extra arguments to their methods. All parameters that come after ... must be named explicitly and with the full (not partial) name.
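
A sketch of ... in a wrapper; shout() and its suffix parameter are hypothetical names made up for this example:

```r
# Hypothetical wrapper: everything in ... is passed on to paste()
shout <- function(..., suffix = "!") {
  paste0(paste(...), suffix)
}

shout("a", "b")                 # "a b!"
shout("a", "b", suffix = "?")   # "a b?"

# A partial name is not matched after ..., so suf ends up inside ...
shout("a", "b", suf = "?")      # "a b ?!"
```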

Symbols are looked up in the global environment first, then in the loaded packages in search-list order (see search()), so the order matters. Packages can be loaded automatically at startup or manually with library(<package name>). A manually loaded package is placed right after the global environment, and the other packages shift down one position.

A free variable is a variable that is neither assigned locally nor a formal parameter of the function

Lexical scoping – a variable is looked up in the environment in which the function was defined; if it is not found there, the parent environment is searched, then its parent, and so on up to the top-level environment (usually the global environment or a package namespace).

For nested functions, the defining environment is the body of the enclosing function
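
A sketch of lexical scoping with a nested function; make.power and the other names are illustrative:

```r
make.power <- function(n) {
  pow <- function(x) {
    x^n          # n is a free variable inside pow()
  }
  pow
}

cube <- make.power(3)
square <- make.power(2)
cube(2)     # 8
square(3)   # 9

# n is found in the environment where pow() was defined
ls(environment(cube))         # "n" "pow"
get("n", environment(cube))   # 3
```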

Debugging

invisible() – returns a value from a function without auto-printing it at the console
traceback() – prints the call stack after an error
debug()/browser()/trace()/recover() – interactive, step-by-step debugging tools
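
A sketch showing that invisible() still returns the value; f is an illustrative name:

```r
f <- function(x) invisible(x * 2)

f(2)         # nothing is printed at the console
y <- f(2)    # but the value is still returned
y            # 4
```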

lapply() – iterates over a list and calls the specified function on each element; always returns a list
sapply() – lapply() + simplification of the result to a vector or matrix where possible
tapply() – applies a function to groups of a vector defined by a factor
split() – splits a vector into groups defined by a factor
mapply() – applies a function over several lists or vectors in parallel.
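
The loop functions on small made-up inputs, as a sketch:

```r
x <- list(a = 1:5, b = c(10, 20))
lapply(x, mean)   # list: a = 3, b = 15
sapply(x, mean)   # named numeric vector: a = 3, b = 15

values <- c(1, 2, 3, 4, 5, 6)
groups <- factor(c("u", "u", "v", "v", "v", "u"))
tapply(values, groups, sum)   # u = 9, v = 12
split(values, groups)         # $u: 1 2 6, $v: 3 4 5

# Calls rep(1, 3), rep(2, 2), rep(3, 1) in turn
mapply(rep, 1:3, 3:1)         # list: 1 1 1 / 2 2 / 3
```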

Week 3

Simulation functions (prefix + distribution name, shown here for norm)
rnorm() – generates random numbers
pnorm() – cumulative distribution function
dnorm() – density
qnorm() – quantile function

sample(vector, number) – returns a random sample of the given number of elements from the vector (without replacement by default); if the number is not specified, it returns a random permutation of the vector.
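
A sketch of the four normal-distribution functions and sample(); set.seed() is added here only to make the random draws reproducible:

```r
set.seed(1)                       # reproducible random numbers
x <- rnorm(5, mean = 0, sd = 1)   # five random draws
length(x)                         # 5

dnorm(0)      # density at 0, about 0.3989
pnorm(0)      # P(X <= 0) = 0.5
qnorm(0.5)    # 0, the inverse of pnorm

s <- sample(1:10, 4)   # 4 distinct values from 1..10
p <- sample(1:10)      # a random permutation of 1..10
```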

Base graphic model
plot(), hist() – high-level graphic functions that start a new plot
par() – sets global graphic parameters
lines(), points() – add lines or points to an existing plot
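
A sketch of the base model; it draws into a temporary PDF file (an illustrative choice) so that it also runs outside an interactive session:

```r
out <- tempfile(fileext = ".pdf")   # illustrative output file
pdf(out)                            # open a file graphics device

old <- par(mfrow = c(1, 2))         # two panels side by side; par() returns the old settings
x <- seq(0, 10, by = 0.5)
plot(x, sin(x), main = "plot()")    # starts a new plot
lines(x, cos(x), lty = 2)           # adds a dashed line to it
points(x, sin(x) / 2, pch = 19)     # adds filled points to it
hist(c(1, 2, 2, 3, 3, 3), main = "hist()")

par(old)                            # restore the previous settings
dev.off()                           # close the device and write the file
```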

Lattice graphic model

Week 4

Color plotting
grDevices package: colorRamp(), colorRampPalette()
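
A sketch of the two functions; both return a function, and the names pal and ramp are illustrative:

```r
pal <- colorRamp(c("red", "blue"))   # maps numbers in [0, 1] to RGB values
pal(0)      # 255 0 0
pal(1)      # 0 0 255
pal(0.5)    # 127.5 0 127.5

ramp <- colorRampPalette(c("red", "yellow"))   # returns n hex colors
ramp(2)     # "#FF0000" "#FFFF00"
```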

RegExp

1. A literal word is the simplest RegExp
2. ^ $ – start and end of line
3. [0-9a-zA-Z] – character classes; a leading ^ inside the brackets negates the class, e.g. [^0-9a-zA-Z]
4. . (dot) – matches any single character
5. | – pipe, separates alternatives
6. () – grouping
7. expression? – means the expression is optional
8. * – any number of repetitions, including zero; + – at least one
9. {m,n} – interval quantifier: at least m, not more than n; {m} – exactly m times; {m,} – at least m
10. \1, \2 – backreferences to the text matched by the first, second parenthesized group
11. (.*) – greedy, (.*?) – not greedy
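
Several of these constructs can be tried with grepl() and sub(); a sketch on made-up strings:

```r
x <- c("color", "colour", "colouur", "a1", "")

grepl("^colou?r$", x)       # TRUE TRUE FALSE FALSE FALSE  (u is optional)
grepl("^colou{2,}r$", x)    # FALSE FALSE TRUE FALSE FALSE (at least two u)
grepl("^(a|b)[0-9]$", x)    # FALSE FALSE FALSE TRUE FALSE (alternative + class)
grepl(".", x)               # TRUE TRUE TRUE TRUE FALSE    (dot needs one character)

sub("<.*>", "", "<a><b>")    # ""    (greedy: matches the whole string)
sub("<.*?>", "", "<a><b>")   # "<b>" (not greedy: matches only <a>)
```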

RegExp in R

grep() returns the indices of the strings that match the regexp, or (value=TRUE) the matching strings themselves
grepl() returns a logical vector indicating which strings matched
regexpr() returns the position and the length of the match, but only for the first match in each string.
gregexpr() returns the position and the length of every match within each string.
regmatches() takes the vector of strings and the result of regexpr() or gregexpr() and returns the substrings that match the pattern
sub(), gsub() – substitute the first (sub) or all (gsub) substrings matched by the regexp.
regexec() works like regexpr(), but also provides the positions of the parenthesized sub-expressions.
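
The functions above on a small made-up character vector, as a sketch:

```r
s <- c("apple 12", "banana", "cherry 7 and 42")

grep("[0-9]+", s)                 # 1 3
grep("[0-9]+", s, value = TRUE)   # "apple 12" "cherry 7 and 42"
grepl("^b", s)                    # FALSE TRUE FALSE

m <- regexpr("[0-9]+", s)         # first match only; -1 means no match
as.integer(m)                     # 7 -1 8
regmatches(s, m)                  # "12" "7"

all.m <- gregexpr("[0-9]+", s)    # every match in each string
regmatches(s, all.m)[[3]]         # "7" "42"

sub("[0-9]+", "N", s[3])          # "cherry N and 42"
gsub("[0-9]+", "N", s[3])         # "cherry N and N"

r <- regexec("([a-z]+) ([0-9]+)", s[1])
regmatches(s[1], r)[[1]]          # "apple 12" "apple" "12"
```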

Classes and methods

Classes, methods, generics
getS3method() (for S3) and getMethod() (for S4) show a method’s code
New S4 classes are created with the setClass() function
Class data elements are called slots
setMethod() defines a method for a class
showClass() prints a class’s description
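
A minimal S4 sketch; the class "Point" and the generic norm2() are hypothetical names invented for this example:

```r
library(methods)   # S4 support; loaded explicitly in case it is not attached

# Hypothetical class with two numeric slots
setClass("Point", representation(x = "numeric", y = "numeric"))

# A new generic and its method for the class
setGeneric("norm2", function(obj) standardGeneric("norm2"))
setMethod("norm2", "Point", function(obj) sqrt(obj@x^2 + obj@y^2))

p <- new("Point", x = 3, y = 4)
p@x          # 3, slots are accessed with @
norm2(p)     # 5

showClass("Point")              # prints the slots of the class
getMethod("norm2", "Point")     # prints the S4 method's code
getS3method("mean", "default")  # prints the code of an S3 method
```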