Summary notes for Coursera’s “Computing for Data Analysis”

Week 1

R is a dialect of S.

Atomic data types
– character
– numeric (real), the default for numbers, like 1
– integer, needs the L suffix, like 1L
– complex
– logical

Vector – contains objects of the same type
List – can contain objects of different types
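
A minimal R sketch of the types above. The variable names are only illustrative, and the coercion shown for c() is standard R behaviour that these notes do not spell out.

```r
# One value of each atomic type from the list above.
values <- list(
  character = "a",
  numeric   = 1,       # a plain number is numeric (double) by default
  integer   = 1L,      # the L suffix makes it an integer
  complex   = 1 + 2i,
  logical   = TRUE
)
classes <- sapply(values, class)
print(classes)

# A vector holds one type only, so c() coerces mixed input to the
# most general type present (character here).
mixed_vector <- c(1, TRUE, "a")
print(class(mixed_vector))

# A list keeps every element with its own type.
mixed_list <- list(1, TRUE, "a")
print(sapply(mixed_list, class))
```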

Inf – infinity
NaN – not a number (undefined result), e.g. 0/0

R objects can have attributes (accessed through the attributes() function)
1. Names, dimnames
2. Dimensions
3. Class
4. Length
5. Other user defined attributes

Assignment
x <- 5
print(x) or just x
[1] 5 – the [1] means that 5 is the first element of the printed vector

# – comment

x <- 1:20 – creates a sequence

Conversion
as.* functions, e.g. x <- 5; as.complex(x)

Matrix
m <- matrix(1:6, nrow = 2, ncol = 3) – filled column by column, top to bottom

Transform a vector into a matrix
m <- 1:10
dim(m) <- c(2, 5)

Build a matrix with cbind() (by columns) or rbind() (by rows)
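
The three ways of building a matrix mentioned above, as a short sketch (variable names are illustrative):

```r
# matrix() fills column by column: the first column is 1, 2.
m <- matrix(1:6, nrow = 2, ncol = 3)
print(m)

# Assigning dim() reshapes an existing vector in place.
v <- 1:10
dim(v) <- c(2, 5)
print(v)

# cbind() glues vectors together as columns, rbind() as rows.
by_col <- cbind(1:3, 4:6)   # 3 rows, 2 columns
by_row <- rbind(1:3, 4:6)   # 2 rows, 3 columns
print(dim(by_col))
print(dim(by_row))
```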

Factors – represent categorical data, like enums

NA and NaN – missing and undefined values; is.na() is TRUE for both, is.nan() only for NaN

Data frames – lists of equal-length columns, like a matrix, but the columns can have different types; each column can have a name.
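
A small sketch of factors, NA/NaN checks and data frames together. The column names and values are made up for illustration.

```r
# A factor stores labels as integer codes plus a set of levels.
sizes <- factor(c("small", "large", "small"), levels = c("small", "large"))
print(levels(sizes))
print(as.integer(sizes))   # the integer codes behind the labels

# NA is "missing", NaN is "not a number"; is.na() covers both.
missing_checks <- c(is.na(NA), is.na(NaN), is.nan(NA), is.nan(NaN))
print(missing_checks)

# Each data frame column keeps its own type.
df <- data.frame(id = 1:3, size = sizes, ok = c(TRUE, FALSE, TRUE))
column_classes <- sapply(df, class)
print(column_classes)
```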

R objects can have names

Extract subset
[] – returns a subset of the same type: a list from a list, a vector from a vector (names are kept if they exist)
[[]] – returns a single element as it is

drop = FALSE keeps the result a matrix instead of dropping it to a vector
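
A sketch of the difference between the subsetting operators, with illustrative names:

```r
x <- list(a = 1:3, b = "text")

sub_list <- x["a"]     # still a list (of length 1), name kept
element  <- x[["a"]]   # the element itself: an integer vector

m <- matrix(1:6, nrow = 2)
row_vec <- m[1, ]                 # dimensions dropped: plain vector
row_mat <- m[1, , drop = FALSE]   # dimensions kept: 1 x 3 matrix

print(class(sub_list))
print(class(element))
print(dim(row_vec))
print(dim(row_mat))
```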

is.* functions check a condition

! – negates a logical vector element by element

Vectors and matrices are processed element by element: +, -, *, /
True matrix multiplication: %*%
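
To make the difference concrete, a sketch that multiplies a matrix by the identity matrix in both ways:

```r
a <- matrix(1:4, nrow = 2)              # columns: (1, 2) and (3, 4)
identity2 <- matrix(c(1, 0, 0, 1), nrow = 2)

elementwise <- a * identity2     # multiplies matching cells
product     <- a %*% identity2   # true matrix product, equals a

print(elementwise)
print(product)
```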

Reading/Writing data

read.* family of functions
write.* family of functions

str() function – compactly displays the structure of an object

Week 2

Control structures: if-else if-else, for, while, repeat (infinite loop), break, next (continue), return

if-else is an expression that returns a value, so it also works as a ternary operator
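
A short sketch of these control structures; the numbers are arbitrary.

```r
x <- 7

# if-else used as an expression (ternary style).
parity <- if (x %% 2 == 0) "even" else "odd"

# for with next (skip even numbers) and break (stop above 7).
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next
  if (i > 7) break
  total <- total + i      # adds 1, 3, 5, 7
}

# repeat is an infinite loop that must be left with break.
n <- 0
repeat {
  n <- n + 1
  if (n >= 3) break
}

print(parity)
print(total)
print(n)
```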

The ... (dot-dot-dot) function parameter forwards arguments to another function; generic functions use it to pass extra arguments on to their methods. All parameters that come after ... must be named explicitly and with the full (not partial) name.
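
A sketch of the dot-dot-dot behaviour. The wrapper label() and its suffix parameter are hypothetical; paste() and paste0() are base R.

```r
# Hypothetical wrapper: everything captured by ... is forwarded to paste().
label <- function(x, ..., suffix = "") {
  paste0(paste(x, ...), suffix)
}

forwarded <- label("a", "b", sep = "-")     # sep travels through ... to paste()
full_name <- label("a", "b", suffix = "!")  # full name: matched to suffix
partial   <- label("a", "b", suf = "!")     # partial name: NOT matched, lands in ...

print(forwarded)
print(full_name)
print(partial)
```

In the last call suf is not treated as suffix, so paste() receives it as one more string to join.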

Symbols are searched for in the global environment first, then in the loaded packages; the order matters. Packages can be loaded automatically at startup or manually (function library(<package name>)). A manually loaded package is placed right after the global environment on the search list, and the other packages are shifted down.

A free variable is a variable that is not locally assigned and is not a formal parameter of the function.

Lexical scoping – a variable is looked up in the environment in which the function was defined; if it is not found there, the parent environment is searched, then the next one, up to the global environment or the namespace of a package.

For nested functions, the defining environment is the body of the enclosing function
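
A common way to see lexical scoping is a function that returns another function. The names make_power, square and cube are illustrative.

```r
make_power <- function(n) {
  # n is a free variable of the inner function: it is neither a local
  # variable nor a formal parameter there, so it is found in the
  # environment where the inner function was defined.
  function(x) x^n
}

square <- make_power(2)
cube   <- make_power(3)

print(square(4))
print(cube(2))

# The defining environment of each closure holds its own n.
n_in_square <- get("n", environment(square))
print(n_in_square)
```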

Debugging

invisible() – returns a value without auto-printing it
traceback() – prints the call stack after an error
debug()/browser()/trace()/recover() – step-by-step execution
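
A sketch showing that invisible() only affects printing: the value is still returned. The function quiet() is hypothetical; withVisible() is base R.

```r
# Hypothetical function that returns its result invisibly.
quiet <- function(x) invisible(x * 2)

result <- quiet(21)                        # the value is still returned
is_visible <- withVisible(quiet(21))$visible

print(result)
print(is_visible)
```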

lapply() – iterates over a list of objects and calls the specified function for each element; returns a list
sapply() – lapply() + simplification of the result to a vector or matrix
tapply() – applies a function to groups of a vector, with the groups defined by a factor
split() – splits a vector into groups defined by a factor
mapply() – works over several lists in parallel.
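
The loop functions side by side, on made-up data:

```r
scores <- list(a = c(1, 2, 3), b = c(10, 20))

as_list <- lapply(scores, mean)   # always a list
as_vec  <- sapply(scores, mean)   # simplified to a named vector

values <- c(1, 2, 3, 4, 5, 6)
groups <- factor(c("x", "y", "x", "y", "x", "y"))

group_means <- tapply(values, groups, mean)   # mean per factor level
by_group    <- split(values, groups)          # list with one vector per level

# mapply() walks over several arguments in parallel:
# rep(1, 3), rep(2, 2), rep(3, 1)
repeated <- mapply(rep, 1:3, 3:1)

print(as_vec)
print(group_means)
print(by_group)
print(repeated)
```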

Week 3

Simulation functions
Prefix + distribution name, e.g. for norm:
rnorm() – generates random numbers
pnorm() – cumulative distribution function
dnorm() – density
qnorm() – quantile function

sample(vector, number) – returns a random sample of the given size from the vector; if the number is not specified, it just permutes the vector.
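
A sketch of the four prefixes for the normal distribution plus sample(). Calling set.seed() is an addition here, assumed only to make the random draws reproducible.

```r
set.seed(1)               # assumption: fixed seed for reproducible draws

draws <- rnorm(5)         # 5 random numbers from N(0, 1)
cdf_at_zero <- pnorm(0)   # P(X <= 0)
median_q <- qnorm(0.5)    # quantile for probability 0.5
density0 <- dnorm(0)      # density at 0

perm <- sample(1:10)      # no size given: a random permutation
pick <- sample(1:10, 3)   # 3 elements drawn without replacement

print(draws)
print(c(cdf_at_zero, median_q, density0))
print(perm)
print(pick)
```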

Base graphic model
plot(), hist() graphic functions
par() specifies global graphic parameters
lines(), points() – add lines or points to an existing plot

Lattice graphic model

Week 4

Color plotting
grDevices package: colorRamp(), colorRampPalette()

RegExp

1. A literal word is itself the simplest RegExp
2. ^ $ – start and end of line
3. [0-9a-zA-Z] – character class, [^0-9a-zA-Z] – negated character class
4. . (dot) – matches any single character
5. | – pipe represents an alternative
6. () – grouping
7. expression? – means the expression is optional
8. * – any number of repetitions, even zero, + – at least one
9. {m,n} – interval quantifier: at least m, not more than n; {m} – exactly m times; {m,} – at least m
10. \1, \2 – backreferences to the text matched by the first, second group
11. (.*) – greedy, (.*?) – not greedy
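
The constructs above can be tried out with grepl() and sub(). The sample words are made up; note that a backslash has to be doubled inside an R string.

```r
words <- c("cat", "cats", "dog", "caat", "c4t")

optional_s  <- grepl("^cats?$", words)       # ? makes the s optional
exactly_two <- grepl("^ca{2}t$", words)      # {2}: exactly two a's
any_char    <- grepl("^c.t$", words)         # dot: exactly one character
non_letter  <- grepl("[^a-z]", words)        # negated character class
alternative <- grepl("^(cat|dog)$", words)   # grouping with alternatives

# \1 refers back to whatever the first group matched (a doubled letter).
doubled <- grepl("(.)\\1", c("hello", "world"))

# Greedy .* takes as much as possible, lazy .*? as little as possible.
greedy <- sub("<(.*)>", "\\1", "<a><b>")
lazy   <- sub("<(.*?)>", "\\1", "<a><b>")

print(optional_s)
print(doubled)
print(c(greedy, lazy))
```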

RegExp in R

grep() returns the indices of the strings that match the regexp, or (with value = TRUE) the matching strings themselves
grepl() returns a logical vector indicating which strings matched
regexpr() reports the position and length of the match in each string, but only for the first match
gregexpr() reports the position and length of every match within each string
regmatches() takes the vector of strings and the result of regexpr() or gregexpr() and returns the substrings that matched the pattern
sub(), gsub() – substitute the first match (sub) or all matches (gsub) of the regexp with a replacement
regexec() works like regexpr(), but also reports the positions of the parenthesized subexpressions
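
The functions above applied to one small made-up input:

```r
lines <- c("id=42 name=ann", "no match here", "id=7 id=8")
pattern <- "id=[0-9]+"

idx     <- grep(pattern, lines)                  # indices of matching strings
matched <- grep(pattern, lines, value = TRUE)    # the matching strings
flags   <- grepl(pattern, lines)                 # one logical per string

first_pos     <- regexpr(pattern, lines)         # first match only, -1 if none
first_matches <- regmatches(lines, first_pos)    # non-matching strings are dropped

all_pos     <- gregexpr(pattern, lines)          # every match in every string
all_matches <- regmatches(lines, all_pos)

once  <- sub("id=", "ID:", lines[3])             # replaces the first match
every <- gsub("id=", "ID:", lines[3])            # replaces all matches

# regexec() also captures the parenthesized groups.
groups <- regmatches(lines[1],
                     regexec("id=([0-9]+) name=([a-z]+)", lines[1]))

print(idx)
print(first_matches)
print(all_matches[[3]])
print(c(once, every))
print(groups[[1]])
```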

Classes and methods

Classes, methods, generics
getS3method(), getMethod() show the code of an S3 or S4 method
New (S4) classes are created with the setClass() function
Class data elements are called slots
setMethod() defines a method for a class
showClass() prints a class description
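
A minimal S4 sketch of the calls above. The class Point and the generic distance_from_origin are hypothetical names chosen for illustration.

```r
library(methods)   # S4 support; attached by default in interactive sessions

# A class with two numeric slots.
setClass("Point", representation(x = "numeric", y = "numeric"))

# A new generic and its method for the Point class.
setGeneric("distance_from_origin",
           function(obj) standardGeneric("distance_from_origin"))
setMethod("distance_from_origin", "Point",
          function(obj) sqrt(obj@x^2 + obj@y^2))

p <- new("Point", x = 3, y = 4)
d <- distance_from_origin(p)
print(d)

showClass("Point")                           # class description with its slots
getMethod("distance_from_origin", "Point")   # shows the method's code
```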

Last part of the summary of a great book about COM+ATL

ATL

Handles COM infrastructure, implements all common stuff.

ATL is a set of headers and a Visual Studio project type.

Provides the class “CComModule” and a global instance “_Module”, which is initialized/uninitialized within DllMain (attach/detach).

It provides a set of operations – registering/unregistering the COM DLL, creating class objects.

BEGIN_OBJECT_MAP(ObjectMap)/END_OBJECT_MAP() – enclose the list of coclasses defined within the module, one OBJECT_ENTRY(<CLSID>, <coclass name>) macro per coclass.

Registration of COM servers: for inproc servers – the “regsvr32” utility; for local servers – the register/unregister command-line parameters (/RegServer, /UnregServer) handled in WinMain.

CoClass template

class ATL_NO_VTABLE CCoHexagon :
    public CComObjectRootEx<CComSingleThreadModel>,
    public CComCoClass<CCoHexagon, &CLSID_CoHexagon>,
    public IDraw
{
public:
    CCoHexagon() { }

    DECLARE_REGISTRY_RESOURCEID(IDR_COHEXAGON)
    DECLARE_PROTECT_FINAL_CONSTRUCT()

    BEGIN_COM_MAP(CCoHexagon)
        COM_INTERFACE_ENTRY(IDraw)
    END_COM_MAP()

    // IDraw
public:
    STDMETHOD(<method name>)();
};

CComObjectRootEx<> provides the reference-counting helpers behind the IUnknown implementation

BEGIN_COM_MAP(<coclass name>)/END_COM_MAP() enclose the list of interfaces exposed by the coclass, one COM_INTERFACE_ENTRY(<interface name>) per interface

CComCoClass<CCoHexagon, &CLSID_CoHexagon> – defines coclass factory, handles aggregation

.rgs files – registry scripts

Properties – native for VB; for C++ they are just syntactic sugar over the methods get_<prop name>, put_<prop name>

IDL definition: [propget] (and [propput]); the result parameter of the getter is specified as [out, retval]

COM string: CComBSTR – wrapper over raw BSTR

Text conversion macros (A2W, W2A, T2OLE, OLE2CT, ...): convert between A (ANSI char*), W (wide-character), T (TCHAR), OLE (OLECHAR) and BSTR strings; C in a macro name means a const result.

ATLTRACE – trace macro for ATL debugging.

Apartments – a logical concept, like a “room” in which COM objects live; there are two kinds of “room”:

– STA (single-threaded apartment) – provides synchronization via the message queue of a hidden window, so all methods of all objects within the same “room” are called sequentially.

– MTA (multithreaded apartment) – does not provide any synchronization.

A process can contain

– 0 or 1 MTA

– 0 or more STAs in addition to the MTA; the first STA created is called the “main STA”.

For local server:

CoInitialize(NULL) places the current thread into a new STA of its own.

CoInitializeEx(NULL, COINIT_MULTITHREADED) places the current thread into the MTA.

For inproc server:

HKCR\CLSID\{guid}\InprocServer32, value “ThreadingModel”, can hold:

– (none) – every object is created in the main STA

– Apartment – the object is created in the STA of the client thread that creates it (in a host STA if the client thread is in the MTA)

– Free – all objects are created in the MTA

– Both – the object is created in the apartment of the client thread, STA or MTA

Proxy/stubs are used when a call crosses apartments: between different STAs, or between the MTA and an STA

Proxy/stubs are NOT used for calls within the same apartment: inside one STA or inside the MTA.

ATL support classes for thread models: CComSingleThreadModel and CComMultiThreadModel.

The CComObject<> class family is parameterized by the final ATL coclass (which is itself abstract) and provides the final implementation of IUnknown.

ATL_NO_VTABLE – macro that suppresses vtable initialization for every class in the inheritance hierarchy except the final object (object size and creation speed optimization)

FinalConstruct(), FinalRelease() – should be used instead of the constructor/destructor in order to avoid issues during object creation/destruction (there are some hacks in the NO_VTABLE object infrastructure)

CComObjectRootBase – contains reference count variable and aggregation support

CComObjectRootEx<> – apartments support

CComCoClass<> – class factories, aggregation and error handling

CComObject<> – provides implementation for actual IUnknown interface

The public section of a coclass should contain the COM map:

BEGIN_COM_MAP(<coclass name>)
    COM_INTERFACE_ENTRY(<coclass public specific interface name>)
END_COM_MAP()

COM error handling (COM exceptions)

Error processing

Interface ISupportErrorInfo has one method, InterfaceSupportsErrorInfo, which tells whether a given interface can supply error information.

Actions of the COM server object when an error happens:

1. ICreateErrorInfo * p1;

2. CreateErrorInfo(&p1);

3. IErrorInfo * p2;

4. p1->QueryInterface(, p2);

5. SetErrorInfo(NULL, p2);

ICreateErrorInfo – lets the server set the error’s description, GUID, help and source

IErrorInfo – interface of the error object that is passed to the client’s thread

SetErrorInfo – function that actually sets the error object for the current thread

Actions of the COM client’s code when an error happens:

1. Get a failed HRESULT.

2. Try to get ISupportErrorInfo from the object.

3. Determine whether the failed interface can provide error information (by calling InterfaceSupportsErrorInfo)

4. Call GetErrorInfo() to get the IErrorInfo interface and call its methods to get the information about the error.

Nested classes are an alternative way to build a COM class. The parent class inherits only from IUnknown. Each nested class holds a pointer to the parent class, which gives it access to the parent’s methods and to its IUnknown implementation. This approach is used when several interfaces have methods with the same name, so plain multiple inheritance cannot tell them apart. The two approaches can be mixed.

Tear-off interfaces

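A rarely used COM interface is implemented by a separate helper class (a friend of the owner class) that is created only when the interface is requested; this saves a vtable pointer in the main object (smaller object size, shorter object construction time).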

COM reuse

The wrapped (inner) object should hold an IUnknown pointer to the wrapper (outer) object

FinalConstruct and FinalRelease are good places to create and release the nested object

Containment – the wrapper object implements the interface of the wrapped object itself and provides every method of it, but just redirects the calls to the wrapped object.

Aggregation – the wrapper object exposes the interface of the wrapped object directly. Not every COM class can be aggregated: the wrapped object has to support special infrastructure, and it should be an in-process server living in the same apartment as its wrapper.

CComModule – represents the COM server module itself – a DLL or an EXE.

All classes are registered in the object map; a COM class is added with the OBJECT_ENTRY macro, and such classes can then be created automatically. Coclasses that should not be created this way are listed with OBJECT_ENTRY_NON_CREATEABLE.

Creator class – auxiliary class that helps to construct an object.

GetObjectDescription() – coclass method that returns a description of the class

Category (identified by a CATID) – a set of interfaces grouped by logical meaning.

Saying that a class supports a category means that the class supports all interfaces from that category.

ObjectMain() – method of an ATL coclass that is called when the server is started or stopped. It is a good place to acquire and release resources.

IDispatch – for languages that do not support v-tables. All parameters passed through the interface must be VARIANTs.

IDispatch : IUnknown, methods:

1. GetTypeInfo

2. GetTypeInfoCount

3. GetIDsOfNames – translates a method or property name into a DISPID.

4. Invoke – calls the specified method or property.

Methods can be implemented manually or with the support of type libraries.

DISPID – identifier of an interface method or property; it is a long, not a GUID.

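Working with a raw VARIANT:

VARIANT myOtherVar;
VariantInit(&myOtherVar);   // must be initialized before use
myOtherVar.vt = VT_I4;      // type tag: 32-bit signed integer
myOtherVar.lVal = 5000;     // value goes into the matching union member
VariantClear(&myOtherVar);  // releases whatever the variant owns

Other functions: VariantCopy(), VariantChangeType().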

Variant object wrapper: _variant_t

DISPPARAMS – container structure that holds all the parameters passed to a dispatch method.

SAFEARRAY – arrays that are handled through a special API

Dual interface – supports both v-table and dispatch interface.

IDispatchImpl<> – template class that helps to implement IDispatch interface.

Multiple dual interfaces are not a good idea, since script clients can reach only the one default dispatch interface, while non-script clients can use the v-table interfaces without dispatch.

Enumerations

COM enumerators are read-only containers, like C arrays.

IEnumUnknown

IEnumVARIANT

IEnumString

IEnumConnectionPoints

IEnumConnections

Methods provided by each of these interfaces:

Next()

Skip()

Reset()

Clone()

An enumerator interface of this kind has to be declared for the particular data type and implemented by the class that exposes it. ATL provides the templates CComIEnum<>, CComEnum<> and CComEnumImpl<>.

Collections

Editable COM container.

Exposes its functionality through dispatch

Expected methods:

1. Item()

2. Add()

3. Remove()

4. Count()

A COM collection class just inherits from IDispatch and implements a container for business COM classes.

Callback interfaces and connectable objects

IConnectionPointContainer – implemented by a class that provides one or more event sources. Provides the methods EnumConnectionPoints(**p) and FindConnectionPoint(guid, **p)

IConnectionPoint – interface implemented by the event source object