Data Analytics with R, Excel & Tableau

Introduction to Data Structures in R

In any programming language, you use variables to store data. A variable is a name for a reserved memory location, so creating a variable reserves some space in memory. Data structures are ways of arranging data so that it can be used efficiently by a computer.
Unlike languages such as C and Java, R does not declare variables with a data type. Instead, a variable is assigned an R object, and the type of that object becomes the data type of the variable. There are many types of R objects. The most commonly used ones are:

  • Vector
  • Matrix
  • Array
  • Lists
  • DataFrames
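
As noted above, a variable in R takes its type from the object assigned to it. A minimal sketch of that behaviour follows; the name v is only illustrative and does not come from the original text.

```r
v <- 42L              # v holds an integer object
class(v)              # "integer"
v <- "forty-two"      # the same name now holds a character object
class(v)              # "character"
v <- c(TRUE, FALSE)   # and now a logical vector
class(v)              # "logical"
```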

The first type, Vector, is covered in detail in the Beginners Guide to R.
Here we cover the rest of the data types.

R Data Structures: Matrix

A matrix is a vector with two additional attributes:
  • Number of rows
  • Number of columns
Matrices can be defined as 2-dimensional arrays. All columns in a matrix must have the same mode (numeric, character, etc.) and same length. A matrix can be created by using the matrix() function:
Syntax:
mymatrix <- matrix(vector, nrow=number_of_rows, ncol=number_of_columns, byrow=logical_value, dimnames=list(char_vector_rownames, char_vector_colnames))
Example:
y <- matrix(c(1,2,3,4),nrow=2,ncol=2)
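
The byrow and dimnames arguments from the syntax line are not exercised in the example above. The following sketch shows what they do, assuming a small matrix of the values 1 to 6; the object names and the row and column labels are made up for illustration.

```r
# Fill by column (the default) and by row, using the same values
by_col <- matrix(1:6, nrow = 2, ncol = 3)
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

# dimnames takes a list: row names first, then column names
named <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE,
                dimnames = list(c("r1", "r2"), c("a", "b", "c")))

by_col[1, ]        # 1 3 5
by_row[1, ]        # 1 2 3
named["r2", "b"]   # 5
```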

We can show the class of a matrix with the class() function, and the attributes() function shows its dimensions.
Syntax:

class(matrix name)

attributes(matrix name)

Example:
class(y)
attributes(y)

Transposing a matrix converts its rows into columns and its columns into rows.
Syntax:

variable = t(matrix name)

print(variable)

Here the matrix y is transposed:
Example:
Trans = t(y)
print(Trans)
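
To make the effect of t() concrete, here is a small sketch on a non-square matrix, where the change of shape is easy to see. The names m and tm are illustrative only.

```r
m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns
tm <- t(m)                   # 3 rows, 2 columns
dim(m)                       # 2 3
dim(tm)                      # 3 2
m[1, 3] == tm[3, 1]          # TRUE: element (i, j) moves to (j, i)
identical(t(tm), m)          # TRUE: transposing twice restores the original
```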

Naming Rows and Columns of a Matrix
We use the function rownames() to change the names of the rows, and colnames() to change the names of the columns. For the columns, we supply a vector of names for the four subjects involved in the test and assign it to colnames(x):
Example:
# x is assumed to be an existing matrix with four columns
subs <- c("Maths", "English", "Science", "History")
colnames(x) <- subs
x
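
The matrix being named is not defined in the text, so the following self-contained sketch uses a hypothetical 3 x 4 matrix of scores (the values and the student labels are invented) to show rownames() and colnames() together.

```r
# Hypothetical scores: 3 students (rows) by 4 subjects (columns),
# filled column by column
scores <- matrix(c(78, 65, 90,
                   82, 70, 85,
                   69, 74, 88,
                   91, 60, 77), nrow = 3)
rownames(scores) <- c("Student1", "Student2", "Student3")
colnames(scores) <- c("Maths", "English", "Science", "History")

scores["Student2", "Science"]   # 74
dimnames(scores)                # both name vectors, as a list
```

Once names are set, elements can be selected by name as well as by position.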

 
Calculations on Rows Or Columns Of The Matrix
We can use subscripts to select parts of the matrix, with a blank meaning 'all of the rows' or 'all of the columns'. Here is the mean of the rightmost column (number 4):
Example:
mean(x[,4])

This is calculated over all the rows (a blank before the comma). Here is the standard deviation of the bottom row:
Example:
sd(x[3,])

There are some special functions for calculating summary statistics on matrices:
Example:
rowSums(x)
colSums(x)
rowMeans(x)
colMeans(x)
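
These summary functions give the same results as the corresponding apply() calls. A short sketch on a hypothetical 3 x 4 matrix (the name m and its values are assumptions for illustration):

```r
m <- matrix(1:12, nrow = 3)                  # hypothetical 3 x 4 matrix
rowSums(m)                                   # 22 26 30
colMeans(m)                                  # 2 5 8 11
all.equal(rowSums(m), apply(m, 1, sum))      # TRUE
all.equal(colMeans(m), apply(m, 2, mean))    # TRUE
```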

Adding Rows and Columns to The Matrix
Add a row at the bottom showing the column means, and a column at the right showing the row variances:
Example:
x<-rbind(x,apply(x,2,mean))
x<-cbind(x,apply(x,1,var))
x
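
Note that in the example above the second call also computes a variance for the newly added row of means. If that is not wanted, one possible variation is to compute the row variances first and pad them with NA; this is only a sketch on a hypothetical matrix m, not the original author's method.

```r
m <- matrix(c(2, 4, 6, 8, 10, 12), nrow = 3)   # hypothetical 3 x 2 matrix
row_var <- apply(m, 1, var)                    # variances of the original rows
m2 <- rbind(m, apply(m, 2, mean))              # append the column means as row 4
m2 <- cbind(m2, c(row_var, NA))                # append variances; NA for the means row
dim(m2)                                        # 4 3
m2[4, 1:2]                                     # 4 10
```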

Evaluating Functions with Apply
The apply function is used for applying functions to the rows or columns of matrices or data frames. For example:
Example:
y<-matrix(1:16,nrow=4)

Often, you want to apply a function across one of the margins of a matrix, with margin 1 being the rows and margin 2 being the columns. Here are the row totals (four of them):
Example:
apply(y,1,sum)

Here are the column totals (four of them):
Example:
apply(y,2,sum)

Example:
# apply min to rows
apply(y, 1, min)

Example:
# apply max to columns
apply(y, 2, max)

Example:
# apply sqrt to rows
apply(y,1,sqrt)

You can also supply your own function definition within apply like this:
Example:
apply(y,1,function(x) x^2+x)
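
One detail worth knowing when the applied function returns a vector rather than a single value: with margin 1, apply() places each row's result in a column of the output, so the result is the transpose of what you might expect. A small sketch using the same y as above (the names by_rows and by_cols are illustrative):

```r
y <- matrix(1:16, nrow = 4)
by_rows <- apply(y, 1, sqrt)        # each row's result becomes a COLUMN
all.equal(by_rows, t(sqrt(y)))      # TRUE: the result is the transpose of sqrt(y)
by_cols <- apply(y, 2, sqrt)        # with margin 2 the layout is preserved
all.equal(by_cols, sqrt(y))         # TRUE
```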

Sweep
The sweep function is used to ‘sweep out’ array summaries from vectors, matrices, arrays or data frames.
In this example, we want to express a matrix in terms of the departures of each value from its column mean.
Example:
matdata<- matrix(c(50,60,40,90,100, 80,50, 90,10, 80,30, 70),nrow=4)
Create a vector containing the parameters that you intend to sweep out of the matrix. Let's say we want to compute the three column means (the matrix has four rows and three columns):
Example:
cols<-apply(matdata,2,mean)
Now it is straightforward to express all the data in matdata as departures from the relevant column means:
Example:
sweep(matdata,2,cols)

Note: Using margin = 2 as the second argument indicates that we want the sweep to be carried out on the columns (rather than on the rows).
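
As a quick check of the sweep above, the column means of the swept matrix should all be zero. The sketch below also shows the FUN argument of sweep(), which defaults to subtraction; dividing each column by its maximum is an illustrative choice, not something taken from the original example.

```r
matdata <- matrix(c(50, 60, 40, 90, 100, 80, 50, 90, 10, 80, 30, 70), nrow = 4)
cols <- colMeans(matdata)                  # 60 80 47.5
centred <- sweep(matdata, 2, cols)         # FUN defaults to "-"
colMeans(centred)                          # 0 0 0

# The same idea with another operator: divide each column by its maximum
scaled <- sweep(matdata, 2, apply(matdata, 2, max), FUN = "/")
apply(scaled, 2, max)                      # 1 1 1
```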

R Data Structures: Arrays

Arrays are R data objects that can store data in more than two dimensions. For instance, if we create an array of dimension (2, 3, 4), it creates 4 rectangular matrices, each with 2 rows and 3 columns. Arrays can store only one data type.
An array is created using the array() function. It takes vectors as input and uses the values in the dim parameter to shape the array.
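
A minimal sketch of the array() call described above, using the (2, 3, 4) dimensions from the text; the name arr and the values 1 to 24 are assumptions for illustration.

```r
arr <- array(1:24, dim = c(2, 3, 4))   # 4 matrices, each 2 rows by 3 columns
dim(arr)                               # 2 3 4
arr[, , 1]                             # the first 2 x 3 matrix (values 1 to 6)
arr[2, 3, 4]                           # 24, the last element
```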
Example:
array<-1:20
is.matrix(array)
dim(array) #dim is for dimensions

The vector is not a matrix and it has no (NULL) dimensional attributes. We give the object dimensions like this (say, with five rows and four columns):
Example:
dim(array)<-c(5,4)
Now it does have dimensions and it is a matrix:
Example:
dim(array)
Example:
is.matrix(array)

When we look at array it is presented as a two-dimensional table (but note that it is not a table object):
Example:
array
is.table(array)

Note: The values have been entered into the array in column-wise sequence: this is the default in R.
Thus a vector is a one-dimensional array that lacks any dim attributes.
A matrix is a two-dimensional array.
Arrays of three or more dimensions do not have any special names in R. They are simply referred to as three-dimensional or five-dimensional arrays.
Here is a three-dimensional array of the first 16 lower-case letters with four matrices, each of two rows and two columns:
Example:
A<-letters[1:16]
dim(A)<-c(2,2,4)

We want to select all the letters a to h. These are all the rows and all the columns of tables 1 and 2, so the appropriate subscripts are [,,1:2]:
Example:
A[,,1:2]

Next, we want only the letters i to l. These are all the rows and all the columns from the third table, so the appropriate subscripts are [,,3]:
Example:
A[,,3]

Here, we want only b,d,f,h,j,l,n and p. These are the second rows of all four tables, so the appropriate subscripts are [2,,]:
Example:
A[2,,]

Note: When we select a single index in the first dimension (there is just one row in A[2,,]), that dimension is dropped and the shape of the result changes (a matrix of two rows and four columns in this example). This is a feature of R, but you can override it by specifying drop = FALSE to retain all three dimensions.
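
The difference is easiest to see by comparing the dimensions of the two results. This sketch rebuilds the array A from the example above:

```r
A <- letters[1:16]
dim(A) <- c(2, 2, 4)
dim(A[2, , ])                 # 2 4: the first dimension has been dropped
dim(A[2, , , drop = FALSE])   # 1 2 4: all three dimensions are kept
```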

R Data Structures: List

Unlike an R vector, an R list can contain items of different data types. List elements can be accessed using two-part names, written with the dollar sign $ in R.
A list allows you to gather a variety of (possibly unrelated) objects under one name. For example, a list may contain a combination of vectors, matrices, data frames, and even other lists.
You create a list using the list() function.
Syntax:
mylist <- list(object1, object2, …)
Consider a list:
Example:
j <- list(name="hue", sal=6000, union=FALSE)
The list components can be accessed in several ways: by name with $ (j$sal), by name with double brackets (j[["sal"]]), or by position (j[[2]]).
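
A short sketch of the common ways to reach a list component, using the list j from above. The last line shows a frequent source of confusion: single brackets return a sub-list rather than the element itself.

```r
j <- list(name = "hue", sal = 6000, union = FALSE)
j$sal             # 6000, by name with $
j[["sal"]]        # 6000, by name with double brackets
j[[2]]            # 6000, by position
class(j["sal"])   # "list": single brackets return a sub-list
```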

Adding and Deleting List Elements
Example:
j$gender <- "Male"

Example:
j$union<-NULL

The union component is no longer in the list j.
Applying Functions To Lists: lapply() and sapply() Functions
These two functions work in a similar fashion, traversing over a set of data like a list or vector, and calling the specified function for each item.
Using the lapply() and sapply() Functions
Example:
lapply(list(3:9,22:24),median)
sapply(list(2:4,45:49),median)
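
The main practical difference between the two is the shape of the result: lapply() always returns a list, while sapply() tries to simplify the result to a vector or matrix. A small sketch (the names res_l and res_s are illustrative):

```r
res_l <- lapply(list(3:9, 22:24), median)
res_s <- sapply(list(3:9, 22:24), median)
class(res_l)    # "list"
is.list(res_s)  # FALSE: sapply simplified the result
res_s           # an atomic vector: 6 23
```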

Recursive List
Lists can be recursive, meaning that you can have lists within lists.
Example:
c(list(a=3,b=4,c=list(d=6,e=8)))
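
Components of a nested list are reached by chaining the access operators. The sketch below also uses the recursive argument of c(), which flattens a nested list into a single named vector; the names nested and flat are illustrative.

```r
nested <- list(a = 3, b = 4, c = list(d = 6, e = 8))
nested$c$e                             # 8
length(nested)                         # 3: the inner list counts as one component
flat <- c(nested, recursive = TRUE)    # flatten into a named vector
names(flat)                            # "a" "b" "c.d" "c.e"
```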

R Data Structures: Data Frames

A data frame is a list whose components are equal-length vectors.
Example:
kids <- c("Sam", "Nick", "Rits")
ages <- c(11,10,8)
d <- data.frame(kids,ages,stringsAsFactors=FALSE)
d
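
The statement that a data frame is a list of equal-length vectors can be checked directly. This sketch rebuilds d from the example above and inspects it as a list:

```r
kids <- c("Sam", "Nick", "Rits")
ages <- c(11, 10, 8)
d <- data.frame(kids, ages, stringsAsFactors = FALSE)
is.list(d)         # TRUE
length(d)          # 2: one component per column
sapply(d, class)   # kids is "character", ages is "numeric"
nrow(d)            # 3: the common length of the vectors
```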

The data above is displayed in a matrix-like view.
Accessing Data Frames
A data frame can be accessed in multiple ways:
Example:
d[[1]]
d$kids
d[,1]

Other Functions
Structure of the data frame:
Example:
str(d)

Names of variables:
Example:
names(d)
Identifying the class of the data:
Example:
class(d)

head() and tail()
First few (2) observations:
Example:
head(d,2)
Last few (2) observations:
Example:
tail(d,2)

rbind() and cbind() Functions
When using rbind() to add a row, the added row is typically in the form of another data frame or list. The result is assigned back to d so that the new row is kept.
Example:
d <- rbind(d,list("Lara",15))

When using cbind() to add a column, the added column is typically a vector or another data frame with one value per row.
Example:
gender <- c("M","M","M","F")
d <- cbind(d,gender)
d

R Data Structures: Factors

Categorical (nominal) and ordered categorical (ordinal) variables in R are called factors.
Factors are important in R as they determine how to analyze the data and present them visually.
An R factor might be viewed simply as a vector with extra information i.e. a record of the distinct values in that vector, called levels.
Example:
x <- c(1:3)
xfac <- factor(x)
xfac
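
To illustrate levels, and the ordered (ordinal) case mentioned above, here is a sketch with a hypothetical sizes factor. The level names and their order are assumptions made for the example.

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"), ordered = TRUE)
levels(sizes)          # "small" "medium" "large"
as.integer(sizes)      # 1 3 2 1: the underlying integer codes
table(sizes)           # counts per level
sizes[2] > sizes[3]    # TRUE: "large" ranks above "medium"
```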

 
Using tapply() Function
The tapply function is useful when we need to break up a vector into groups defined by some classifying factor, compute a function on the subsets, and return the results in a convenient form. You can even specify multiple factors as the grouping variable, for example treatment and sex, or team and handedness.
Example:
numbers <- c(24,28,45,36,34,26)
subject <- c("maths","english","english","maths","english","maths")
tapply(numbers,subject,mean)
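
To group by more than one factor, as mentioned above, pass the factors as a list. The sketch below adds a hypothetical second grouping vector, term, which is not part of the original example.

```r
numbers <- c(24, 28, 45, 36, 34, 26)
subject <- c("maths", "english", "english", "maths", "english", "maths")
term <- c("T1", "T1", "T2", "T2", "T1", "T2")   # hypothetical second factor
res <- tapply(numbers, list(subject, term), mean)
res                    # a 2 x 2 table of means: subjects by terms
res["english", "T1"]   # 31
res["maths", "T2"]     # 31
```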

We hope you will use these data structures in your daily analysis.
For more interesting blogs on BigData and Analytics, please visit our website ACADGILD.
