Data Analytics with R, Excel & Tableau

Introduction to R: A Beginner’s Guide

What Does R Do?

R is an open source programming language and software environment for statistical computing and graphics that is supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and for data analysis.

In this blog, we will go through getting started with R and the basic commands of the R language.

We also discuss the vector data type in detail. For each command, we present a screenshot of the command and its output as proof of execution.

Relatively high-profile users of R include:

Facebook: Used by some within the company for tasks such as analyzing user behavior.
Google: There are more than 500 R users at Google, according to David Smith at Revolution Analytics, doing tasks such as making online advertising more effective.
National Weather Service: Flood forecasts.
Orbitz: Statistical analysis to suggest best hotels for promotion to its users.
Trulia: Statistical modeling.

Installing R:

Beginners who do not have R installed can download it from the link below:
https://cran.r-project.org/bin/windows/base/

This will download an .exe file that installs the R language on your Windows system. Also, download and install RStudio on your desktop. This gives you an environment for working with the R language.

Download RStudio


After installing RStudio, you will get a welcome page, as shown in the screenshot below. In this post, we will run all the basic R commands in RStudio.

Create a new R script to open the script editor in RStudio. Follow the screenshot below:

This will give you the screen below:

You can now start working in RStudio. We have attached screenshots with every command for you to practice and refer to for better understanding.
Note: Please check the commands after copying them into RStudio. Copied text may contain curly (typographic) quotation marks instead of the straight quotes that R expects, which causes errors.

1: setwd

Syntax: setwd("~/mydirectory")
Change your working directory with the setwd() function. Note that the slashes have to be either forward slashes or double backslashes. For Windows, the command might look something like:
setwd("C:/ACD/Documents/RProjects")
setwd("C:\\ACD\\Documents\\RProjects")

2: Install Packages

Syntax: install.packages("package name")
You can do pretty much anything in R using the 10000+ packages available on CRAN, the Comprehensive R Archive Network.
The command for installing a package is: install.packages("thepackagename"); e.g., install.packages("sqldf")

If you don’t want to type the command yourself, RStudio has a “Packages” tab on the lower right of the window. Click it and you’ll see a button to “Install Packages.”

3: Update Packages

Syntax: update.packages()
To get the latest versions of all your installed packages, run the following:
E.g. update.packages()

To do it all at once:

Another dialogue box will appear. Press “Update.”

4: Remove Packages

Syntax: remove.packages("thepackagename")
To remove a package from your system, type in the following:
remove.packages("sqldf")

5: Function Help

Syntax:

help(functionName)

?functionName

The ? form is a shortcut to the help function, which uses parentheses. E.g.: help(median)

If you want to find out more about a function, type: ?functionName
E.g. ?median

As you see, both give the exact same details.

6: Assignment

Objects obtain values in R by assignment (‘x gets a value’).
This is done with either "<-" or "=".
Thus, to create a scalar constant x with value 6, we type:
x <- 6 or x = 6
Likewise, to assign the character value "a" to y:
y <- "a" or y = "a"

7: Operators

The operators available in R are listed below. Most of them work the way you already know from arithmetic.

8: Modulo and Integer Quotients

Integer Quotients
Syntax: x %/% y
To find the integer part of a division, say, how many times 2 goes into 50, type in the following:
50 %/% 2

Modulo
Syntax: x %% y
To find the remainder (what is left over when 50 is divided by 2), use the modulo operator:
E.g. 50 %% 2
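
As a small sketch of how the two operators fit together (the numbers 17 and 5 are only an illustrative choice, not from the original post):

```r
# Integer quotient and remainder for the example in the text
q <- 50 %/% 2    # 25: 2 goes into 50 twenty-five times
r <- 50 %% 2     # 0: nothing is left over

# A division that does not come out evenly
q2 <- 17 %/% 5   # 3
r2 <- 17 %% 5    # 2

# Quotient and remainder recombine into the original number
5 * q2 + r2      # 17
```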

9: Screen Prompt

The screen prompt > is an invitation to put R to work.

Command lines entered at the console are limited to about 4095 bytes.
Two or more expressions can be on a single line, as long as they are separated by semicolons:
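
A minimal illustration of semicolon-separated expressions (the variable names are arbitrary):

```r
# Three assignments on one line, separated by semicolons
x <- 2 + 3; y <- x * 7; z <- y - 10

# Three expressions on one line; each result is printed on its own line
x; y; z   # 5, 35 and 25
```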

10: In-Built Functions

The log function gives logs to the base e (e ≈ 2.718), for which the antilog function is exp.

log(10)
[1] 2.302585
exp(1)
[1] 2.718282

11: Exponents

Syntax: e
For very big or small numbers, R uses the following scheme:
Scheme     Meaning
1.9e3      1900, i.e. 1.9 multiplied by 1000
1.9e-2     0.019, i.e. 1.9 multiplied by 1/100
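
The two rows of the table can be checked directly. This is only a sketch; all.equal() is used here because comparing decimals with a small tolerance is generally safer than exact equality.

```r
big   <- 1.9e3    # 1.9 multiplied by 10^3
small <- 1.9e-2   # 1.9 multiplied by 10^-2

# Compare against the plain decimal forms, allowing for rounding error
isTRUE(all.equal(big, 1900))     # TRUE
isTRUE(all.equal(small, 0.019))  # TRUE
```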

12: Rounding

These functions convert decimals into integers.
There are multiple types of rounding:

rounding up

rounding down
Rounding down (the ‘greatest integer less than’ function) is floor:
Syntax: floor(Decimal Number)
Rounding up (the ‘next integer’ function) is ceiling:
Syntax: ceiling(Decimal Number)
E.g.:
floor(5.9)
ceiling(2.9)
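
A short sketch of the two functions. The negative inputs and the call to round() are additions for illustration; they show that floor always moves toward minus infinity and ceiling toward plus infinity.

```r
down <- floor(5.9)     # 5
up   <- ceiling(2.9)   # 3

# With negative numbers the direction is easier to see
neg_down <- floor(-5.9)     # -6
neg_up   <- ceiling(-2.9)   # -2

# round() goes to the nearest integer instead
nearest <- round(5.9)       # 6
```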

These are the very basics of R. Any beginner can start with these commands to get familiar with RStudio.
To learn more about R, we now move on to data structures in R.
We have the following types of Data Structures:

Vector

Matrix

Array

Lists

DataFrames

Classified under 2 branches:

Dimensions    Homogeneous    Heterogeneous
1d            Vector         List
2d            Matrix         Data frame
Nd            Array

Of these, we will cover only vectors in detail. The other data structures are equally interesting, but as beginners, let us keep it short and simple.
Vectors
A vector is a collection of values that all have the same data type. The elements of a vector are all numbers, giving a numeric vector, or all character values, giving a character vector. There is also a third type, the logical vector, whose elements are all TRUE or FALSE.

13: Creating a Vector

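Vectors are variables with one or more values of the same type: logical, integer, real (double), complex, or character (string). A vector can also have a length of 0. Vectors are usually created with the c() function, which combines its arguments into a single vector.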
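
The following sketch creates one vector of each common type, plus an empty one. The variable names are illustrative only. The last line shows what happens when types are mixed: R coerces everything to a single type.

```r
num   <- c(4, 3, 6.3)           # numeric vector
chr   <- c("nine", "two")       # character vector
lgl   <- c(TRUE, FALSE)         # logical vector
empty <- numeric(0)             # numeric vector of length 0

class(num)      # "numeric"
length(empty)   # 0

# Mixed inputs are coerced to one common type, here character
mixed <- c(1, "two", TRUE)
mixed           # "1" "two" "TRUE"
```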

14: Length of a Vector

Syntax: length(vector name)
This shows the length of the specified vector.
E.g. length(a)

When vectors of different lengths are combined in a calculation, the derived vector takes the length of the longest one, and the shorter vector is recycled. For example, if A is of length 5 and B is of length 2, the result of A + B is of length 5.
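
A sketch of this recycling behaviour, with made-up values for A and B:

```r
A <- c(1, 2, 3, 4, 5)   # length 5
B <- c(10, 20)          # length 2

# B is recycled as 10 20 10 20 10. R issues a warning here because
# 5 is not a multiple of 2; suppressWarnings() hides it for this demo.
C <- suppressWarnings(A + B)

C           # 11 22 13 24 15
length(C)   # 5, the length of the longer vector
```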

15: Types of Vectors

Numeric vector

Character vector

Logical vector

Numeric vector
E.g.
a <- c(4,3,6.3,6,-8,9)
Character vector
E.g.
b <- c("nine","two","eight")
Logical vector
E.g.
c <- c(TRUE,TRUE,TRUE,FALSE,TRUE,FALSE)

Next, we learn how to refer to the elements of a vector by their position.
E.g. a[c(1,3)] # refers to the 1st and 3rd elements of the vector a.
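
Beyond the example above, a few other common ways of subscripting can be sketched as follows, using the same numeric vector a:

```r
a <- c(4, 3, 6.3, 6, -8, 9)

first_third <- a[c(1, 3)]   # 4.0 6.3: elements 1 and 3
second      <- a[2]         # 3: a single element
no_first    <- a[-1]        # 3 6.3 6 -8 9: everything except element 1
above_five  <- a[a > 5]     # 6.3 6 9: elements that satisfy a condition
```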

16: How to Work with Vectors & Logical Subscripts?

Take the example of a vector containing the 8 numbers from 0 to 7:
E.g.
y<-0:7

To add up all the values:
E.g. sum(y)

To count how many of the values are less than 3:
E.g.
sum(y<3)

To find the sum of the values of y that are less than 3, we write:
E.g. sum(y[y<3])

To see whether the logical condition y<3 is true or false for each element:
E.g. y<3
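
Put together, the steps of this section look like the sketch below. Note that the count and the sum both happen to be 3 for this particular vector; they are different calculations.

```r
y <- 0:7

total      <- sum(y)          # 28: all values added up
is_small   <- y < 3           # TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
n_small    <- sum(y < 3)      # 3: counts the TRUEs (values 0, 1 and 2)
sum_small  <- sum(y[y < 3])   # 3: adds the selected values 0 + 1 + 2
```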

17: Vector Functions in R

Important vector functions are listed in the table “Vector functions used in R”.

All of these functions work the same way as their counterparts in mathematics.

18: How to Work with Vectors and Logical Subscripts

To find the sum of the two largest values in a vector:
First, sort the vector in ascending order, then add up the last two elements of the sorted vector.
E.g.
Let’s do this in stages. First, the values of a:
a<-c(2,9,9,9,2)
Now if you apply "sort" to this, the numbers will be sorted in ascending order:
sort(a)
The two largest values are now in positions 4 and 5, so:
sum(sort(a)[4:5])
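
Hard-coding positions only works when the length of the vector is known. As a sketch of two alternatives that work for any length, tail() takes the last elements of the ascending sort, and sort(..., decreasing = TRUE) puts the largest values first:

```r
a <- c(2, 9, 9, 9, 2)

# Last two elements of the ascending sort
top_two <- tail(sort(a), 2)   # 9 9
sum_tail <- sum(top_two)      # 18

# Same result using a descending sort and the first two elements
sum_desc <- sum(sort(a, decreasing = TRUE)[1:2])   # 18
```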

19: Logical Arithmetic

Arithmetic involving TRUE or FALSE can be done in R.
R can coerce TRUE or FALSE into numerical values: 1 for TRUE and 0 for FALSE.
E.g.
a<-0:6
Is each value of a less than 4?
a<4
Are all values greater than 0?
all(a>0)
Are all values greater than -5?
all(a>-5)
Is any value less than 2?
any(a<2)
Number of values less than 2 in vector a:
sum(a<2)
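
The results of these calls can be sketched as follows (the result variable names are only for illustration):

```r
a <- 0:6

below_four <- a < 4        # TRUE TRUE TRUE TRUE FALSE FALSE FALSE
all_pos    <- all(a > 0)   # FALSE, because 0 is not greater than 0
all_gt_m5  <- all(a > -5)  # TRUE
any_small  <- any(a < 2)   # TRUE, because 0 and 1 qualify
n_small    <- sum(a < 2)   # 2: each TRUE counts as 1

TRUE + TRUE                # 2: logical values coerced to numbers
```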

There are many other operators, listed below, which work much like their mathematical counterparts.

20: Logical Operations

List of operators:

21: Generating Regular Sequences of Numbers

For regularly spaced sequences of integers, it is simplest to use the colon operator. This can produce ascending or descending sequences:
5:15
[1]  5  6  7  8  9 10 11 12 13 14 15
The call seq(5,15) produces the same sequence. For steps other than 1, use the seq function.
E.g.
Use the seq function to go from 0 up to 5 in steps of 0.5:
seq(0,5,0.5)
E.g.
Sequencing downwards from 5 down to 0 in steps of 0.5:
seq(5,0,-0.5)
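
A brief sketch of both approaches, including a descending sequence with the colon operator:

```r
up   <- 5:15    # 5 6 7 ... 15
down <- 15:5    # 15 14 13 ... 5

half_up   <- seq(0, 5, 0.5)    # 0.0 0.5 1.0 ... 5.0
half_down <- seq(5, 0, -0.5)   # 5.0 4.5 4.0 ... 0.0

length(half_up)   # 11 values, including both end points
```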

22: Generating Repeated Sequence

The rep() (replicate) function repeats a value or a whole vector to build longer vectors. The basic call form is rep(x,times); the each argument repeats every element in turn instead.
x <- rep(6,4)
rep(c(11,12,13),3)
rep(1:2,3)
rep(c(11,12,13),each=2)
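
The output of these four calls, written out as a sketch, shows the difference between repeating the whole vector and repeating each element:

```r
x  <- rep(6, 4)                     # 6 6 6 6
r1 <- rep(c(11, 12, 13), 3)         # 11 12 13 11 12 13 11 12 13
r2 <- rep(1:2, 3)                   # 1 2 1 2 1 2
r3 <- rep(c(11, 12, 13), each = 2)  # 11 11 12 12 13 13
```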

23: Identifying Missing Values

Syntax:

mean(vector name, na.rm=TRUE)

is.na(vector name)

Missing values are a cause for concern and must be dealt with accordingly.
Suppose we have a vector:
a<-c(NA,11,12,NA,NA)
To ignore the missing values in a calculation, use the na.rm=TRUE argument:
E.g.
mean(a,na.rm=TRUE)

To find the locations of missing values within a vector, wrap the function is.na(x) in which():
E.g.
which(is.na(a))

To convert the NAs to 0, use the ifelse function:
E.g.
ifelse(is.na(a),0,a)
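
The following sketch walks through the same vector and shows what each call returns. The first line is added to show why na.rm is needed at all.

```r
a <- c(NA, 11, 12, NA, NA)

mean(a)                         # NA: a single missing value spoils the result
avg <- mean(a, na.rm = TRUE)    # 11.5: mean of 11 and 12 only

missing_flags <- is.na(a)           # TRUE FALSE FALSE TRUE TRUE
missing_at    <- which(is.na(a))    # 1 4 5: positions of the NAs

filled <- ifelse(is.na(a), 0, a)    # 0 11 12 0 0
```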

24: Sorting, Ranking & Ordering

For example:
sales<-c(100,50,75,150,200,25)
Now apply the three different functions to the vector called sales
rank<-rank(sales)
sorted<-sort(sales)
ordered<-order(sales)
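
The three functions are easy to confuse, so here is a sketch of what each returns for the sales vector. The result names are chosen for illustration and differ slightly from those used above.

```r
sales <- c(100, 50, 75, 150, 200, 25)

ranks   <- rank(sales)    # 4 2 3 5 6 1: position each value would take when sorted
sorted  <- sort(sales)    # 25 50 75 100 150 200: the values themselves, ascending
ordered <- order(sales)   # 6 2 3 1 4 5: indices that would sort the vector

# order() gives the subscripts that sort() effectively uses
sales[ordered]            # 25 50 75 100 150 200
```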

25: Dataframe

Syntax: dataframe name <- data.frame(vector1,vector2,vector3,..)
Make a dataframe using the four vectors from the previous section:
E.g.
view <- data.frame(sales,rank,sorted,ordered)
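
A self-contained sketch of the same idea, with the columns named explicitly in the call (the explicit names are a choice made here for readability):

```r
sales <- c(100, 50, 75, 150, 200, 25)

# Each vector becomes one column; all columns must have the same length
view <- data.frame(
  sales   = sales,
  rank    = rank(sales),
  sorted  = sort(sales),
  ordered = order(sales)
)

dim(view)      # 6 4: six rows, four columns
names(view)    # "sales" "rank" "sorted" "ordered"
view$sorted    # one column, extracted as a vector
```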


26: Using sprintf

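Syntax: sprintf("format string with %d", value1, value2, ...)
This function assembles a string from its parts in a formatted manner. Each %d placeholder is replaced, in order, by the corresponding integer-valued argument.
E.g.
i <- 2
s <- sprintf("the cube of %d is %d",i,i^3)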
s

27: Character Strings

In R, character strings are defined in double quotation marks:
a<-"hello"
b<-"55"
Numbers can be stored as characters (as in b, above), but arbitrary characters cannot be converted to numbers.
To combine strings into a vector of character information:
c(a,b)
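
The point about numbers and characters can be sketched with as.numeric(), which converts a string to a number where that is possible:

```r
a <- "hello"
b <- "55"

both <- c(a, b)   # "hello" "55": a character vector of length 2

num_b <- as.numeric(b)                     # 55: the string holds a valid number
num_a <- suppressWarnings(as.numeric(a))   # NA: "hello" cannot become a number
                                           # (R would normally warn about this)
```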

28: length vs nchar

Syntax:

length(vector name)

nchar(vector name)

One of the confusing things about character strings is the distinction between the length of a character object (a vector) and the number of characters in the strings comprising that object.
Let’s see an example to make the distinction clear:
sports <- c("badminton","tabletennis","cricket","basketball")
Here, sports is a vector comprising 4 character strings:
length(sports)
The individual character strings have 9, 11, 7 and 10 characters, respectively:
nchar(sports)
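
The same distinction on a second, made-up vector, as a quick sketch:

```r
words <- c("R", "vector", "data frame")

n_elements <- length(words)   # 3: number of strings in the vector
n_chars    <- nchar(words)    # 1 6 10: characters in each string (spaces count)
```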

29: regexpr

Syntax: regexpr("pattern",vector name)
The regexpr() function reports the character position in each provided string where the first match with the pattern starts, or -1 if there is no match. The function also returns the length of the match.
E.g.
x=c("apple","grape","banana")
r=regexpr("p",x)
r
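
A sketch of how to read the result. The positions are the values of r itself, and the match lengths are stored in its "match.length" attribute.

```r
x <- c("apple", "grape", "banana")
r <- regexpr("p", x)

positions <- as.integer(r)            # 2 4 -1: first "p" in each string, -1 if none
lengths   <- attr(r, "match.length")  # 1 1 -1: length of each match
```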

30: gregexpr

Syntax: gregexpr("pattern",vector name)
gregexpr(pattern,text) works like regexpr(), but it finds all the instances of pattern. Here’s an example:
E.g.
gregexpr("ss","Assessed")
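
Because there can be several matches per string, gregexpr() returns a list with one element per input string. A sketch of how to pull the positions out (the "banana" line is an added illustration):

```r
g <- gregexpr("ss", "Assessed")

# One list element per input string; here there is only one
ss_at <- as.integer(g[[1]])   # 2 5: "ss" starts at characters 2 and 5

# Another example: every "a" in "banana"
a_at <- as.integer(gregexpr("a", "banana")[[1]])   # 2 4 6
```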

31: regmatches

Syntax: regmatches(vector name, match object from regexpr)
The regmatches() function retrieves the matching components of a string vector for a provided match object produced by regexpr().
E.g.
x=c("apple","grape","banana")
r=regexpr("a",x)
regmatches(x,r)
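
As a sketch with a pattern that does not occur in every string (the pattern "an" is chosen here for illustration): with a regexpr() match object, strings without a match are simply dropped from the result, while with gregexpr() the result is a list that keeps one element per string.

```r
x <- c("apple", "grape", "banana")

# First match only; non-matching strings are dropped
first_an <- regmatches(x, regexpr("an", x))    # "an" (from "banana")

# All matches; one list element per input string
all_an <- regmatches(x, gregexpr("an", x))
all_an[[3]]   # "an" "an": both matches in "banana"
```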

32: Using sub and gsub

Syntax:
sub("pattern to find", "replacement for the first match only", vector name)
gsub("pattern to find", "replacement for every match", vector name)
The sub() function finds patterns within strings in a manner similar to that of grep(), but then substitutes the first instance of a match with a specified string.
E.g.
x=c("apple","banana","grape")
sub("a","$",x)

The gsub() function works in exactly the same manner as sub(), but replaces all matches to the pattern rather than only the first match.
E.g.
x=c("apple","banana","grape")
gsub("a","$",x)
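
The difference is easiest to see on "banana", which contains three matches. A sketch of both calls on the same vector:

```r
x <- c("apple", "banana", "grape")

first_only <- sub("a", "$", x)    # "$pple" "b$nana" "gr$pe"
every_one  <- gsub("a", "$", x)   # "$pple" "b$n$n$" "gr$pe"

# fixed = TRUE treats the pattern as plain text rather than a regular
# expression, which matters for characters such as "."
dashed <- gsub(".", "-", "blue.flower", fixed = TRUE)   # "blue-flower"
```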

33: strsplit

Syntax: vector name=strsplit(vector1,split="split by character",fixed=TRUE)
This function splits a string into a list containing multiple strings, based on a defined delimiter.
The delimiter can be defined as a fixed character string (fixed=TRUE) or as a regular expression.
E.g.
word="apple|lime|orange"
v=strsplit(word,split="|",fixed=TRUE)
v
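
A sketch of the result and of why fixed=TRUE matters here: in a regular expression "|" means "or", so it has to be either treated as fixed text or escaped.

```r
word <- "apple|lime|orange"

# Treat "|" as a literal character
v <- strsplit(word, split = "|", fixed = TRUE)
parts <- v[[1]]    # "apple" "lime" "orange": the result is a list, so take element 1

# Equivalent call using a regular expression with the "|" escaped
parts_regex <- strsplit(word, split = "\\|")[[1]]
```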

34: grep Function (Pattern matching)

Syntax: grep("pattern",vector name)
This function searches for matches of the pattern within each element of a character vector and returns an integer vector with the indices of the elements that match (if value is set to FALSE, which is the default).
If value is set to TRUE, the contents of the matching elements of the character vector are returned.
x = c("apple","potato","grape","10","blue.flower")
grep("a",x)
grep("a",x,value=TRUE)
grep("[[:digit:]]",x,value=TRUE)
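
The results of these calls, as a sketch. The last line uses grepl(), a related base R function not covered above, which returns TRUE or FALSE for every element instead of indices.

```r
x <- c("apple", "potato", "grape", "10", "blue.flower")

idx    <- grep("a", x)                          # 1 2 3: positions of matching elements
vals   <- grep("a", x, value = TRUE)            # "apple" "potato" "grape"
digits <- grep("[[:digit:]]", x, value = TRUE)  # "10": elements containing a digit

flags  <- grepl("a", x)                         # TRUE TRUE TRUE FALSE FALSE
```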

We hope you practice these commands and use them in your everyday data analysis.
You will get to see the other data types present in R in our next blog.
For more BigData trending blogs subscribe to us at ACADGILD.

prateek

An alumnus of the NIE-Institute Of Technology, Mysore, Prateek is an ardent Data Science enthusiast. He has been working at Acadgild as a Data Engineer for the past 3 years. He is a Subject-matter expert in the field of Big Data, Hadoop ecosystem, and Spark.
