R is an open-source programming language and software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and for data analysis.
In this blog, we will go through getting started with R and the basic commands in R language.
Here, we also discuss the Vector data type in detail. Each command comes with a command-and-output screenshot as proof of execution.
Beginners who do not have R installed, please go to the link below: This will download an .exe file that installs the R language on your Windows system. Also download and install RStudio on your desktop; it gives you an environment for working with the R language.
After installing RStudio, you will get a welcome page, as shown in the screenshot below. In this post, we use RStudio to execute all the basic R commands.
Create a new R script to open the script editor in RStudio. Follow the screenshot below:
This will give you the screen below:
You can now start working in RStudio. We have attached screenshots with every command for you to practice and refer to for better understanding.
Note: Please check the commands after copying them into RStudio, as they may give errors if characters were changed in the copy. Quotation marks are the usual culprit: R accepts only straight quotes around strings, not typographic (curly) ones.
1: setwd
Syntax: setwd("~/mydirectory")
Change your working directory with the setwd() function. Note that the slashes have to be either forward slashes or double backslashes. For Windows, the command might look something like:
setwd("C:/ACD/Documents/RProjects")
setwd("C:\\ACD\\Documents\\RProjects")
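As a rough sketch of how this fits together, the snippet below checks the working directory before and after a change. It uses tempdir() as a stand-in for a real project folder (an assumption made here so the code runs on any machine) and then restores the original directory.

```r
old_dir <- getwd()     # remember the current working directory
setwd(tempdir())       # tempdir() stands in for your own project path
new_dir <- getwd()     # confirm where R now reads and writes files
setwd(old_dir)         # go back to where we started
```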
2: Install Packages
Syntax: install.packages("package name")
You can do pretty much anything in R using the 10,000+ packages available on CRAN, the Comprehensive R Archive Network.
The command for installing a package is install.packages("thepackagename"); e.g., install.packages("sqldf")
If you don’t want to type the command yourself, RStudio has a “Packages” tab on the lower right of the window. Click it and you’ll see a button to “Install Packages.”
3: Update Packages
Syntax: update.packages()
To update packages, run the following to get the latest versions of all your installed packages:
E.g. update.packages()
To do it all at once:
Another dialogue box will appear. Press “Update.”
4: Remove Packages
Syntax: remove.packages("thepackagename")
To remove a package from your system, type in the following:
remove.packages("sqldf")
5: Function Help
Syntax:
help(functionName)
?functionName
The ?functionName form is a shortcut to the help function, which uses parentheses. E.g.: help(median)
If you want to find out more about a function, you can equally type: ?functionName
E.g. ?median
As you see, both give the exact same details.
6: Assignment
Objects obtain values in R by assignment (‘x gets a value’).
This is done with either <- or =
Thus, to create a scalar constant x with value 6, we type:
x <- 6 or x = 6
y <- "a" or y = "a"
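A small sketch of both assignment forms in action. The class() calls are an addition here, included only to show what kind of value each object holds.

```r
x <- 6      # assignment with the arrow operator
y = "a"     # assignment with the equals sign
x           # typing an object's name prints its value
class(x)    # "numeric"
class(y)    # "character"
```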
7: Operators
The operators available in R are listed below. Hope you already understand the use of each of these operators.
8: Modulo and Integer Quotients
Integer Quotients
Syntax: x %/% y
To know the integer part of a division, say, how many 2s there are in 50, type in the following:
50%/%2
Modulo
Syntax: x %% y
To know the remainder (what is left over when 50 is divided by 2), which in math is known as modulo:
E.g. 50%%2
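Since 50 divides evenly by 2, the remainder above is 0. The sketch below uses 17 and 5 (values chosen here purely for illustration) so that both results are non-trivial, and checks that quotient and remainder rebuild the original number.

```r
n <- 17
d <- 5
q <- n %/% d        # integer quotient: 3
r <- n %% d         # remainder (modulo): 2
q * d + r == n      # TRUE: quotient and remainder rebuild n

50 %/% 2            # 25
50 %% 2             # 0
(-7) %% 3           # 2: in R the result takes the sign of the divisor
```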
9: Screen Prompt
The screen prompt > is an invitation to put R to work.
Command lines entered at the console are limited to about 4095 bytes.
Two or more expressions can be on a single line, as long as they are separated by semi-colons:
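For instance, a minimal sketch with three expressions on one line (the variable names are illustrative):

```r
# Three expressions on a single line, separated by semicolons
a <- 2; b <- 3; total <- a + b
total   # 5
```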
10: In-Built Functions
The log function gives logs to the base e (e = 2.718282), for which the antilog function is exp.
log(10)
[1] 2.302585
exp(1)
[1] 2.718282
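A short sketch of how log and exp undo each other. The other-base helpers (log10, log2, and the base argument) are not covered in the text above but are part of base R.

```r
nat <- log(10)            # natural log, about 2.302585
back <- exp(nat)          # exp undoes log, giving back 10
log10(1000)               # 3: log to base 10
log2(8)                   # 3: log to base 2
log(8, base = 2)          # 3: any base via the 'base' argument
```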
11: Exponents
Syntax: xey (x multiplied by 10 to the power y)
For very big or small numbers, R uses the following scheme:
Scheme | Meaning
1.9e3 | 1900, i.e. 1.9 multiplied by 1000
1.9e-2 | 0.019, i.e. 1.9 multiplied by 1/100
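The two rows of the table can be checked directly. This is only a sketch; all.equal() is used for the small number because decimal fractions are stored approximately.

```r
big <- 1.9e3                          # 1.9 * 10^3
small <- 1.9e-2                       # 1.9 * 10^-2
big == 1900                           # TRUE
isTRUE(all.equal(small, 0.019))       # TRUE, up to floating-point tolerance
format(big, scientific = FALSE)       # "1900"
```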
12: Rounding
These functions are used to convert decimals into integers.
There are multiple types of rounding:
rounding up
rounding down
Rounding down is done in R with the floor function, which gives the greatest integer not larger than its argument:
Syntax: floor(Decimal Number)
And the ‘next integer’ (rounding up) function is ceiling:
Syntax: ceiling(Decimal Number)
E.g.:
floor(5.9)
ceiling(2.9)
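A quick sketch of how the two functions behave, including on negative numbers where the direction is easy to get wrong. round() and trunc() are base R functions added here for comparison.

```r
floor(5.9)      # 5: rounds down
ceiling(2.9)    # 3: rounds up
floor(-5.9)     # -6: "down" means toward minus infinity
ceiling(-2.9)   # -2: "up" means toward plus infinity
round(5.9)      # 6: nearest integer
trunc(-5.9)     # -5: simply drops the decimal part
```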
These are the very basics of R. Any beginner can start with these commands to get familiar with RStudio.
To learn more about R, we now move on to Data Structures in R.
We have the following types of Data Structures:
Vector
Matrix
Array
Lists
DataFrames
Classified under 2 branches:
   | Homogeneous | Heterogeneous |
1d | Vector | List |
2d | Matrix | Data frame |
Nd | Array | |
Out of all, we will cover only “Vector” in detail. Other data types are equally interesting, but as beginners, let us keep it short and simple.
Vectors
A vector is a collection of values that all have the same data type. The elements of a vector are all numbers, giving a numeric vector, or all character values, giving a character vector. There is also a third type, the logical vector, whose elements are all TRUE or FALSE.
13: Creating a Vector
Vectors are variables holding one or more values of the same type: logical, integer, real, complex, or string. A vector can also have a length equal to 0.
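Here is a minimal sketch of creating vectors of several types with c() and the colon operator. The variable names are illustrative, and the last line shows what happens if types are mixed: R coerces everything to one common type.

```r
nums  <- c(4, 3, 6.3)         # numeric (real) vector
ints  <- 1:5                  # integer vector
words <- c("nine", "two")     # character (string) vector
flags <- c(TRUE, FALSE)       # logical vector
empty <- numeric(0)           # a numeric vector of length 0

class(nums); class(ints); class(words); class(flags)
length(empty)                 # 0

mixed <- c(1, "a", TRUE)      # mixed input is coerced to character
mixed                         # "1" "a" "TRUE"
```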
14: Length of a Vector
Syntax: length(Vector name)
This shows the length of the specified vector.
E.g. length(a)
When vectors of different lengths are combined in a calculation, the derived vector takes the length of the longest one. Here A is of length 5 and B is of length 2:
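A sketch of that behaviour follows. The names A and B come from the text, but their values are assumptions chosen for illustration. R recycles the shorter vector, and warns here because 5 is not a multiple of 2, so the warning is suppressed in the sketch.

```r
A <- 1:5                        # length 5
B <- c(10, 20)                  # length 2
# B is recycled as 10 20 10 20 10 to match the length of A
C <- suppressWarnings(A + B)
C                               # 11 22 13 24 15
length(C)                       # 5, the length of the longest vector
```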
15: Types of Vectors
Numeric vector
E.g.
a <- c(4,3,6.3,6,-8,9)
Character vector
E.g.
b <- c("nine","two","eight")
Logical vector
E.g.
c <- c(TRUE,TRUE,TRUE,FALSE,TRUE,FALSE)
Next, we learn how to refer to the elements of a vector.
E.g. a[c(1,3)] # refers to the 1st and 3rd elements of the vector a.
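A few more ways of referring to elements, sketched on the numeric vector a from above. Negative and logical subscripts are standard R, though only the positional form is shown in the text.

```r
a <- c(4, 3, 6.3, 6, -8, 9)
a[1]          # 4: first element (R counts from 1)
a[c(1, 3)]    # 4.0 6.3: 1st and 3rd elements
a[2:4]        # 3.0 6.3 6.0: a range of positions
a[-1]         # everything except the 1st element
a[a > 5]      # 6.3 6.0 9.0: elements that satisfy a condition
```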
16: How to Work with Vectors & Logical Subscripts?
Take the example of a vector containing the 8 numbers, from 0 to 7:
E.g.
y<-0:7
To add up all the values:
E.g. sum(y)
To know how many of the values were less than 3:
E.g.
sum(y<3)
To find the sum of the values of y that are less than 3, we write:
E.g. sum(y[y<3])
To see whether the logical condition y<3 is true or false for each element:
E.g. y<3
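The difference between counting values and summing them is easy to miss. The sketch below uses a threshold of 5 (chosen here so that the two answers differ) on the same vector y.

```r
y <- 0:7
sum(y)             # 28: total of all values
below <- y < 5     # logical vector: TRUE for 0..4, FALSE for 5..7
sum(below)         # 5: how MANY values are below 5 (TRUE counts as 1)
y[below]           # 0 1 2 3 4: the values themselves
sum(y[below])      # 10: the SUM of the values below 5
```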
17: Vector Functions in R
Important vector functions are listed in the table “Vector functions used in R.”
All these functions are the same as the functions that we use in mathematics. Hope readers understand these basic functions.
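The table itself is not reproduced here, so the sketch below simply runs a handful of common base R vector functions on a small illustrative vector. The selection is an assumption and not necessarily the original list.

```r
v <- c(2, 4, 6, 8)
max(v)       # 8
min(v)       # 2
sum(v)       # 20
mean(v)      # 5
median(v)    # 5
range(v)     # 2 8: smallest and largest value
var(v)       # sample variance, 20/3
cumsum(v)    # 2 6 12 20: running total
prod(v)      # 384: product of all values
```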
18: How to Work with Vectors and Logical Subscripts
To find the sum of the two largest values in a vector:
First, sort the vector in ascending order, then add up the values of the last two elements of the sorted vector.
E.g.
Let’s do this in stages. First, the values of a:
a<-c(2,9,9,9,2)
Now if you apply “sort” to this, the numbers will be sorted in ascending sequence, so the two largest values sit in the last two positions (4 and 5):
sort(a)
sum(sort(a)[4:5])
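Hard-coding positions only works when you know the vector's length. As a sketch of a more general approach, you can sort in descending order and take the first k elements (k is a name introduced here for illustration).

```r
a <- c(2, 9, 9, 9, 2)
k <- 2                                      # how many of the largest values to add
top_k <- sort(a, decreasing = TRUE)[1:k]    # 9 9
sum(top_k)                                  # 18
sum(rev(sort(a))[1:k])                      # 18: same idea using rev()
```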
19: Logical Arithmetic
Arithmetic involving TRUE or FALSE can be done in R.
R can coerce TRUE or FALSE into numerical values: 1 for TRUE and 0 for FALSE.
E.g.
a<-0:6
Is a less than 4?
a<4
Are all values greater than 0?
all(a>0)
Are all values greater than -5?
all(a>-5)
Is any value less than 2?
any(a<2)
Number of values less than 2 in vector a:
sum(a<2)
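The results of these commands, together with a couple of extra lines showing the coercion at work, can be sketched as follows (mean() and the multiplication are additions for illustration):

```r
a <- 0:6
a < 4             # TRUE TRUE TRUE TRUE FALSE FALSE FALSE
all(a > 0)        # FALSE: the first value is 0
all(a > -5)       # TRUE
any(a < 2)        # TRUE
sum(a < 2)        # 2: counts the TRUEs (0 and 1 are below 2)
mean(a < 4)       # proportion of values below 4, i.e. 4/7
(a < 4) * a       # 0 1 2 3 0 0 0: FALSE zeroes out a value
TRUE + TRUE       # 2
```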
We have many other operators, which are pretty similar to mathematical operators, listed below.
20: Logical Operations
List of operators:
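The original list is not reproduced here. As a sketch, these are the standard logical and comparison operators in base R, applied to two illustrative vectors p and q:

```r
p <- c(TRUE, TRUE, FALSE, FALSE)
q <- c(TRUE, FALSE, TRUE, FALSE)

!p              # NOT:  FALSE FALSE TRUE TRUE
p & q           # AND, element by element: TRUE FALSE FALSE FALSE
p | q           # OR, element by element:  TRUE TRUE TRUE FALSE
xor(p, q)       # exclusive OR:            FALSE TRUE TRUE FALSE

p[1] && q[2]    # AND on single values: FALSE
p[1] || q[2]    # OR on single values:  TRUE

3 == 3; 3 != 4; 3 < 4; 3 >= 3   # comparisons, all TRUE
```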
21: Generating Regular Sequences of Numbers
For regularly spaced sequences involving integers, it is simplest to use the colon operator. This can produce ascending or descending sequences:
5:15
[1] 5 6 7 8 9 10 11 12 13 14 15
E.g.
Use the seq function to go from 0 up to 5 in steps of 0.5:
seq(0,5,0.5)
E.g.
Sequencing downwards from 5 down to 0 in steps of 0.5:
seq(5,0,-.5)
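The sketch below puts the colon operator and seq() side by side. The length.out argument is an addition not shown above; it asks for a number of points instead of a step size.

```r
up     <- 5:15                        # ascending integers
down   <- 15:5                        # descending integers
halves <- seq(0, 5, 0.5)              # 0.0 0.5 ... 5.0 (11 values)
back   <- seq(5, 0, -0.5)             # the same values in reverse
five   <- seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00

length(halves)                        # 11
```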
22: Generating Repeated Sequence
The rep() function (short for replicate) puts the same constant, or the same vector, into longer vectors. The call form is rep(x,times).
x <- rep(6,4)
rep(c(11,12,13),3)
rep(1:2,3)
rep(c(11,12,13),each=2)
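The outputs of these four calls, plus one extra variant with a vector of times (an addition for illustration), can be sketched as:

```r
r1 <- rep(6, 4)                       # 6 6 6 6
r2 <- rep(c(11, 12, 13), 3)           # 11 12 13 11 12 13 11 12 13
r3 <- rep(1:2, 3)                     # 1 2 1 2 1 2
r4 <- rep(c(11, 12, 13), each = 2)    # 11 11 12 12 13 13
r5 <- rep(1:2, times = c(3, 1))       # 1 1 1 2: a separate count per element
```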
23: Identifying Missing Values
Syntax:
mean(vector name, na.rm=TRUE)
is.na(vector name)
Missing values are a cause of concern and need to be dealt with accordingly.
Suppose we have a vector:
a<-c(NA,11,12,NA,NA)
To leave the missing values out of a calculation, use the na.rm=TRUE argument:
E.g.
mean(a,na.rm=TRUE)
To check for the location of missing values within a vector, use the function is.na(x) together with which():
E.g.
which(is.na(a))
To convert the NA to 0, use the ifelse function:
E.g.
ifelse(is.na(a),0,a)
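Putting these steps together on the same vector, as a sketch (the first line, showing mean() without na.rm, is added to illustrate why the argument matters):

```r
a <- c(NA, 11, 12, NA, NA)
mean(a)                         # NA: one missing value poisons the result
mean(a, na.rm = TRUE)           # 11.5: mean of 11 and 12 only
is.na(a)                        # TRUE FALSE FALSE TRUE TRUE
which(is.na(a))                 # 1 4 5: positions of the missing values
sum(is.na(a))                   # 3: how many values are missing
filled <- ifelse(is.na(a), 0, a)
filled                          # 0 11 12 0 0
```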
24: Sorting, Ranking & Ordering
For example:
sales<-c(100,50,75,150,200,25)
Now apply the three different functions (rank, sort, and order) to the vector called sales:
ranks<-rank(sales)
sorted<-sort(sales)
ordered<-order(sales)
25: Data Frame
Syntax: data frame name <- data.frame(vector1,vector2,vector3,..)
Make a data frame using the four vectors:
E.g.
view <- data.frame(sales,ranks,sorted,ordered)
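The three functions are easy to confuse, so here is a sketch of what each returns for the sales vector, followed by the data frame built from them. It uses lowercase variable names throughout, since R is case-sensitive.

```r
sales   <- c(100, 50, 75, 150, 200, 25)
ranks   <- rank(sales)     # 4 2 3 5 6 1: position each value would take when sorted
sorted  <- sort(sales)     # 25 50 75 100 150 200: the values in ascending order
ordered <- order(sales)    # 6 2 3 1 4 5: indices that put sales in ascending order

all(sales[ordered] == sorted)   # TRUE: indexing by order() reproduces sort()

view <- data.frame(sales, ranks, sorted, ordered)
dim(view)                  # 6 rows, 4 columns
names(view)                # column names come from the vector names
```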
26: Using sprintf
Syntax: sprintf("text with %d", vector1, operation on vector)
This function assembles a string from the parts in a formatted manner. Each %d in the format string is replaced, in order, by the next argument.
E.g.
i <- 2
s <- sprintf("the cube of %d is %d",i,i^3)
s
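Beyond %d for whole numbers, sprintf() supports other placeholders. The sketch below shows a few common ones; the name and values used are made up for illustration.

```r
i <- 2
s <- sprintf("the cube of %d is %d", i, i^3)
s                                        # "the cube of 2 is 8"

sprintf("%.2f", pi)                      # "3.14": two decimal places
sprintf("%s scored %d", "Asha", 42L)     # "Asha scored 42": %s inserts a string
sprintf("%05d", 42L)                     # "00042": zero-padded to width 5
sprintf("item %d", 1:3)                  # vectorized: "item 1" "item 2" "item 3"
```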
27: Character Strings
In R, character strings are defined in double quotations:
a<-"hello"
b<-"55"
Numbers can be stored as characters (as in b, above), but character strings such as a cannot be turned into numbers.
To combine strings into a vector of character information:
c(a,b)
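A sketch of what this means in practice. as.numeric() and paste() are base R functions added here to illustrate the point; converting "hello" raises a warning, which is suppressed in the sketch.

```r
a <- "hello"
b <- "55"
both <- c(a, b)                        # character vector: "hello" "55"
length(both)                           # 2

as.numeric(b)                          # 55: a string of digits converts cleanly
suppressWarnings(as.numeric(a))        # NA: "hello" has no numeric meaning

paste(a, b)                            # "hello 55": joined with a space
paste0(a, b)                           # "hello55": joined with no separator
```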
28: length VS nchar
Syntax:
length(vector name)
nchar(vector name)
One of the confusing things about character strings is the distinction between the length of a character object (a vector) and the number of characters in the strings comprising that object.
Let’s see an example to make the distinction clear:
sports<- c("badminton","tabletennis","cricket","basketball")
Here, sports is a vector comprising 4 character strings:
length(sports)
The individual character strings have 9, 11, 7 and 10 characters, respectively:
nchar(sports)
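Run side by side, the two functions give the following. This is a sketch with the last entry spelled "basketball", which has 10 characters.

```r
sports <- c("badminton", "tabletennis", "cricket", "basketball")
length(sports)          # 4: number of strings in the vector
nchar(sports)           # 9 11 7 10: characters in each string

length("badminton")     # 1: a single string is a vector of length 1
nchar("badminton")      # 9
```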
29: regexpr
Syntax: regexpr("pattern", vector name)
The regexpr() function reports the character position in each provided string where the first match with the pattern starts, or -1 if there is no match. The function also returns the length of the match.
E.g.
x=c("apple","grape","banana")
r=regexpr("p",x)
r
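The printed result carries extra attributes, so here is a sketch that pulls out the two pieces of interest: the start positions and the match lengths.

```r
x <- c("apple", "grape", "banana")
r <- regexpr("p", x)
positions <- as.integer(r)             # 2 4 -1: first "p" in each string, -1 = no match
lengths   <- attr(r, "match.length")   # 1 1 -1: length of each match
```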
30: gregexpr
Syntax: gregexpr("pattern", vector name)
gregexpr(pattern,text) is the same as regexpr(), but it finds all the instances of pattern and returns them as a list with one entry per input string. Here’s an example:
E.g.
gregexpr("ss","Assessed")
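A sketch of reading that result: because the return value is a list, the matches for the first (and here only) string are in element [[1]].

```r
g <- gregexpr("ss", "Assessed")
starts  <- as.integer(g[[1]])                # 2 5: "ss" begins at characters 2 and 5
lengths <- attr(g[[1]], "match.length")      # 2 2: each match is two characters long
```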
31: regmatches
Syntax: regmatches(vector name, match object from regexpr)
The regmatches() function retrieves the matching components of a string vector, given a match object produced by regexpr().
E.g.
x=c("apple","grape","banana")
r=regexpr("a",x)
regmatches(x,r)
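Since every word above contains an "a", the result is simply three "a"s. The sketch below uses the pattern "an" (chosen here for illustration) to show that strings without a match are dropped, and that a gregexpr() match object gives all matches as a list.

```r
x <- c("apple", "grape", "banana")

first_a  <- regmatches(x, regexpr("a", x))     # "a" "a" "a"
first_an <- regmatches(x, regexpr("an", x))    # "an": only banana matches
all_an   <- regmatches(x, gregexpr("an", x))   # list: none, none, "an" "an"
lengths(all_an)                                # 0 0 2: matches per string
```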
32: Using sub and gsub
Syntax:
sub("pattern to find (first match only)", "replacement", vector name)
gsub("pattern to find (all matches)", "replacement", vector name)
The sub() function finds patterns within strings in a manner similar to that of grep(), but then substitutes the first instance of a match with a specified string.
E.g.
x=c("apple","banana","grape")
sub("a","$",x)
The gsub() function works in exactly the same manner as sub(), but replaces all matches to the pattern rather than only the first match.
E.g.
x=c("apple","banana","grape")
gsub("a","$",x)
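The difference shows up on "banana", which has three "a"s. A sketch of both calls with their results, plus one fixed = TRUE example (an addition) for replacing a character that would otherwise be treated as a regular-expression wildcard:

```r
x <- c("apple", "banana", "grape")
first_only <- sub("a", "$", x)      # "$pple" "b$nana" "gr$pe"
every_one  <- gsub("a", "$", x)     # "$pple" "b$n$n$" "gr$pe"

# "." means "any character" in a regular expression; fixed = TRUE treats it literally
gsub(".", "-", "blue.flower", fixed = TRUE)   # "blue-flower"
```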
33: strsplit
Syntax: vector name=strsplit(vector1,split="split by character",fixed=TRUE)
This function splits a string into a list containing multiple strings, based on a defined delimiter.
The delimiter can be defined as a fixed character string (fixed=TRUE) or as a regular expression.
E.g.
word="apple|lime|orange"
v=strsplit(word,split="|",fixed=TRUE)
v
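Because | has a special meaning in regular expressions, fixed=TRUE matters in this example. The sketch below shows the result, the equivalent escaped regular expression, and how to get a plain vector back out of the list.

```r
word <- "apple|lime|orange"
v <- strsplit(word, split = "|", fixed = TRUE)
v[[1]]                                # "apple" "lime" "orange"

# The same split with "|" escaped as a regular expression
v_regex <- strsplit(word, split = "\\|")

parts <- unlist(v)                    # flatten the list into a character vector
length(parts)                         # 3

# One list element per input string
strsplit(c("a-b", "c-d-e"), "-", fixed = TRUE)
```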
34: grep Function (Pattern matching)
Syntax: grep("pattern",vector name)
This function searches for matches of the pattern within each element of a character vector and returns an integer vector of the positions of the elements that match (if value is set to FALSE, which is the default).
If value is set to TRUE, the contents of the matching elements of the character vector are returned.
x = c("apple","potato","grape","10","blue.flower")
grep("a",x)
grep("a",x,value=TRUE)
grep("[[:digit:]]",x,value=TRUE)
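The results of these three calls, sketched with two related extras that are additions here: grepl(), which returns TRUE/FALSE per element, and a fixed = TRUE search for a literal dot.

```r
x <- c("apple", "potato", "grape", "10", "blue.flower")
grep("a", x)                                 # 1 2 3: positions of elements containing "a"
grep("a", x, value = TRUE)                   # "apple" "potato" "grape"
grep("[[:digit:]]", x, value = TRUE)         # "10": elements containing a digit

grepl("a", x)                                # TRUE TRUE TRUE FALSE FALSE
grep(".", x, fixed = TRUE, value = TRUE)     # "blue.flower": literal dot
```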
We hope you practice these commands and use them in your everyday data analysis.
You will get to see the other data types present in R in our next blog.