Introduction to R Programming
A Session
By
Vaibhav Kumar
Dept. of CSE
DIT University, Dehradun
Vaibhav Kumar, DIT University, Dehradun
R
• R is a programming language and software environment for statistical
analysis, graphical representation and reporting.
• R was created by Ross Ihaka and Robert Gentleman at the University
of Auckland, New Zealand.
• R is freely available under the GNU General Public License.
• It was named R after the first letter of the first names of its two
authors (Robert Gentleman and Ross Ihaka).
Features of R
• R is a well-developed, simple and effective programming language
which includes conditionals, loops, user-defined recursive functions,
and input and output facilities.
• R has an effective data handling and storage facility.
• R provides a suite of operators for calculations on arrays, lists, vectors
and matrices.
• R provides a large, coherent and integrated collection of tools for data
analysis.
• R provides graphical facilities for data analysis and display, either
directly on screen or printed on paper.
A Simple Example
• A simple program that prints "Hello" can be written in R as:
> print("Hello")
• To add two numbers, a program can be written as:
> print(2 + 3)
• The first program can also be written as:
> message = "Hello"
> print(message)
Data Types and Objects in R
• In many programming languages, a variable must be declared with a data
type that fixes which kind of data it will store. In R, a variable is not
declared with a type; it takes the type of the object assigned to it.
• Commonly used data types in R are: Logical, Numeric, Integer,
Complex, Character, Raw.
• Frequently used objects in R are: Vectors, Lists, Matrices, Arrays,
Factors, Data Frames.
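As a small illustration of the data types listed above, the sketch below stores one value of each type and asks R which type it inferred. The sample values and the names examples and types are chosen here only for illustration.

```r
# One sample value per basic data type (values are illustrative only).
examples <- list(
  logical   = TRUE,
  numeric   = 23.5,
  integer   = 2L,                 # the L suffix marks an integer literal
  complex   = 3+2i,
  character = "Hello",
  raw       = charToRaw("Hello")  # raw bytes of the string
)

# class() reports the type of each object; no declaration was needed.
types <- sapply(examples, class)
print(types)
```

Each name in the printed vector is paired with the type R assigned to the value.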
Vectors
• The c() function combines elements into a vector.
Example:
fruits = c("Apple", "Orange", "Banana")
print(fruits)
• When we execute the above code, we get the following output:
[1] "Apple"  "Orange" "Banana"
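Once a vector exists, its elements can be picked out by position and used in arithmetic. The following is a minimal sketch; the prices vector and the other variable names are invented for illustration and are not part of the original example.

```r
fruits <- c("Apple", "Orange", "Banana")

second <- fruits[2]        # indexing starts at 1, so this is the second element
n <- length(fruits)        # number of elements in the vector

# Arithmetic on a numeric vector is applied element by element.
prices <- c(2, 3, 5) * 10  # hypothetical values

print(second)
print(n)
print(prices)
```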
Lists
• A list is an R object which can contain many different types of elements,
such as vectors, functions and even other lists.
Example:
list1 = list(c("Apple", "Orange", "Banana"), c(2, 3, 5), 14.5)
print(list1)
When we execute the above code, we get the following output:
[[1]]
[1] "Apple"  "Orange" "Banana"

[[2]]
[1] 2 3 5

[[3]]
[1] 14.5
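The elements of a list are reached with double square brackets, while single brackets return a shorter list. A minimal sketch using the same list (the variable names on the left are chosen here for illustration):

```r
list1 <- list(c("Apple", "Orange", "Banana"), c(2, 3, 5), 14.5)

first_element <- list1[[1]]     # the character vector stored in position 1
sub_list      <- list1[1]       # a list of length 1 that wraps that vector
second_number <- list1[[2]][2]  # second value of the numeric vector

print(first_element)
print(class(sub_list))
print(second_number)
```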
Matrices
• A matrix in R can be created by passing a vector to the matrix()
function. By default the values are filled column by column.
Example:
M = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), ncol=3, nrow=3)
print(M)
When we execute the above code, we get the following output:
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
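The output above shows the vector being laid out down the columns. As a short sketch of the alternative, the byrow argument of matrix() fills the values across the rows instead; M_by_row is a name introduced here for illustration.

```r
M        <- matrix(1:9, nrow = 3, ncol = 3)                # filled column by column
M_by_row <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)  # filled row by row

# Elements are addressed as [row, column].
print(M[2, 3])
print(M_by_row[2, 3])

# Filling by row gives the transpose of filling by column.
print(identical(M_by_row, t(M)))
```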
Data Frames
• Data frames are tabular data objects.
• Unlike a matrix, each column of a data frame can hold a different mode of data.
• Data frames are created using the data.frame() function.
Example:
> BMI = data.frame(
    Name = c("Vaibhav", "Nitin", "Aakash"),
    Height = c(170, 169, 175),
    Weight = c(80, 75, 78),
    Age = c(30, 30, 29))
> print(BMI)
When we run the above code, we get the following output:
     Name Height Weight Age
1 Vaibhav    170     80  30
2   Nitin    169     75  30
3  Aakash    175     78  29
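Because every column is a vector, a new column can be computed from existing ones. The sketch below assumes that Height is in centimetres and Weight in kilograms (the slides do not state the units) and adds a body mass index column named Index, a name chosen here for illustration.

```r
BMI <- data.frame(
  Name   = c("Vaibhav", "Nitin", "Aakash"),
  Height = c(170, 169, 175),   # assumed to be centimetres
  Weight = c(80, 75, 78),      # assumed to be kilograms
  Age    = c(30, 30, 29),
  stringsAsFactors = FALSE     # keep Name as character in every R version
)

# Each column keeps its own type.
column_types <- sapply(BMI, class)
print(column_types)

# Body mass index = weight (kg) / height (m)^2, computed for all rows at once.
BMI$Index <- BMI$Weight / (BMI$Height / 100)^2
print(BMI)
```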
R-Excel File
• Microsoft Excel is the most widely used spreadsheet program; it
stores data in the .xls or .xlsx format.
• R can read these files directly using Excel-specific packages.
• Run the following code to install and load the xlsx package, which gives R
access to Excel files:
install.packages("xlsx")
library("xlsx")
(Note: a Java environment must be installed before running this code.)
Reading the Excel File
• Suppose we have an Excel file marks.xlsx in the current working directory*.
The following code reads it:
data = read.xlsx("marks.xlsx", sheetIndex=1)
print(data)
When we execute the above code, we see the contents of the entire file, which are
loaded into the data frame data.
• To make a sub data frame from the main data frame, we can run the
following code:
NameMarks = data.frame(data$Name, data$Final)
(*The current working directory can be seen with the function getwd().)
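The file marks.xlsx is not reproduced in these slides, so the sketch below uses a small stand-in data frame with invented names and marks; only the column names Name, ClassTest and Final follow the slides. It shows that selecting columns by name keeps the original column names, whereas data.frame(data$Name, data$Final) renames them.

```r
# Stand-in for the contents of marks.xlsx (all values are invented).
data <- data.frame(
  Name      = c("A", "B", "C", "D"),
  ClassTest = c(8, 6, 9, 5),
  Final     = c(72, 58, 81, 49),
  stringsAsFactors = FALSE
)

# Selecting columns by name keeps the names Name and Final.
NameMarks <- data[, c("Name", "Final")]
print(names(NameMarks))

# Building the sub data frame from $ expressions changes the column names.
renamed <- data.frame(data$Name, data$Final)
print(names(renamed))
```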
Statistical Operations in R
• Let us consider a vector of elements as:
values = c(4, 5, 8, 9, 2, 5, 3, 6, 9, 8, 1, 4)
• Mean: mean(values)
• Median: median(values)
• Mode: base R has no function for the statistical mode; mode(values)
returns the storage mode of the object ("numeric"). The most frequent
values can be read from the frequency table table(values).
• To see the mean or median of the students' Final marks from the
previous example, run mean(data$Final) or median(data$Final).
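Since R does not ship a function for the statistical mode, one possible helper is sketched below. The name stat_mode is hypothetical (it is not a base R function); it returns every value that occurs most often, because a vector can have more than one mode.

```r
values <- c(4, 5, 8, 9, 2, 5, 3, 6, 9, 8, 1, 4)

# Hypothetical helper: returns all values with the highest frequency.
stat_mode <- function(v) {
  counts <- table(v)                                # frequency of each distinct value
  as.numeric(names(counts)[counts == max(counts)])  # keep the most frequent ones
}

print(mean(values))
print(median(values))
print(stat_mode(values))
print(mode(values))   # storage mode of the vector, not the statistical mode
```

For this vector the values 4, 5, 8 and 9 each occur twice, so all four are returned.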
Regression Analysis
• Regression analysis is a very widely used statistical tool for establishing a
relationship model between two variables: a predictor and a response.
• The general mathematical equation for a linear regression is:
y = ax + b
where y is the response variable, x is the predictor variable, and a (the slope)
and b (the intercept) are constants known as the coefficients of regression.
• In R, the lm() function is used to create a relationship model between
these two variables.
Example of Regression Analysis
• Let us take the example of the marks of students.
• Suppose we want to analyze the relation between class test marks and final
marks of the students.
• Let y=data$Final, x=data$ClassTest
Then the relation can be created through the code:
relation=lm(y~x)
We can see the relation by running the following code:
print(relation)
• A summary of the relation can be seen through: summary(relation)
(Note: since we are working with very little data, the estimates may not be
reliable.)
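To make the steps above concrete without the Excel file, here is a minimal sketch with invented marks for six students. The numbers are assumptions for illustration only; coef() and summary() are standard functions for inspecting a fitted lm model.

```r
# Invented marks for six students (illustration only).
x <- c(5, 6, 7, 8, 9, 10)        # class test marks
y <- c(48, 55, 61, 70, 74, 83)   # final marks

relation <- lm(y ~ x)            # fit y = a*x + b by least squares

coefs <- coef(relation)          # "(Intercept)" is b, "x" is a
print(coefs)

# R-squared: the share of the variation in y explained by the model.
r_squared <- summary(relation)$r.squared
print(r_squared)
```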
Graphical Visualization of Regression
• The regression from the previous example can be visualized graphically as:
> png(file="MarksRegression.png")
> plot(x, y, col="blue", main="Class Test and Final Marks",
    cex=1.3, pch=16, xlab="Class Test", ylab="Final Marks")
> abline(lm(y~x))
> dev.off()
The plot() call draws the points and abline() adds the fitted line, so the saved
image shows the regression line relating class test and final marks.
Prediction
• Using regression analysis, we can predict the value of the response variable for
a new predictor value with the predict() function.
• Consider the previous example, where we need to predict the final marks of a
student on the basis of the marks in the class test.
Suppose we want to predict the final marks when the class test marks are 10.
> a = data.frame(x=10)
> result = predict(relation, a)
> print(result)
(Note: the prediction is more reliable when the model is built from a large data
set.)
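The prediction is simply the fitted line evaluated at the new predictor value. The sketch below, again using invented marks, checks this by comparing predict() with a*x + b computed by hand; the names new_marks and by_hand are chosen here for illustration.

```r
# Invented marks for six students (illustration only).
x <- c(5, 6, 7, 8, 9, 10)        # class test marks
y <- c(48, 55, 61, 70, 74, 83)   # final marks
relation <- lm(y ~ x)

# The column name must match the predictor name used in the formula (x).
new_marks <- data.frame(x = 10)
result <- predict(relation, new_marks)

# The same value from the coefficients: b + a * 10.
by_hand <- coef(relation)[["(Intercept)"]] + coef(relation)[["x"]] * 10

print(result)
print(by_hand)
```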
Multiple Regression
• Multiple regression extends linear regression to the
relationship between more than two variables.
• In simple linear regression we have one predictor and one response
variable, but in multiple regression we have more than one predictor
variable and one response variable.
• It can be expressed as:
Y = a + b1X1 + b2X2 + ... + bnXn
where Y is the response variable, a, b1, b2, ..., bn are the coefficients
and X1, X2, ..., Xn are the predictor variables.
Multiple Regression in R
• Let us consider an example where the result of students consists of Mid-Term Exams,
Class Tests, Quiz and Final Marks.
• Suppose we want to create a relation to analyze how Final marks depend on Mid-Term
Exams, Class Tests and Quiz.
Suppose we have another data set, NewData, which contains all these marks. Then the
relation can be created as:
Mul_Regr = lm(Final ~ MidTerm + ClassTest + Quiz, data=NewData)
We can see this relation with:
print(Mul_Regr)
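The NewData set is not included in the slides, so the following sketch builds a small one with invented marks for eight students; only the column names follow the example above. With data=NewData supplied, the formula can refer to the columns by name.

```r
# Invented marks for eight students (illustration only).
NewData <- data.frame(
  MidTerm   = c(12, 15, 9, 18, 14, 11, 17, 13),
  ClassTest = c(6, 8, 5, 9, 6, 7, 8, 5),
  Quiz      = c(3, 4, 4, 5, 2, 3, 5, 4),
  Final     = c(52, 66, 45, 80, 57, 54, 75, 55)
)

# One response (Final) and three predictors.
Mul_Regr <- lm(Final ~ MidTerm + ClassTest + Quiz, data = NewData)

# The intercept is a; the other three values are b1, b2 and b3.
coefs <- coef(Mul_Regr)
print(coefs)
```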
Pie Chart
• In R, a pie chart is created using the pie() function.
• Example:
x = c(20, 10, 40, 30)
labels = c("Dehradun", "Roorkee", "Delhi", "Ghaziabad")
png(file="PieChart.png")
pie(x, labels)
dev.off()
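A pie chart is often easier to read when each slice shows its share. As a small sketch, the labels can be extended with percentages before plotting; the names cities, pct and pie_labels are introduced here for illustration.

```r
x      <- c(20, 10, 40, 30)
cities <- c("Dehradun", "Roorkee", "Delhi", "Ghaziabad")

# Share of each slice, rounded to a whole percent.
pct <- round(100 * x / sum(x))

# Combine the city name and its share into one label per slice.
pie_labels <- paste0(cities, " (", pct, "%)")
print(pie_labels)
```

Passing these labels as pie(x, labels = pie_labels) between png() and dev.off() would draw the same chart with the percentages shown.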
Bar Chart
• Consider the final marks of students. They can be plotted as a bar
chart with the barplot() function:
png(file="BarChart.png")
barplot(data$Final)
dev.off()
Histogram
• Consider the example of marks again. Suppose we want to plot the
histogram of final marks.
> png(file="Histogram.png")
> hist(data$Final, xlab="Final Marks", col="blue", border="red")
> dev.off()
Thank You