What is R?
R is an interpreted computer programming language.
R is a programming language that is widely used in statistics,
graphical representation, reporting, and data modeling.
R is becoming more and more popular for these major reasons:
R is open source.
R has most of the latest statistical methods.
R has a base language that allows a user to program almost
anything they like.
General Introduction to Computing
In a general way, we can define computing to mean any goal-oriented
activity requiring, benefiting from, or creating computers.
Data Types in R Programming Language
numeric – (3, 6.7, 121)
integer – (2L, 42L; where ‘L’ declares this as an integer)
logical – (TRUE, FALSE)
complex – (7 + 5i; where ‘i’ is the imaginary unit)
character – (“a”, “B”, “c is third”, “69”)
raw – (as.raw(55); as.raw() converts a value to a raw byte)
Numeric Data Type
The numeric data type is the most frequently used data type in R. It is the default data type whenever you declare a
variable with numbers.
You can store any type of number (with or without decimal) in a variable with numeric data type. For example,
Here, both the my_decimal and my_number variables are of numeric type.
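The code for this example is not reproduced in the text. A minimal sketch, assuming illustrative values for my_number and my_decimal, could look like this:

```r
# Assumed values; the numbers on the original slide are not shown.
my_number <- 123      # whole number written without the L suffix
my_decimal <- 12.5    # number with a decimal part

class(my_number)   # "numeric"
class(my_decimal)  # "numeric"
```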
Decimal values are called numeric in R. Numeric is the default data type for
numbers in R.
If you assign a decimal value to a variable x as follows, x will be of numeric type.
Real numbers with a decimal point are represented using this data type in R. It
uses a format for double-precision floating-point numbers to represent numerical
values.
Even if a whole number is assigned to a variable y, it is still saved as a
numeric value.
When R stores a number in a variable, it stores the number as a
“double”, that is, a double-precision floating-point value.
This means that a value such as 5 is stored with a type of double
and a class of numeric. That y is not an integer
can be confirmed with the is.integer() function.
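The check described above can be sketched as follows. The value 5 comes from the text; the comparison with 5L is added here for contrast.

```r
y <- 5           # no L suffix, so R stores a double

class(y)         # "numeric"
typeof(y)        # "double"
is.integer(y)    # FALSE
is.integer(5L)   # TRUE, because the L suffix creates an integer
```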
Integer Data Type
Integers are a type of numeric data that hold values without a decimal part. The type is mostly
used when you are sure that the variable will never need to hold decimal values.
Here, the variable my_integer contains the value 123L. The suffix L at the end of the value indicates that my_integer is of integer type.
R supports an integer data type for whole numbers (stored in 32 bits, so the range is limited).
You can create a value of integer type, or convert a value into one, using
the as.integer() function.
You can also use the capital ‘L’ notation as a suffix to denote that a particular
value is of the integer data type.
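Both ways of creating an integer can be sketched as below. The variable names other than my_integer are assumptions made for illustration.

```r
my_integer <- 123L             # L suffix creates an integer directly
class(my_integer)              # "integer"

converted <- as.integer(7.9)   # conversion drops the fractional part: 7
from_text <- as.integer("42")  # character input is parsed: 42

.Machine$integer.max           # largest representable integer: 2147483647
```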
Logical Data Type in R
R has a logical data type that takes a value of either TRUE or FALSE.
A logical value is often created via a comparison between variables.
Boolean values, which have two possible values, are represented by this R data
type: FALSE or TRUE
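A short sketch of a logical value created by a comparison, with assumed variable names and values:

```r
a <- 4
b <- 9

is_smaller <- a < b   # comparison yields a logical value
is_smaller            # TRUE
class(is_smaller)     # "logical"
a == b                # FALSE
```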
Complex Data Type
In R, variables with complex data types contain values with an imaginary part. This is
indicated by using i as a suffix. For example,
Here, the variables z1 and z2 have been declared as complex data types with an imaginary part denoted by the suffix i.
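The original code for z1 and z2 is not reproduced in the text. A minimal sketch, assuming illustrative values, might be:

```r
# Assumed values; the originals are not shown.
z1 <- 7 + 5i
z2 <- 3i

class(z1)   # "complex"
Re(z1)      # real part: 7
Im(z1)      # imaginary part: 5
z1 + z2     # 7+8i
```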
Character Data Type in R
R supports the character data type, which covers all letters and special
characters.
It stores character values or strings. Strings in R can contain letters, numbers,
and symbols.
The easiest way to denote that a value is of character type in R is to
wrap the value inside single or double quotes.
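The quoting rule can be sketched as follows. The variable names are assumptions; the string values come from the data type list earlier in the text.

```r
single_quoted <- 'a'
double_quoted <- "c is third"
digits <- "69"              # quoted digits are text, not a number

class(digits)               # "character"
nchar(double_quoted)        # 10
as.numeric(digits) + 1      # 70, after converting the text to a number
```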
Raw Data Type in R
To save and work with data at the byte level in R, use the raw data
type. By exposing a sequence of unprocessed bytes, it enables
low-level operations on binary data. Each element of a raw vector holds
one byte and is printed in hexadecimal.
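A minimal sketch of working with raw values, using base R functions only. The string "Hi" is an assumed example.

```r
r <- as.raw(55)
r                          # 37, the hexadecimal form of decimal 55
class(r)                   # "raw"

bytes <- charToRaw("Hi")   # bytes of the string: 48 69
rawToChar(bytes)           # "Hi"

empty <- raw(3)            # raw vector of length 3, filled with 00
```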
Operators in R
Arithmetic Operators
Logical Operators
Relational Operators
Assignment Operators
Miscellaneous Operators
Arithmetic Operators
Addition Operator (+)
The values at the corresponding positions of both operands are
added. Consider the following R operator snippet to add two vectors:
Subtraction Operator (-)
The second operand values are subtracted from the first. Consider the
following R operator snippet to subtract two variables:
Multiplication Operator (*)
Corresponding elements of the two operands (vectors or single numbers) are multiplied
with the use of the ‘*’ operator.
Division Operator (/)
The first operand is divided by the second operand with the use of the ‘/’ operator.
Power Operator (^)
The first operand is raised to the power of the second operand.
Modulo Operator (%%)
The remainder of the first operand divided by the second operand is returned.
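The arithmetic operators above can be tried on two small vectors. The vectors a and b are assumed values chosen for illustration.

```r
a <- c(10, 7, 9)
b <- c(3, 2, 4)

a + b    # 13  9 13
a - b    #  7  5  5
a * b    # 30 14 36
a / b    # element-wise division
a ^ b    # 1000   49 6561
a %% b   # remainders: 1 1 1
```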
Logical Operators
Element-wise Logical AND operator (&)
Returns TRUE if both the operands are TRUE.
Element-wise Logical OR operator (|)
Returns TRUE if either of the operands is TRUE.
NOT operator (!)
A unary operator that negates the status of the elements of the
operand.
Logical AND operator (&&)
Returns TRUE if both operands, each a single logical value, are TRUE.
Since R 4.3.0, operands with more than one element raise an error;
earlier versions used only the first elements.
Logical OR operator (||)
Returns TRUE if either operand, each a single logical value, is TRUE.
Since R 4.3.0, operands with more than one element raise an error;
earlier versions used only the first elements.
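A short sketch contrasting the element-wise and single-value operators. The vectors x and y are assumed values; single elements are selected explicitly for && and || so the code also runs on recent R versions.

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

x & y          # TRUE FALSE FALSE
x | y          # TRUE  TRUE  TRUE
!x             # FALSE  TRUE FALSE

x[1] && y[1]   # TRUE
x[2] || y[3]   # FALSE
```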
Logical Operators in R
The logical operators in R programming are used to combine two or
more conditions and perform logical operations using &
(Logical AND), | (Logical OR) and ! (Logical NOT).
The comparison operators compare two values. To test more than
one condition at once, combine the comparisons with the
R logical operators.
Basic Logical Operators in R example
This example helps you understand how the logical operators in R
programming are used in if statements.
For this logical operators example, we assigned one integer
variable. Then, inside the If Statement, we are using basic logical
operators such as &&, ||, and !
With age = 16, age is not greater than 18, so the first statement is
printed.
Let us see what happens when we change the value. With age = 29,
age is between 18 and 35, so the second statement is printed.
With age = 45, age is between 36 and 60, so the third statement is
printed.
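The original script is not reproduced in the text. The sketch below is one way to get the behaviour described; the function name describe_age, the validity check, and the message wording are all assumptions.

```r
# Hypothetical helper; only the age ranges come from the text.
describe_age <- function(age) {
  if (age < 0 || age > 120) {
    "Invalid age"
  } else if (!(age > 18)) {
    "First statement: age is not greater than 18"
  } else if (age > 18 && age <= 35) {
    "Second statement: age is between 18 and 35"
  } else if (age > 35 && age <= 60) {
    "Third statement: age is between 36 and 60"
  } else {
    "Fourth statement: age is above 60"
  }
}

print(describe_age(16))
print(describe_age(29))
print(describe_age(45))
```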
R Logical Operators example
This example helps you understand how each R logical operator works.
Remember, any non-zero numeric value is treated as Boolean TRUE,
and 0 is treated as Boolean FALSE.
In this logical operators in R example, we first declare two vectors, num1 and num2.
The element-wise statements compare each pair of vector elements and find the logical relation.
The && statement compares the first element of the num1 vector and the
first element of the num2 vector, that is, TRUE && FALSE = FALSE. Since R 4.3.0,
the first elements must be selected explicitly (for example, num1[1] && num2[1]).
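The vectors used in the original are not shown. A sketch with assumed values, chosen so that the first elements evaluate to TRUE and FALSE as described:

```r
# Assumed values: non-zero counts as TRUE, zero as FALSE.
num1 <- c(1, 0, 5)    # TRUE  FALSE TRUE
num2 <- c(0, 0, 3)    # FALSE FALSE TRUE

num1 & num2           # FALSE FALSE  TRUE
num1 | num2           #  TRUE FALSE  TRUE
!num1                 # FALSE  TRUE FALSE

num1[1] && num2[1]    # TRUE && FALSE gives FALSE
```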
Relational Operators
The relational operators in R carry out comparison operations between the
corresponding elements of the operands. Each returns a Boolean TRUE value if the first
operand satisfies the relation compared to the second. A TRUE value is always
considered to be greater than FALSE.
Less than (<)
Returns TRUE if the corresponding element of the first operand is less than that of the
second operand. Else returns FALSE.
Less than or equal to (<=)
Returns TRUE if the corresponding element of the first operand is less than or equal
to that of the second operand. Else returns FALSE.
Greater than (>)
Returns TRUE if the corresponding element of the first operand is greater than that of
the second operand. Else returns FALSE.
Greater than or equal to (>=)
Returns TRUE if the corresponding element of the first operand is greater than or equal to
that of the second operand. Else returns FALSE.
Not equal to (!=)
Returns TRUE if the corresponding element of the first operand is not equal to the
second operand. Else returns FALSE.
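The relational operators listed above, applied to two assumed example vectors:

```r
a <- c(1, 5, 8)
b <- c(2, 5, 3)

a < b          #  TRUE FALSE FALSE
a <= b         #  TRUE  TRUE FALSE
a > b          # FALSE FALSE  TRUE
a >= b         # FALSE  TRUE  TRUE
a != b         #  TRUE FALSE  TRUE
TRUE > FALSE   # TRUE
```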
Assignment Operators
Assignment operators in R are used to assign values to various
data objects in R. The objects may be integers, vectors, or functions.
These values are then stored under the assigned variable names. There
are two kinds of assignment operators: left and right.
Left Assignment (<- or <<- or =)
Assigns the value on the right to the name on the left.
Right Assignment (-> or ->>)
Assigns the value on the left to the name on the right.
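A brief sketch of the assignment forms. The names v1, v2, v3, counter, and increment are assumptions used only for illustration.

```r
v1 <- c(1, 2, 3)     # left assignment with <-
v2 = c(4, 5, 6)      # left assignment with =
c(7, 8, 9) -> v3     # right assignment with ->

counter <- 0
increment <- function() {
  # <<- assigns to `counter` in the enclosing environment,
  # not to a new local variable
  counter <<- counter + 1
}
increment()
counter              # 1
```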
Miscellaneous Operators
Miscellaneous operators are the mixed-purpose operators in R that do not
fit the groups above, such as those for generating sequences, testing
membership, and multiplying matrices.
%in% Operator
Checks if an element belongs to a vector or list and returns a Boolean value:
TRUE if the value is present, else FALSE.
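A minimal membership check, using an assumed vector named values:

```r
values <- c(2, 4, 6, 8)

4 %in% values         # TRUE
5 %in% values         # FALSE
c(2, 3) %in% values   # TRUE FALSE, one result per element on the left
```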
%*% Operator
This operator performs matrix multiplication; here it is used to multiply a matrix with its transpose. The transpose of
a matrix is obtained by interchanging its rows and
columns. The number of columns of the first matrix must be
equal to the number of rows of the second matrix. Multiplication of the
matrix A with its transpose, B, produces a square matrix.
$A_{r \times c} \times B_{c \times r} \rightarrow P_{r \times r}$
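The dimension rule can be checked with a small sketch. The 2 x 3 matrix A is an assumed example; B and P follow the naming used above.

```r
A <- matrix(1:6, nrow = 2, ncol = 3)   # 2 x 3, filled column by column
B <- t(A)                              # transpose: 3 x 2

P <- A %*% B                           # (2 x 3) times (3 x 2) gives 2 x 2
dim(P)                                 # 2 2
P                                      # rows: 35 44 and 44 56
```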
Subsetting a Variable in R
Subsetting a variable in R refers to the process of extracting specific
elements from a data structure, such as a vector, matrix, list, or data
frame, based on certain conditions or indices. This allows you to
focus on particular portions of your data rather than working with
the entire structure.
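A sketch of subsetting by index and by condition on each of the structures mentioned. All object names and values are assumptions.

```r
v <- c(10, 20, 30, 40, 50)
v[2]                # 20 (indexing starts at 1)
v[c(1, 3)]          # 10 30
v[-1]               # everything except the first element
v[v > 25]           # 30 40 50

m <- matrix(1:6, nrow = 2)
m[2, 3]             # row 2, column 3: 6

lst <- list(num = 1, char = "a")
lst$char            # "a"
lst[["num"]]        # 1

df <- data.frame(x = 1:3, y = c("a", "b", "c"))
df[df$x > 1, ]      # rows where x is greater than 1
df$y[2]             # "b"
```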
Handling Missing Values in R
Programming
As the name indicates, missing values are elements whose value is not known. NA and NaN are
reserved words in R: NA indicates a missing value, and NaN (Not a Number) is the result of
arithmetic operations that are undefined.
R – Handling Missing Values
Missing values are common in practice. For example, some cells in spreadsheets are empty. If a
nonsensical or impossible arithmetic operation is attempted, NaN (which also counts as NA) is produced.
Dealing with Missing Values in R
Missing values in R are handled with the use of some pre-defined functions:
is.na() Function for Finding Missing values:
This function returns a logical vector of the same length as its input. Each position holds
TRUE if the corresponding element is NA, and FALSE otherwise.
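A short sketch of is.na() on an assumed vector, with is.nan() shown for comparison:

```r
x <- c(3, NA, 7, NaN)

is.na(x)        # FALSE  TRUE FALSE  TRUE (NaN also counts as NA)
is.nan(x)       # FALSE FALSE FALSE  TRUE
anyNA(x)        # TRUE
sum(is.na(x))   # number of missing values: 2
```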
Properties of Missing Values:
For testing objects that are NA use is.na()
For testing objects that are NaN use is.nan()
NA exists in a variant for each class: the integer class has an
integer type NA, the character class has a character type NA, etc.
A NaN value is also counted as NA, but the reverse is not true.
A vector can be created with one or multiple NAs.
Removing NA or NaN values
There are two ways to remove missing values:
Extracting values except for NA or NaN values:
Example 1:
Example 2:
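The code for the two examples is not reproduced in the text. The sketch below shows two common ways to drop missing values from a vector; it is an assumption about what the examples covered, not a reconstruction of them.

```r
x <- c(1, NA, 3, NaN, 5)   # assumed sample data

# Sketch 1: keep only the elements that are not NA or NaN
clean1 <- x[!is.na(x)]            # 1 3 5

# Sketch 2: na.omit() drops missing values; as.numeric() strips
# the bookkeeping attributes that na.omit() attaches
clean2 <- as.numeric(na.omit(x))  # 1 3 5

# Many summary functions can skip missing values directly
mean(x, na.rm = TRUE)             # 3
```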
Missing Value Filter Functions
The modeling functions in R accept an na.action argument,
which tells the function how to respond when it encounters
NA values.
The function then calls one of the missing value filter
functions. A missing value filter function returns a new version of the
data set in which the NA values have been handled. The default missing value
filter function is na.omit, which omits every row containing even one NA. The
missing value filter functions are:
na.omit – omits every row containing even one NA
na.fail – halts and does not proceed if NA is encountered
na.exclude – excludes every row containing even one NA but keeps a
record of their original position
na.pass – leaves the data unchanged, NA values included
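The four filter functions can be compared on a small data frame. The data and the choice of lm() as the modeling function are assumptions made for illustration.

```r
df <- data.frame(x = c(1, 2, NA, 4),
                 y = c(2.1, 3.9, 6.2, 8.1))

nrow(na.omit(df))   # 3, the row with NA is dropped
nrow(na.pass(df))   # 4, the data is returned unchanged

# na.fail() stops with an error when NA is present
failed <- tryCatch({ na.fail(df); FALSE }, error = function(e) TRUE)
failed              # TRUE

# na.omit and na.exclude fit the same model but differ in how
# residuals are reported
fit_omit <- lm(y ~ x, data = df, na.action = na.omit)
fit_excl <- lm(y ~ x, data = df, na.action = na.exclude)
length(residuals(fit_omit))   # 3
length(residuals(fit_excl))   # 4, padded with NA at the dropped row
```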
Find and Remove NA or NaN
values from a dataset
In R we can find and remove missing values across an entire dataset.
A few main functions cover these tasks.
First, we will create a data frame, and then we will find and
remove all the missing values that are present in the data.
• Find all the missing values in the data
• Find all the missing values in the columns
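The data frame used in the original is not shown. A sketch with assumed data that finds missing values overall and per column, then removes the affected rows:

```r
# Assumed sample data with two missing entries
df <- data.frame(id = 1:4,
                 score = c(90, NA, 75, 60),
                 grade = c("A", "B", NA, "D"))

sum(is.na(df))                     # total missing values: 2
colSums(is.na(df))                 # per column: id 0, score 1, grade 1
which(is.na(df), arr.ind = TRUE)   # row and column position of each NA

complete <- na.omit(df)            # keep only rows without any NA
nrow(complete)                     # 2
```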
R Vectors
Vectors in R programming are similar to arrays in the C language: they hold multiple data values of the same type. One
key point is that in R the indexing of a vector starts from ‘1’ and not from ‘0’. Vectors are the most basic data type in R.
Even a single object is stored in the form of a vector. Vectors
contain a sequence of homogeneous types of data. There are mainly two types of vectors in R. The complete classification of vectors in R
is given below.
Atomic Vector
Numeric/Double
Integer
Logical
Character
Complex
Raw
Recursive Vector
List
The main difference between atomic vectors and recursive vector(list) is that atomic vectors are homogeneous, whereas the recursive
vector(list) can be heterogeneous. Vectors have three common properties:
Type, typeof(), what it is.
Length, length(), how many elements it contains.
Attributes, attributes(), additional arbitrary metadata.
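The three properties can be inspected directly. The named vector v and the list mixed are assumed examples.

```r
v <- c(a = 1.5, b = 2.5, c = 3.5)

typeof(v)       # "double"
length(v)       # 3
attributes(v)   # a list holding the names "a" "b" "c"
v[1]            # first element, since indexing starts at 1

mixed <- list(1L, "two", TRUE)   # a recursive vector can mix types
typeof(mixed)   # "list"
```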
Atomic Vectors
In R, an atomic vector is the simplest type of data structure that
contains elements of the same data type.
Atomic vectors are one-dimensional and are the building blocks for
more complex data structures in R, like lists and data frames.
Atomic vectors are constructed with the c() function or the vector()
function.
Integer Vector
Integer vectors are numeric vectors that hold
negative and positive whole values. In R, numbers are double by
default, so to make an integer, place L after the number.
Logical Vector
Logical vectors are the simplest type of atomic vector as they take
only three possible values: FALSE, TRUE, and NA. Logical vectors can
be constructed with comparison operators. We can also create
these using c().
Character Vector
Character vectors are the most complex type of atomic vector because each
element of a character vector is a string, and a string can contain an arbitrary
amount of data. Strings in R can contain letters, numbers, and symbols. The
easiest way to denote that a value is of character type in R is to wrap the value
inside single or double quotes. We can also use the as.character()
function to store a value as a character or to convert a value to the character data
type.
Complex Vector
The complex data type stores numbers with an imaginary
component. Examples of complex values are 1+2i, 3i, 4-5i, -12+6i,
etc.
Recursive Vector
Lists are a step up in complexity from atomic vectors, because a list can
contain other lists. This makes them suitable for representing hierarchical or
tree-like structures. We can create a list with list().
• The list elements can be given names, and they can be accessed using these names. Lists can be
extremely useful inside functions. Because the functions in R are able to return only a single
object, you can “staple” together lots of different kinds of results into a single object that a function
can return. A list does not print to the console like a vector. Instead, each element of the list starts
on a new line. Elements are indexed by double brackets. If the elements of a list are named, they
can be referenced by the $ notation.
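The points above can be sketched with a function that returns several results in one list. The function name summarise_values and the sample data are assumptions.

```r
# Hypothetical helper that "staples" several results into one list
summarise_values <- function(x) {
  list(mean = mean(x), range = range(x), n = length(x))
}

result <- summarise_values(c(4, 8, 15))
result$mean         # 9, accessed by name with $
result[["range"]]   # 4 15, accessed by name with double brackets
result[[3]]         # 3, accessed by position

nested <- list(name = "outer", inner = list(value = 42))
nested$inner$value  # 42, a list inside a list
```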
Sorting Elements of an R Vector
The sort() function sorts the values of a vector
in ascending or descending order.
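A minimal sketch of sort() on an assumed vector; order() is shown as a related base function that returns positions instead of values.

```r
x <- c(42, 7, 19, 3)

sort(x)                      # 3 7 19 42
sort(x, decreasing = TRUE)   # 42 19 7 3
order(x)                     # 4 2 3 1, the positions that would sort x
```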
R Objects
Every programming language has its own data types to store values
or other information, so that the user can assign these data types to
variables and perform operations on them. Operations are
performed according to the data types.
These data types can be character, integer, float, long, etc. Based
on the data type, memory/storage is allocated to the variable. For
example, in C language character variables are assigned with 1
byte of memory, integer variable with 2 or 4 bytes of memory and
other data types have different memory allocation for them.
Unlike other programming languages, variables are assigned to
objects rather than data types in R programming.
Types of Objects
Vectors
Atomic vectors are one of the basic types of objects in R
programming. Atomic vectors can store homogeneous data types
such as character, double, integer, raw, logical, and complex. A
single-element variable is also a vector.
Lists
A list is another type of object in R programming. A list can contain
heterogeneous data types such as vectors or other lists.
Matrices
To store values as a 2-dimensional array, matrices are used in R. The data and the number of rows and columns are defined in
the matrix() function.
Syntax:
matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
The byrow argument in the matrix() function controls how the elements are filled into the
matrix:
• byrow = FALSE (Default): The matrix is filled by columns, meaning the elements are entered
column by column.
• byrow = TRUE: The matrix is filled by rows, meaning the elements are entered row by row.
dimnames Argument
The dimnames argument allows you to assign names to the rows and columns of a matrix. It is a list of two components:
1. The first component contains the names of the rows.
2. The second component contains the names of the columns.
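The effect of byrow and dimnames can be seen in a short sketch. The matrix contents and the row and column names are assumed values.

```r
by_col <- matrix(1:6, nrow = 2, ncol = 3)                 # default: fill by column
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)   # fill by row

by_col[1, ]   # 1 3 5
by_row[1, ]   # 1 2 3

named <- matrix(1:6, nrow = 2,
                dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))
named["r2", "c3"]   # 6, selected by row and column name
```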
Factors
A factor object encodes a data vector as a set of unique elements (levels),
with each value stored as a reference to its level.
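A minimal factor sketch on an assumed character vector:

```r
sizes <- c("small", "large", "medium", "small", "large")
f <- factor(sizes)

levels(f)       # unique values, sorted: "large" "medium" "small"
nlevels(f)      # 3
as.integer(f)   # level codes for each element: 3 1 2 3 1
table(f)        # count of elements per level
```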
Arrays
The array() function is used to create an n-dimensional array. It takes the dim attribute as an argument and creates each
dimension with the length specified there.
Syntax:
array(data, dim = length(data), dimnames = NULL)
dim in Arrays
The dim attribute specifies the dimensions of an array. It determines the size of each dimension in the array. When you create an array, you define its dimensions using
the dim argument.
dimnames in Arrays
The dimnames attribute assigns names to each dimension of the array. It is a list where each element corresponds to a dimension of the array.
• The first element of the list contains the names for the rows.
• The second element contains the names for the columns.
• The third element contains the names for the layers (or slices).
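A sketch of a three-dimensional array with both dim and dimnames set. The sizes and names are assumed values.

```r
a <- array(1:12,
           dim = c(2, 3, 2),   # 2 rows, 3 columns, 2 layers
           dimnames = list(c("r1", "r2"),
                           c("c1", "c2", "c3"),
                           c("layer1", "layer2")))

dim(a)                    # 2 3 2
a[2, 3, 1]                # 6, selected by position
a["r1", "c2", "layer2"]   # 9, selected by name
```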
Data Frames
Data frames are 2-dimensional tabular data objects in R
programming. A data frame consists of multiple columns, and each
column represents a vector. Unlike matrices, columns in a data frame
can have different modes of data.
Functions
Defining a Function
A function in R is defined using the function keyword, and it typically includes:
• A name (optional, but recommended for reusability)
• A list of arguments (parameters) to pass inputs to the function
• A body of code enclosed in curly braces {} that specifies the operations to perform
• A return value (explicitly using the return() function or implicitly using the last evaluated expression)
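The parts of a function definition listed above can be sketched as follows. The function names and arguments are assumptions chosen for illustration.

```r
# Named function with two arguments, a default value, and an explicit return
rectangle_area <- function(width, height = 1) {
  area <- width * height
  return(area)
}

# Implicit return: the last evaluated expression is the result
square <- function(x) x^2

rectangle_area(3, 4)   # 12
rectangle_area(5)      # 5, height falls back to its default
square(6)              # 36

# A function without a name, used in place
sapply(1:3, function(n) n * 10)   # 10 20 30
```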
System-defined objects are predefined by the R programming
environment. These include:
1. Basic Data Types:
• Numeric: Used to store numbers. Example: x <- 42
• Character: Used to store text. Example: y <- "Hello"
• Logical: Used to store TRUE or FALSE values. Example: z <- TRUE
2. Data Structures:
• Vectors: A sequence of data elements of the same basic type. Example: v <- c(1, 2, 3)
• Matrices: Two-dimensional, homogeneous arrays. Example: m <- matrix(1:6, nrow = 2)
• Arrays: Multi-dimensional, homogeneous data structures. Example: a <- array(1:8, dim = c(2, 2, 2))
• Data Frames: Two-dimensional, heterogeneous data structures. Example: df <- data.frame(x = 1:3, y = c("a", "b", "c"))
• Lists: Ordered collections of objects, potentially of different types. Example: lst <- list(num = 1, char = "a", vec = c(1, 2, 3))
3. Functions:
Predefined functions available in R. Example: mean(x) calculates the mean of vector x.
4. Special Values:
• NA: Represents missing values.
• NULL: Represents the absence of any value or object.
• Inf and -Inf: Represent positive and negative infinity.
• NaN: Represents 'Not a Number', often a result of undefined mathematical operations.
R Data Frame – Access Data
A data frame is a two-dimensional data structure which can store data in tabular format.
Data frames have rows and columns, and each column can be a different vector. Different vectors
can be of different data types.
Before you learn about data frames, make sure you know about R vectors.
• In R, we use the data.frame() function to create a Data Frame.
• Each argument to data.frame() is a column, written as name = vector.
Get Element of R Data Frame
To extract the element at the ith row and jth column of an R Data Frame, use
the index notation and pass the row numbers and column numbers
as vectors in square brackets after the data frame, as shown in the
following code snippet.
Example
celebrities = data.frame(name= c("Andrew", "Mathew", "Dany", "Philip", "John", "Bing",
"Monica"),
age = c(28, 23, 49, 29, 38, 23, 29),
income = c(25.2, 10.5, 11, 21.9, 44, 11.5, 45))
# get elements from rows(2,5), columns(1,3)
elements = celebrities[c(2,5),c(1,3)]
print(elements)
Example
celebrities = data.frame(name= c("Andrew", "Mathew", "Dany", "Philip", "John", "Bing",
"Monica"),
age = c(28, 23, 49, 29, 38, 23, 29),
income = c(25.2, 10.5, 11, 21.9, 44, 11.5, 45))
# extract columns (age, income)
extractedDF = data.frame(celebrities$age, celebrities$income)
# print to output
print("New data frame with two columns extracted from celebrities : ")
print(extractedDF)
Add row(s) to R Data Frame
To add more rows to an R Data Frame from another Data
Frame, call the rbind() function and pass the original and the other data
frames as arguments.
The syntax to call the rbind() function is
resulting_data_frame = rbind(<existing_data_frame_name>,<additional_data_frame_name>)
Example
celebrities = data.frame(name= c("Andrew", "Mathew", "Dany", "Philip", "John", "Bing",
"Monica"),
age = c(28, 23, 49, 29, 38, 23, 29),
income = c(25.2, 10.5, 11, 21.9, 44, 11.5, 45))
new_data = data.frame(name = c("Gary", "Lee", "Scofield"),
age = c(29, 22, 33),
income = c(21, 5, 31))
# add rows of new_data to celebrities
celebrities = rbind(celebrities, new_data)
# print to output
print("Resulting celebrities data frame with three newly added rows : ")
print(celebrities)
Add column(s) to R Data Frame
The following R function is used to add more columns to an R Data
Frame.
resulting_data_frame = cbind(<existing_data_frame_name>,<new_data_frame_name>)
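Example
A minimal sketch of cbind(), using a shortened version of the celebrities data from the earlier examples. The data frame name extra is an assumption.

```r
celebrities = data.frame(name = c("Andrew", "Mathew", "Dany"),
                         age = c(28, 23, 49))
# new column(s) to add, held in a second data frame with the same number of rows
extra = data.frame(income = c(25.2, 10.5, 11))
# add columns of extra to celebrities
celebrities = cbind(celebrities, extra)
# print to output
print(names(celebrities))   # "name" "age" "income"
print(celebrities)
```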