Time series analysis in R allows one to analyze how a variable changes over time. The ts() function is used to create time series objects by specifying the data vector, start and end dates, and frequency. Common applications include sales analysis, inventory analysis, and analyzing trends in variables like COVID-19 cases over time. Multivariate time series can also be created to analyze multiple related time series in a single plot.
Time Series in R
A time series in R shows how a quantity behaves
over a period of time. It is created with the
ts() function and a few parameters.
ts() takes a data vector and associates each value
with a time point derived from the start time and frequency given by the user.
It is mostly used to study and forecast the
behavior of a business quantity over a period of time,
for example sales analysis of a company, inventory
analysis, price analysis of a particular stock or market,
and population analysis.
2.
• Syntax: objectName <- ts(data, start, end, frequency)
• where,
• data represents the data vector
• start represents the time of the first observation in the time series
• end represents the time of the last observation in the time series
• frequency represents the number of observations per unit
of time. For example, with one year as the unit, frequency = 12 for monthly data and frequency = 1 for yearly data.
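• As a minimal sketch of the parameters above (the sales values are made up for illustration and are not from any real dataset), a monthly series could be created like this:

```r
# Hypothetical monthly sales figures for two years (illustrative values only)
sales <- c(12, 15, 14, 18, 21, 25, 27, 26, 22, 19, 16, 14,
           13, 17, 16, 20, 24, 28, 30, 29, 25, 21, 18, 15)

# frequency = 12: twelve observations per year
# start = c(2020, 1): the first observation is period 1 (January) of 2020
salesTs <- ts(sales, start = c(2020, 1), frequency = 12)

print(salesTs)       # printed as a year-by-month table
frequency(salesTs)   # observations per unit of time
start(salesTs)       # time of the first observation
end(salesTs)         # time of the last observation
```

• Here end is not supplied, so ts() works it out from start, frequency, and the length of the data.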
3.
x <- c(580, 7813, 28266, 59287, 75700,
       87820, 95314, 126214, 218843, 471497,
       936851, 1508725, 2072113)

# library required for decimal_date() function
library(lubridate)

# output to be created as png file
png(file = "timeSeries.png")

# creating weekly time series object
# starting from 22 January, 2020
mts <- ts(x, start = decimal_date(ymd("2020-01-22")), frequency = 365.25 / 7)

# plotting the graph
plot(mts, xlab = "Weekly Data",
     ylab = "Total Positive Cases",
     main = "COVID-19 Pandemic",
     col.main = "darkgreen")

# saving the file
dev.off()
4.
Multivariate Time Series
• A multivariate time series holds several related time series in one object so that they can be plotted together in a single
chart.
• Example: Taking weekly data of total positive cases and total deaths from
COVID-19 from 22 January 2020 to 15 April 2020 as data vectors.
# Weekly data of COVID-19 positive cases and
# weekly deaths from 22 January, 2020 to
# 15 April, 2020
positiveCases <- c(580, 7813, 28266, 59287,
                   75700, 87820, 95314, 126214,
                   218843, 471497, 936851,
                   1508725, 2072113)

deaths <- c(17, 270, 565, 1261, 2126, 2800,
            3285, 4628, 8951, 21283, 47210,
            88480, 138475)
5.
# library required for decimal_date() function
library(lubridate)

# output to be created as png file
png(file = "multivariateTimeSeries.png")

# creating multivariate time series object
# from date 22 January, 2020
# cbind() (column bind) combines the two vectors into a two-column matrix
mts <- ts(cbind(positiveCases, deaths),
          start = decimal_date(ymd("2020-01-22")),
          frequency = 365.25 / 7)

# plotting the graph
plot(mts, xlab = "Weekly Data",
     main = "COVID-19 Cases",
     col.main = "darkgreen")

# saving the file
dev.off()
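• By default, plot() on a multivariate series draws one panel per column. As a sketch of one way to overlay both series on shared axes instead, the plot.type = "single" argument of plot() for ts objects can be used. The base-R start value (standing in for decimal_date()), the logarithmic y-axis, and the colors are choices made here for illustration:

```r
positiveCases <- c(580, 7813, 28266, 59287, 75700, 87820, 95314,
                   126214, 218843, 471497, 936851, 1508725, 2072113)
deaths <- c(17, 270, 565, 1261, 2126, 2800, 3285,
            4628, 8951, 21283, 47210, 88480, 138475)

# 22 January is 21 days into the leap year 2020, so the start is
# written as a decimal year in base R (assumption: close enough for plotting)
covid <- ts(cbind(positiveCases, deaths),
            start = 2020 + 21 / 366,
            frequency = 365.25 / 7)

# plot.type = "single" draws all columns on the same axes;
# log = "y" keeps the much smaller death counts visible
plot(covid, plot.type = "single", log = "y",
     col = c("blue", "red"),
     xlab = "Weekly Data", ylab = "Count (log scale)",
     main = "COVID-19 Cases and Deaths")
legend("topleft", legend = colnames(covid),
       col = c("blue", "red"), lty = 1)
```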
6.
Data Visualization in R
• Data visualization is the technique of delivering
insights from data using visual cues such as graphs,
charts, and maps.
• It makes large quantities of data easier to
understand intuitively and
thereby supports better decisions about the data.
• R is a language that is designed for statistical
computing, graphical data analysis, and scientific
research.
• It is often preferred for data visualization because its
packages offer flexibility and require
little code.
7.
Types of Data Visualizations
• Some of the types of visualizations offered by R
are:
• Bar Plot
• There are two types of bar plots, horizontal and
vertical, which represent data points as horizontal or
vertical bars whose lengths are proportional to the
value of the data item. They are generally used for
plotting continuous and categorical variables. By setting
the horiz parameter to TRUE or FALSE, we get
a horizontal or vertical bar plot respectively.
8.
barplot(airquality$Ozone,
        main = 'Ozone Concentration in air',
        xlab = 'ozone levels', horiz = TRUE)
• Or
barplot(airquality$Ozone, main = 'Ozone Concentration in air',
        ylab = 'ozone levels', col = 'blue', horiz = FALSE)
• Bar plots are used for the following scenarios:
• To perform a comparative study between the various
data categories in the data set.
• To analyze the change of a variable over time in months
or years.
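• As a sketch of the first scenario (comparing categories), the ozone readings can be averaged per month before plotting. The use of tapply() and the axis labels are choices made for this illustration:

```r
data(airquality)

# Mean ozone reading for each month, ignoring missing values
meanOzone <- tapply(airquality$Ozone, airquality$Month,
                    mean, na.rm = TRUE)

# One bar per month; month.abb turns 5..9 into "May".."Sep"
barplot(meanOzone,
        names.arg = month.abb[as.integer(names(meanOzone))],
        main = "Mean Ozone Concentration by Month",
        xlab = "Month", ylab = "Mean ozone (ppb)",
        col = "blue")
```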
9.
Histogram
• A histogram is like a bar chart in that it uses bars of
varying height to represent the distribution of the data.
However, in a histogram continuous values are grouped
into consecutive intervals called bins, and the
width of these bins can be
varied.
10.
Example:

data(airquality)

hist(airquality$Temp,
     main = "La Guardia Airport's Maximum Temperature (Daily)",
     xlab = "Temperature (Fahrenheit)",
     xlim = c(50, 125), col = "yellow",
     freq = TRUE)
• Histograms are used in the following scenarios:
• To verify an equal and symmetric distribution of the data.
• To identify deviations from expected values.
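• The bin width can be set through the breaks argument of hist(). A minimal sketch, where the 5-degree width and the 55 to 100 range are chosen here only to cover the temperatures in the dataset:

```r
data(airquality)

# Explicit bin edges every 5 degrees; hist() also returns the
# computed bins, so the counts can be inspected directly
h <- hist(airquality$Temp,
          breaks = seq(55, 100, by = 5),
          main = "Daily Maximum Temperature, 5-degree bins",
          xlab = "Temperature (Fahrenheit)",
          col = "yellow")

h$breaks   # bin edges
h$counts   # number of observations in each bin
```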
11.
Box Plot
• The statistical summary of the given data is presented
graphically using a boxplot. A boxplot depicts
the minimum and maximum data points, the median, the first
and third quartiles, and the interquartile range.
data(airquality)

boxplot(airquality$Wind,
        main = "Average wind speed at La Guardia Airport",
        xlab = "Miles per hour", ylab = "Wind",
        col = "orange", border = "brown",
        horizontal = TRUE, notch = TRUE)
• Box plots are used:
• To give a comprehensive statistical description of the data
through a visual cue.
• To identify outliers, which by default are points lying more than 1.5 times the interquartile
range beyond the first or third quartile.
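• The numbers behind a boxplot can also be inspected directly. As a sketch, boxplot.stats() from base R returns the five values drawn by the box and whiskers together with the points flagged as outliers:

```r
data(airquality)

s <- boxplot.stats(airquality$Wind)

s$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out     # points beyond the whiskers (drawn as outliers)
```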
12.
Scatter Plot
• A scatter plot is composed of many points on a Cartesian
plane. Each point denotes the values taken by two variables
and helps us easily identify the relationship between them.
data(airquality)

plot(airquality$Ozone, airquality$Month,
     main = "Scatterplot Example",
     xlab = "Ozone Concentration in parts per billion",
     ylab = "Month of observation", pch = 19)

• Scatter Plots are used in the following scenarios:
• To show whether an association exists between bivariate data.
• To measure the strength and direction of such a relationship.
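• To put a number on the strength and direction of a relationship, a scatter plot can be paired with a correlation coefficient. A sketch using ozone against temperature (a different pair of airquality columns, picked here for illustration):

```r
data(airquality)

plot(airquality$Temp, airquality$Ozone,
     main = "Ozone vs Temperature",
     xlab = "Temperature (Fahrenheit)",
     ylab = "Ozone Concentration in parts per billion",
     pch = 19)

# Pearson correlation; rows with missing Ozone values are dropped
r <- cor(airquality$Temp, airquality$Ozone, use = "complete.obs")
r   # the sign gives the direction, the magnitude the strength
```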
13.
Heat Map
• A heatmap is a graphical representation of data that uses
colors to visualize the values of a matrix. The heatmap() function
is used to plot a heatmap.
• Syntax: heatmap(data)
• Parameters: data: a numeric matrix holding the values of the
rows and columns
• Return: This function draws a heatmap.
# 25 random values for a 5 x 5 matrix
data <- matrix(rnorm(25, 0, 5), nrow = 5, ncol = 5)

# Column and row names
colnames(data) <- paste0("col", 1:5)
rownames(data) <- paste0("row", 1:5)

# Draw a heatmap
heatmap(data)
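• By default, heatmap() reorders rows and columns by clustering and scales each row. As a sketch, these defaults can be switched off so that the colors map directly to the raw matrix values in their original order (the seed and matrix name are chosen here for illustration):

```r
set.seed(1)   # fixed seed so the random matrix is reproducible
m <- matrix(rnorm(25, 0, 5), nrow = 5, ncol = 5,
            dimnames = list(paste0("row", 1:5), paste0("col", 1:5)))

# Rowv = NA and Colv = NA suppress the dendrograms and reordering;
# scale = "none" plots the raw values instead of row-scaled ones
heatmap(m, Rowv = NA, Colv = NA, scale = "none")
```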
14.
scan()
• Reading time series data involves two
functions:
• scan() and ts()
• The scan function reads data into a vector or
list from a file or the R console.
data <- data.frame(x1 = c(4, 4, 1, 9),
                   x2 = c(1, 8, 4, 0), x3 = c(5, 3, 5, 6))
write.table(data, file = "data.txt", row.names =
            FALSE)
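• As a sketch of reading such a file back with scan(), the example below writes the same data frame to a temporary file (used here instead of data.txt to avoid leaving files behind) and reads it first as a vector, then as a list:

```r
df <- data.frame(x1 = c(4, 4, 1, 9),
                 x2 = c(1, 8, 4, 0),
                 x3 = c(5, 3, 5, 6))

tmp <- tempfile(fileext = ".txt")
write.table(df, file = tmp, row.names = FALSE)

# skip = 1 skips the header line; values are read row by row
vals <- scan(tmp, skip = 1)

# 'what' as a list makes scan() return one list element per column
cols <- scan(tmp, skip = 1, what = list(x1 = 0, x2 = 0, x3 = 0))

vals
cols$x1
```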
15.
ts()
• This function stores time series
data by creating a time series object.
• Its parameters are
ts(data, start, end, frequency).
16.
Plotting time series data
• Often you may want to plot a time series in R
to visualize how its values
change over time.
• Suppose we have the following dataset in R: