AGENDA
• History and evolution of R
• Principle and software paradigm
• Description of R interface
• Advantages of R
• Drawbacks of R
• So why use R?
• References for learning R
HISTORY AND EVOLUTION OF R
R has developed from the S language
HISTORY AND EVOLUTION OF R
S Version 1
S Version 2
S Version 3
S Version 4
Developed 30 years ago for research applied to the high-tech industry
1990s: R developed concurrently with S
1993: R made public
The regular development of R
HISTORY AND EVOLUTION OF R
Acceleration of R development
R-help and R-devel mailing lists
Creation of the R Core Group
Source: R Journal Vol 1/2
Growing number of packages
HISTORY AND EVOLUTION OF R
2001: ~100 packages
2009: Over 2000 packages
Source: R Journal Vol 1/2
2000: R version 1.0.1
Today: R version 2.14
Explosion of R popularity in the last decade
HISTORY AND EVOLUTION OF R
• Object-oriented, growing user base, scripting features
• Free and open-source
• Irrational reasons: R seen as « cool »
Comparison of Mailing Lists
HISTORY AND EVOLUTION OF R
Popularity amongst programming languages
HISTORY AND EVOLUTION OF R
Number of Blogs
HISTORY AND EVOLUTION OF R
Data as of March 2012
Software    Number of blogs
R           365
SAS         40
Stata       8
Others      0-3
AGENDA
• History and evolution of R
• Principle and software paradigm
• Description of R interface
• Advantages of R
• Drawbacks of R
• So why use R?
• References for learning R
• R is rather a programming language
• Limited user-friendly interfaces for data analysis
• Object-oriented and almost non-declarative
• Similar to programming languages like Fortran, C, Java, Python
R is not really a (statistical) software package
PRINCIPLE AND SOFTWARE PARADIGM
Recent endeavours to enhance R user-friendliness
Several GUIs in development
R Commander
RKWard
Rattle
R has limited Graphical User Interface (GUI) options
PRINCIPLE AND SOFTWARE PARADIGM
R Commander (Rcmdr)
PRINCIPLE AND SOFTWARE PARADIGM
RKWard
PRINCIPLE AND SOFTWARE PARADIGM
Rattle
PRINCIPLE AND SOFTWARE PARADIGM
Inherent limitations of pervasive Excel-like spreadsheets
PRINCIPLE AND SOFTWARE PARADIGM
VS.
Sophisticated but costly SAS
PRINCIPLE AND SOFTWARE PARADIGM
VS.
AGENDA
• History and evolution of R
• Principle and software paradigm
• Description of R interface
• Advantages of R
• Drawbacks of R
• So why use R?
• References for learning R
R console
DESCRIPTION OF R INTERFACE
R desktop shortcut
RGui: R basic interface
R command line (space to write instructions)
Using the command line in R console
DESCRIPTION OF R INTERFACE
First, an incorrect sentence followed by R's error message
Second, a correct sentence
Declaration and printing of the sentence as an R object
Simple math computations
Basic information about the R object containing the sentence
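The console steps listed on this slide can be sketched as follows. This is only an illustration: the slide's screenshot is not reproduced here, so the sentence "Hello world" and the object name `sentence` are assumptions, not the presenter's actual input.

```r
# Typing a bare sentence at the prompt is a syntax error, because R tries
# to read the words as object names. The line is left commented out so
# that the script still runs from top to bottom.
# Hello world           # R reports an "unexpected symbol" error

# The same sentence in quotes is a valid character string and is echoed back
"Hello world"

# Declare the sentence as an R object (hypothetical name), then print it
sentence <- "Hello world"
print(sentence)

# Simple math computations
2 + 3 * 4
sqrt(16)

# Basic information about the object containing the sentence
class(sentence)   # the object's class
nchar(sentence)   # number of characters in the string
str(sentence)     # compact summary of the object's structure
```

The same lines can be typed one by one at the RGui prompt or saved in a script file and run as a whole.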
RGui menu: File tab
DESCRIPTION OF R INTERFACE
File tab: usual basic and general operations
RGui menu: Edit tab
DESCRIPTION OF R INTERFACE
Edit tab: basic and general editing
Results of the data editor
Data editor: entering the object's name
RGui menu: View tab
DESCRIPTION OF R INTERFACE
View tab: viewing Toolbar and/or Status bar
RGui menu: Misc tab
DESCRIPTION OF R INTERFACE
Misc tab: diverse operations
RGui menu: Packages tab
DESCRIPTION OF R INTERFACE
Packages tab: adding functions to base R
RGui menu: Windows tab
DESCRIPTION OF R INTERFACE
Windows tab: usual options to arrange the windows
RGui menu: Help tab
DESCRIPTION OF R INTERFACE
Help tab: very important links to help
AGENDA
• History and evolution of R
• Principle and software paradigm
• Description of R interface
• Advantages of R
• Drawbacks of R
• So why use R?
• References for learning R
• Open source code
• You can access the code of the software
• In-depth understanding of what R does
• Modify the code
ADVANTAGES OF R
Address of the « mgcv » package
Link to the package sources (.tar.gz file)
Screenshot of the CRAN webpage of the « mgcv » package. Source: CRAN
R access to source code
ADVANTAGES OF R
Screenshot of unzipping the « mgcv » package and browsing through the package's files.
Unzipping the mgcv_1.7-13.tar.gz file (with 7-Zip)
List of directories in the « mgcv » package
List of functions (i.e. open code) in the « src » (i.e. source code) directory of the « mgcv » package
R is free
ADVANTAGES OF R
Software           Academics      Demo           Commercial (basic)  Commercial (full)
R                  Free           Free           Free                Free
SAS                Free to $100s  Not available  $1 000s             $10 000s
Statistica         $100s          30 days limit  ~$1 000             $10 000
Excel (Microsoft)  Free to $10s   Limited        ~$100               $100s
SPSS (IBM)         $100s          14 days limit  ~$2 000             $1 000s
Interface with other languages and scripting capabilities
ADVANTAGES OF R
Interfaces with virtually any other programming language
Fortran, C, C++, Python…
Tailor or rewrite your old code in R
R as a scripting language
R scripts can launch or be launched by other languages
« mgcv.c » file in the « mgcv » package, coded in typical C programming language
Screenshot of the file « mgcv.c » of the « mgcv » package open in WordPad
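As a small illustration of the scripting point on this slide, the sketch below has one R session launch a second R process through the operating system, which is the same route any other language would take to start an R script. It assumes a standard installation in which the `Rscript` executable sits in the directory returned by `R.home("bin")`. The expression `6 * 7` is just a placeholder task.

```r
# Locate the Rscript executable that ships with the running R installation
rscript <- file.path(R.home("bin"), "Rscript")

# Start a child R process, ask it to evaluate one expression,
# and capture whatever it writes to standard output
out <- system2(rscript, c("-e", shQuote("cat(6 * 7)")), stdout = TRUE)

# The captured output comes back as a character vector
print(out)
```

In the other direction, `system2()` can start any external program (a Python or shell script, for instance) from within R in the same way.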
R visualization capabilities
ADVANTAGES OF R
• R: a tool used by the finest researchers
• Top-notch analytics capabilities
R's role in academia
ADVANTAGES OF R
Free open source philosophy
To summarize
ADVANTAGES OF R
• R websites with many examples
• Free books
• Free online open courses
• Twitter accounts
Online help and discussion
• Mailing lists
• Very active and diverse forums
• Communities of developers and helpers
AGENDA
• History and evolution of R
• Principle and software paradigm
• Description of R interface
• Advantages of R
• Drawbacks of R
• So why use R?
• References for learning R
Poor management of large datasets
Avoid nested loops
Prefer R's advanced language features for data structures (e.g. vectorized operations)
Complicated structure of packages in R
• Dozens of packages
• To be loaded into memory every time
R packages to better manage memory
• RHadoop (inspired by Google)
• ff
• bigmemory
Average memory performance
DRAWBACKS OF R
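The advice on this slide about avoiding nested ("imbricated") loops can be illustrated with a short sketch. It computes the row sums of a matrix twice: once with explicit nested loops, once with the built-in vectorized function `rowSums()`. The matrix size and the variable names are arbitrary choices for the example.

```r
set.seed(1)                       # reproducible random numbers
n <- 200
m <- matrix(runif(n * n), nrow = n)

# Nested loops: every element is visited by interpreted R code
row_sums_loop <- numeric(n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    row_sums_loop[i] <- row_sums_loop[i] + m[i, j]
  }
}

# Vectorized alternative: one call, with the looping done in compiled code
row_sums_vec <- rowSums(m)

# Both approaches agree up to floating-point tolerance
all.equal(row_sums_loop, row_sums_vec)
```

Wrapping each variant in `system.time()` is a simple way to compare their speed on a given machine.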
No default parallel execution
R packages to use several cores
Top skills needed for high performance computing
A high-level programming language
Abstract and modern (Python…)
More productive coding
But further from « machine language »…
… meaning it can be up to about 100 times slower than C
Average computing performance
DRAWBACKS OF R
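As one possible illustration of "R packages to use several cores", the sketch below uses the `parallel` package, which is distributed with R from version 2.14.0 onwards. The slide does not name a package, so this choice is an assumption. The worker count of 2 and the function `square` are likewise arbitrary.

```r
library(parallel)

# A stand-in for an expensive computation (hypothetical example function)
square <- function(x) x^2

# Start two worker processes on the local machine
cl <- makeCluster(2)

# Split the inputs across the workers and collect the results as a list
results <- parLapply(cl, 1:8, square)

# Always shut the workers down when finished
stopCluster(cl)

unlist(results)
```

For such a cheap function the cost of starting workers outweighs any gain. The pattern pays off only when each call does substantial work.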
Difficult to inspect data sets
Difficult data visualization and management
DRAWBACKS OF R
Problems for large organizations
R is made of several thousand independent packages
No deployment plan for complex organizations
No installation support
Lack of code accountability
Thousands of individual independent R developers
Nobody responsible for the quality of the code
Potentially high hidden costs with R
• Total cost may favour commercial solutions for complex computations made in large corporations
Difficult architecture management
DRAWBACKS OF R
Steep learning curve
R code far from undergrad computer science courses
Very complex data structures (useful if mastered)
Is R's syntax illogical?
Relatively difficult to learn
DRAWBACKS OF R
Still, not more difficult to learn than SAS
Both SAS and R are more abstract than basic programming languages (Fortran, C…)
Difficult to learn = more rewarding professionally!
AGENDA
• History and evolution of R
• Principle and software paradigm
• Description of R interface
• Advantages of R
• Drawbacks of R
• So why use R?
• References for learning R
No language is perfect!!
Contradictory objectives to meet
Strengths and weaknesses of each language
Effect of legacy and the culture of the organization
Use existing solutions (system architecture, BA tools…)
Habits in business analytics
Different needs imply different tools
Large corporations + defined procedures → SAS-like
Fewer financial resources + quick proof of concept → R
More positive than negative points
SO WHY LEARN R?
Very appealing solution
SO WHY LEARN R?
Popularity of business analytics software (green = very popular, red = unpopular).
[Figure. User groups: Overall, Corporate, Consultants, Academics, NGO/Gov't. Software: R, SAS, IBM SPSS, STATISTICA, Own code]
AGENDA
• History and evolution of R
• Principle and software paradigm
• Description of R interface
• Advantages of R
• Drawbacks of R
• So why use R?
• References for learning R
Many books available: choose the one that fits you!
Style, pedagogy, theory vs practice
Browse several books at local library or store
Springer's Use R! series (http://www.springer.com/series/6991)
Recent, concise, good quality, affordable, diverse
Pure rookies: « A Beginner's Guide to R », « R by Example »
One step forward: « Business Analytics for Managers »
Intensive Excel users: « R Through Excel »
O'Reilly R series (for programmers)
« R Cookbook », « R in a Nutshell »
Books
REFERENCES FOR LEARNING R
Websites
REFERENCES FOR LEARNING R
R official websites
The R Project for Statistical Computing (www.r-project.org)
Mailing lists (« R-help », Special Interest Groups) and The R Journal
Official (austere) manuals (« An Introduction to R »)
Other websites
UCLA online R resources (http://www.ats.ucla.edu/stat/r/)
R blogs aggregator (www.r-bloggers.com)
Social networks: LinkedIn groups (The R Project for Statistical Computing), Twitter accounts (@RevolutionR, @inside_R), job boards
Growing number of conferences about R
Official international R conference: useR!
Held annually over a few days, in a new venue each year (Google it!)
Lots of materials about many topics
Other conferences or venues
• Find (or even start!) an R user group close to your location (R Wiki geographical list, map of groups on « meetup.com »)
• Events and news from the R-bloggers blog
Conferences
REFERENCES FOR LEARNING R

R programming
