What is R?

R is a programming language and environment for statistical computing and graphics. It is a GNU project that is similar to the S language and environment, which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues.

R can be thought of as another implementation of S. There are some key differences, but much of the code written for S runs unchanged under R.

R offers a wide variety of statistical techniques (linear and non-linear modeling, classical statistical tests, time series analysis, classification, clustering, ...) and graphical techniques, and it is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an open source route to participating in that activity.
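As a rough illustration of two of the techniques named above, the following sketch fits a linear model and runs a classical test. It assumes nothing beyond a standard R installation; `cars` and `sleep` are example data sets that ship with R, and the variable names are chosen only for this illustration.

```r
# Fit a simple linear regression on the built-in 'cars' data set:
# stopping distance (dist) as a function of speed.
fit <- lm(dist ~ speed, data = cars)

# Estimated intercept and slope of the fitted line
coefs <- coef(fit)
print(coefs)

# A classical statistical test: Welch two-sample t-test on the 'sleep' data
tt <- t.test(extra ~ group, data = sleep)
print(tt$p.value)
```

Calling `summary(fit)` would print the usual regression table with standard errors and R-squared.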

One of the strengths of R is the ease with which it can produce well-designed, publication-quality plots, including mathematical symbols and formulas where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user remains in full control.
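A minimal sketch of a plot with mathematical annotation might look like this. It uses only base graphics and writes to a temporary PDF file so that it also runs without a display; the file name and the choice of the normal density as an example are assumptions made for this illustration.

```r
# Write the plot to a temporary PDF file so the sketch runs without a display
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 6, height = 4)

# Plot the standard normal density with formula-style labels (plotmath)
x <- seq(-4, 4, length.out = 200)
plot(x, dnorm(x), type = "l",
     xlab = expression(x),
     ylab = expression(phi(x)),
     main = expression(paste("Standard normal density: ",
                             phi(x) == frac(1, sqrt(2 * pi)) * e^{-x^2 / 2})))

# Close the device so the file is written completely
dev.off()
```

In an interactive session, the `pdf()` and `dev.off()` calls can be left out to draw on screen instead.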

What do I need it for?

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes:

  • an effective data handling and storage facility
  • a suite of operators for calculations on arrays, especially matrices
  • a large, coherent, integrated collection of tools for data analysis
  • graphical facilities for data analysis and display, either on screen or on paper
  • a well-developed, simple and effective programming language that includes conditionals, loops, user-defined recursive functions, and input and output facilities
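The language features in this list can be sketched in a few lines of base R. The matrix values and the function name `factorial_rec` are made up for this illustration.

```r
# Matrix operators: solve a linear system and form a matrix product
A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(1, 2)
x <- solve(A, b)      # solves A %*% x = b
check <- A %*% x      # matrix product, should reproduce b

# A user-defined recursive function with a conditional
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_rec(n - 1)
}

# A loop that collects results in a vector
squares <- numeric(5)
for (i in 1:5) {
  squares[i] <- i^2
}

print(x)
print(factorial_rec(5))
print(squares)
```

In everyday R code the loop would usually be written in vectorized form as `(1:5)^2`; it is spelled out here only to show the loop syntax.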

The R environment

Many users consider R to be a statistical system. We prefer to think of it as an environment in which statistical techniques are implemented.

The term “environment” is intended to characterize it as a fully planned and coherent system, rather than an incremental accumulation of very specific and inflexible tools, as is often the case with other data analysis software.

R, like S, is designed around a true computer language and allows users to add functionality by defining new functions. Much of the system itself is written in the R dialect of S, which makes it easy for users to follow the algorithmic choices made. For computationally intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.

R has its own LaTeX-like documentation format that is used to provide comprehensive documentation both online in various formats and on paper.

Communicate with R

R offers several ways to present and share your work, for example as a Markdown document or as an app. Everything can be hosted on RPubs, on GitHub, or on the company's own website.

RStudio accepts Markdown for writing documents. You can export these documents in different formats:

Document:

  • HTML
  • PDF / LaTeX
  • Word

Presentation:

  • HTML
  • PDF (Beamer)

Who is using it?

When we break down the use of R by sector, academia comes first, since R is a language for doing statistics. In industry, R is most widely used in healthcare, followed by government and consulting.

Why use R?

Data science is shaping the way companies do business. Staying away from artificial intelligence and machine learning puts a company at serious risk of failure. The big question is which tool or language you should use.

There are a variety of tools on the market for performing data analysis. Learning a new language takes a certain amount of time, but if you want the best insight into the data, you will need to spend some time learning the appropriate tool, R.

What do I need for that?

R is available as free software, in source code form, under the terms of the Free Software Foundation's GNU General Public License. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows, and macOS.

The minimum requirements

You should have at least 8 GB of RAM, because R keeps most of its data in memory and even comparatively small data sets can easily exceed 2 GB.

Since R is single-threaded by default, you should have a processor with good single-core performance. Note that you can still run parallel processes in R with the snow, parallel, and other packages.
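A minimal sketch of parallel execution with the parallel package could look like this. The function `slow_square` is a hypothetical stand-in for an expensive computation, and the choice of two worker processes is arbitrary.

```r
library(parallel)  # ships with the standard R distribution

# Hypothetical stand-in for an expensive computation
slow_square <- function(v) {
  Sys.sleep(0.1)
  v^2
}

# Start two worker processes; the number 2 is an arbitrary choice here
cl <- makeCluster(2)

# Distribute the eight inputs across the workers and collect the results
result <- parSapply(cl, 1:8, slow_square)

# Always shut the workers down again when done
stopCluster(cl)

print(result)
```

`detectCores()` from the same package reports how many cores the machine has, which can help when choosing the number of workers.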

GPUs are not required for working with R, but in data science they can become necessary when you process very large data sets.

Therefore the minimum requirements are:

  • 8 GB RAM
  • 128 GB SSD
  • i5 processor

Important packages and libraries

A core set of packages is included with the installation of R, with more than 15,000 additional packages as of September 2018 available on the Comprehensive R Archive Network (CRAN), Bioconductor, Omegahat, GitHub, and other repositories.

R can (easily) be extended via packages. There are about eight packages that come with the R distribution and many more are available through the CRAN family of websites that cover a very wide range of modern statistics.
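The following sketch shows how the packages that ship with R can be listed and attached. The package name `ggplot2` in the comment is only an example of a CRAN package and is not taken from this article; installing it needs network access, so that call is left commented out.

```r
# Packages with priority "base" ship with every R installation
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# Attach a package that is already installed
library(stats)

# Installing an additional package from CRAN needs network access,
# so it is only shown as a comment here ("ggplot2" is just an example name):
# install.packages("ggplot2")
```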

Extension packages for R

R's capabilities are extended by user-created packages that provide specialized statistical techniques, graphical devices, import / export capabilities, reporting tools (R Markdown, knitr, Sweave), and more. These packages are developed mainly in R, and sometimes in Java, C, C++, and Fortran. The packaging system is also used by researchers to create compendia that organize research data, code, and report files systematically for sharing and public archiving.

Task Views

The Task Views page on the CRAN website lists a wide range of tasks (in areas such as finance, genetics, high performance computing, machine learning, medical imaging, social sciences, and spatial statistics) to which R has been applied and for which packages are available. R has also been identified by the FDA as suitable for interpreting data from clinical research.

Crantastic

Other R package resources include Crantastic, a community site for rating and reviewing all CRAN packages, and R-Forge, a central platform for the collaborative development of R packages, R-related software and projects. R-Forge also hosts many unpublished beta packages and development versions of CRAN packages. Microsoft maintains a daily snapshot of CRAN that dates back to September 17, 2014.

Bioconductor

The Bioconductor project provides R packages for the analysis of genomic data. These include object-oriented data handling and analysis tools for data from Affymetrix, cDNA microarrays and next-generation high-throughput sequencing methods.

Disadvantages compared to Python

Python makes replicability and accessibility easier. In fact, if you need to use the results of your analysis in an application or website, Python is your best bet.

And so the reality is that both languages are valuable and both are here to stay. Our experience confirms this. Many data science teams are bilingual today and use both R and Python in their work.

You now have an overview of R and its capabilities. Do you have any further questions or do you need help? Then book your R advisor now.

Swen Deobald

My name is Swen Deobald and I am an enthusiastic SAP Analytics consultant. As Compamind's head of department, my team and I support you in all matters relating to SAP Analytics, Business Warehouse, BusinessObjects and the SAP Analytics Cloud.
