Chapter 1. Understanding R's Performance – Why Are R Programs Sometimes Slow?
R is a great tool for statistical analysis and data processing. When it was first developed in 1993, it was designed as a tool for teaching data analysis courses. Because it is so easy to use, it became more and more popular over the next 20 years, not only in academia but also in government and industry. R is also open source, so anyone can use it for free and contribute new statistical packages to R's public package repository, the Comprehensive R Archive Network (CRAN). As CRAN grew richer, with more than 6,000 well-documented, ready-to-use packages at the time of writing this book, R became even more attractive.
Over these 20 years, the volume of data being created, transmitted, stored, and analyzed by organizations and individuals alike has also grown exponentially. R programmers who need to process and analyze this ever-growing volume of data sometimes find that R's performance suffers under such heavy loads. Why does R sometimes not perform well, and how can we overcome its performance limitations? This book examines the factors behind R's performance and offers a variety of techniques to improve the performance of R programs, such as optimizing memory usage, performing computations in parallel, and even tapping the computing power of external data processing systems.
Before we can find solutions to R's performance problems, we need to understand what makes R perform poorly in certain situations. This chapter kicks off our exploration of high-performance R programming by taking a peek under the hood to understand how R is designed, and how its design can limit the performance of R programs.
We will first examine the three main constraints that any computational task faces: CPU, RAM, and disk input/output (I/O). We will then look at how these constraints play out specifically in R programs. By the end of this chapter, you will have some insight into the bottlenecks that your R programs could run into.
This chapter covers the following topics:
- Three constraints on computing performance: CPU, RAM, and disk I/O
- R is interpreted on the fly
- R is single-threaded
- R requires all data to be loaded into memory
- Algorithm design affects time and space complexity