R High Performance Programming

By: Aloysius Shao Qin Lim

Overview of this book

This book is for programmers and developers who want to make their R programs run faster with large data sets, or who need to solve a pesky performance problem.

Using memory-mapped files and processing data in chunks

Some datasets are so large that, even after applying all the memory optimization techniques and using the most compact data types possible, they still cannot fit in memory or be processed there. Short of getting additional RAM, one way to work with such large data is to store it on disk as memory-mapped files and load it into memory for processing one small chunk at a time.
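A minimal sketch of this approach in R uses the bigmemory package (one of several CRAN packages, alongside ff, that provide memory-mapped, file-backed data structures). The file names and dimensions below are illustrative assumptions, not values from the text:

    # install.packages("bigmemory")  # install once from CRAN
    library(bigmemory)

    # Create a file-backed big.matrix: the data lives in data.bin on
    # disk and is memory-mapped, so only the pages actually accessed
    # occupy RAM. Dimensions here are made up for illustration.
    x <- filebacked.big.matrix(nrow = 1e8, ncol = 10, type = "double",
                               backingfile = "data.bin",
                               descriptorfile = "data.desc")

    # In a later session, reattach the same on-disk data instantly,
    # without reading it all into memory:
    x <- attach.big.matrix("data.desc")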

For example, say we have a dataset that would require 100 GB of RAM if it were fully loaded into memory, plus another 100 GB of free memory for the computations that need to be performed on the data. If the computer on which the data is to be processed has only 64 GB of RAM, we might divide the data into four chunks of 25 GB each. The R program then loads the data into memory one chunk at a time and performs the necessary computations on each chunk. After all the chunks have been processed, the results from each chunk-wise computation are combined to produce the final result.
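As a sketch of this chunk-wise pattern (assuming the file-backed matrix x from the previous example, and a column-sum computation chosen purely for illustration), the loop below pulls one quarter of the rows into memory at a time and accumulates partial results:

    # Process the file-backed matrix in four row-wise chunks.
    n <- nrow(x)
    chunk.size <- ceiling(n / 4)
    total <- numeric(ncol(x))
    for (start in seq(1, n, by = chunk.size)) {
        end <- min(start + chunk.size - 1, n)
        # Subsetting a big.matrix with [ ] copies just these rows
        # into RAM as an ordinary R matrix.
        chunk <- x[start:end, ]
        total <- total + colSums(chunk)
    }
    total  # combined result from all the chunk-wise computations

Because only one chunk is ever held in memory at a time, the peak memory footprint is governed by the chunk size rather than by the full size of the dataset.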