10. R and Big Data | R High Performance Programming

R High Performance Programming

By: Aloysius Shao Qin Lim, Tjhi W Chandra

4.4 (14)

Overview of this book

This book is for programmers and developers who want their R programs to run faster on large data sets, or who are trying to solve a pesky performance problem.

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

1. Understanding R's Performance – Why Are R Programs Sometimes Slow?

Three constraints on computing performance – CPU, RAM, and disk I/O

R is interpreted on the fly

R is single-threaded

R requires all data to be loaded into memory

Algorithm design affects time and space complexity

Summary

2. Profiling – Measuring Code's Performance

Measuring total execution time

Profiling the execution time

Profiling memory utilization

Monitoring memory utilization, CPU utilization, and disk I/O using OS tools

Identifying and resolving bottlenecks

Summary

3. Simple Tweaks to Make R Run Faster

Vectorization

Use of built-in functions

Preallocating memory

Use of simpler data structures

Use of hash tables for frequent lookups on large data

Seeking fast alternative packages in CRAN

Summary

4. Using Compiled Code for Greater Speed

Compiling R code before execution

Using compiled languages in R

Summary

5. Using GPUs to Run R Even Faster

General purpose computing on GPUs

R and GPUs

Fast statistical modeling in R with gputools

Summary

6. Simple Tweaks to Use Less RAM

Reusing objects without taking up more memory

Removing intermediate data when it is no longer needed

Calculating values on the fly instead of storing them persistently

Swapping active and nonactive data

Summary

7. Processing Large Datasets with Limited RAM

Using memory-efficient data structures

Using memory-mapped files and processing data in chunks

Summary

8. Multiplying Performance with Parallel Computing

Data parallelism versus task parallelism

Implementing data parallel algorithms

Implementing task parallel algorithms

Executing tasks in parallel on a cluster of computers

Shared memory versus distributed memory parallelism

Optimizing parallel performance

Summary

9. Offloading Data Processing to Database Systems

Extracting data into R versus processing data in a database

Preprocessing data in a relational database using SQL

Converting R expressions to SQL

Running statistical and machine learning algorithms in a database

Using columnar databases for improved performance

Using array databases for maximum scientific-computing performance

Summary

10. R and Big Data

Understanding Hadoop

Setting up Hadoop on Amazon Web Services

Processing large datasets in batches using Hadoop

Summary

Index

Customer Reviews

4.4 (14)

5 star: 57.1%
4 star: 28.6%
3 star: 7.1%
2 star: 7.1%
1 star
