Data Wrangling with R

By : Gustavo R Santos, Gustavo Santos

Overview of this book

In this information era, where large volumes of data are generated every day, companies want to get a better grip on that data to perform more efficiently than before. This is where skillful data analysts and data scientists come into play, wrangling and exploring data to generate valuable business insights. To do that, you’ll need tools that enable you to extract the most useful knowledge from data. Data Wrangling with R will help you gain a deep understanding of ways to wrangle and prepare datasets for exploration, analysis, and modeling. This book enables you to get your data ready for more optimized analyses, develop your first data model, and perform effective data visualization. The book begins by teaching you how to load and explore datasets. Then, you’ll get to grips with the modern concepts and tools of data wrangling. As data wrangling and visualization are intrinsically connected, you’ll go over best practices to plot data and extract insights from it. The chapters are designed to help you learn about modeling: you will go through the construction of a data science project from end to end and become familiar with RStudio, including an application built with Shiny dashboards. By the end of this book, you’ll have learned how to create your first data model and build an application with Shiny in R.
Table of Contents (21 chapters)

Part 1: Load and Explore Data
Part 2: Data Wrangling
Part 3: Data Visualization
Part 4: Modeling

To get the most out of this book

To get the most out of the content presented in this book, you are expected to have a basic knowledge of programming (creating variables, loops, and functions) and to have already worked with R. A basic knowledge of data science concepts is also welcome and can help you understand the tutorials and projects.

All the software and code are created using RStudio for Windows 10, and if you want to code along with the examples, you will need to install R and RStudio on your local machine. To do that, you should go to https://cran.r-project.org/, click on Download R for Windows (or for your operating system), then click on base, and finally, click on Download R-X.X.X for Windows. This will download the R language executable file to your machine. Then, you can double-click on the file to install, accepting the default selections.

Next, you need to install RStudio, the IDE developed by Posit (the company was called RStudio until it was renamed in 2022). The software can be downloaded from https://posit.co/download/rstudio-desktop/. Click on Download and look for the version for your operating system. RStudio has a free version, which you can install by accepting the default options once again.

The main libraries used in the tutorials from this book are indicated as follows:

Software/Library    Version
R                   4.1.0
RStudio             2022.02.3+492 for Windows
tidyverse           1.3.1
tidytext            0.3.2
gutenbergr          0.2.1
patchwork           1.1.1
wordcloud2          0.2.1
ROCR                1.0-11
shinythemes         1.2.0
plotly              4.10.0
caret               6.0-90
shiny               1.7.1
skimr               2.1.4
lubridate           1.8.0
randomForest        4.7-1
data.table          1.14.2
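To compare your own setup with the versions listed above, a minimal sketch along the following lines may help. The helper name `has_version` is hypothetical and is not part of the book's code; `getRversion()`, `requireNamespace()`, and `packageVersion()` are base R functions. The `stats` package is used here only because it ships with every R installation.

```r
# Version of R running in the current session
r_version <- getRversion()
print(r_version)

# Hypothetical helper: TRUE when `pkg` is installed at version `minimum` or newer
has_version <- function(pkg, minimum) {
    requireNamespace(pkg, quietly = TRUE) && packageVersion(pkg) >= minimum
}

# Example call with a package that ships with R
print(has_version("stats", "1.0.0"))
```

The same call with a package name and version from the table, for example `has_version("lubridate", "1.8.0")`, reports whether that package is present at that version or a newer one.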

To install any library in RStudio, just use the following code snippet:

# Installing a library (the package name must be quoted with straight quotes)
install.packages("package_name")
# Loading a library into the current session
library(package_name)
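When several libraries are needed, it can be convenient to install only those that are not yet present. The following is a sketch under that assumption, not the book's own code; `missing_packages` is a hypothetical helper name, and the base packages `stats` and `utils` stand in for the real list so that the example runs without downloading anything.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed
missing_packages <- function(pkgs) {
    installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    pkgs[!installed]
}

# Base packages ship with R, so nothing should be reported as missing here
to_install <- missing_packages(c("stats", "utils"))

# Install only what is missing (skipped when the vector is empty)
if (length(to_install) > 0) {
    install.packages(to_install)
}
```

Replacing the vector with the names of the libraries used in the book would install just the ones your machine lacks.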

In R, it is useful to keep two code snippets in mind. The first is the for loop, which executes a piece of code once for each element of a sequence and stops when the sequence is exhausted:

for (num in 1:5) {
    print(num)
}
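As a small extension of the loop above, the sketch below iterates over the positions of a vector to accumulate a total, and contrasts it with a while loop, which repeats until its condition stops being true. The variable names are illustrative only.

```r
# for loop: one iteration per element of the vector
values <- c(10, 20, 30)
total <- 0
for (i in seq_along(values)) {
    total <- total + values[i]
}
print(total)

# while loop: repeats until the condition is no longer met
count <- 0
while (count < 3) {
    count <- count + 1
}
print(count)
```

Using `seq_along(values)` rather than `1:length(values)` keeps the loop from running at all when the vector is empty.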

The other is the skeleton of a function in the R language: we declare the arguments, write the code that operates on them, and return the result of the calculation:

custom_sum_function <- function(var1, var2) {
    # Add the two arguments and return the total
    my_sum <- sum(var1 + var2)
    return(my_sum)
}
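To see the function skeleton in use, the sketch below repeats the definition so that it runs on its own and then calls it with single numbers and with vectors. The input values are chosen only for illustration.

```r
# Same skeleton as above, repeated so the example is self-contained
custom_sum_function <- function(var1, var2) {
    my_sum <- sum(var1 + var2)
    return(my_sum)
}

# Calling the function with two single numbers
result_scalar <- custom_sum_function(2, 3)
print(result_scalar)

# Calling it with two vectors: elements are added pairwise, then totaled
result_vector <- custom_sum_function(c(1, 2), c(3, 4))
print(result_vector)
```

Because `var1 + var2` is computed element by element, passing vectors of equal length gives the total of all pairwise sums.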

If you are using a digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository. Doing so helps you avoid errors introduced when copying and pasting code.
