Statistical Application Development with R and Python

Overview of this book

Statistical Analysis involves collecting and examining data to describe the nature of data that needs to be analyzed. It helps you explore the relation of data and build models to make better decisions. This book explores statistical concepts along with R and Python, which are well integrated from the word go. Almost every concept has an R code going with it which exemplifies the strength of R and applications. The R code and programs have been further strengthened with equivalent Python programs. Thus, you will first understand the data characteristics, descriptive statistics and the exploratory attitude, which will give you firm footing of data analysis. Statistical inference will complete the technical footing of statistical methods. Regression, linear, logistic modeling, and CART, builds the essential toolkit. This will help you complete complex problems in the real world. You will begin with a brief understanding of the nature of data and end with modern and advanced statistical models like CART. Every step is taken with DATA and R code, and further enhanced by Python. The data analysis journey begins with exploratory analysis, which is more than simple, descriptive, data summaries. You will then apply linear regression modeling, and end with logistic regression, CART, and spatial statistics. By the end of this book you will be able to apply your statistical learning in major domains at work or in your projects.

Continuous distributions

The numeric variables in the survey, Age, Mileage, and Odometer, can take any values over a continuous interval and these are examples of continuous RVs. In the previous section, we dealt with RVs that had discrete output. In this section, we will deal with RVs that have continuous output. A distinction from the previous section needs to be pointed out explicitly.

In the case of a discrete RV, there is a positive number for the probability of an RV taking on a certain value that is determined by the pmf. In the continuous case, an RV necessarily assumes any specific value with zero probability. These technical issues cannot be discussed in this book. In the discrete case, the probabilities of certain values are specified by the pmf, and in the continuous case the probabilities, over intervals, are decided by the probability density function, abbreviated as pdf.

Suppose that we have a continuous RV X with the pdf f(x) defined over the possible x values; that is, we assume that the pdf f(x) is well defined over the range of the RV X, denoted by $R_X$. The integral of f(x) over the range $R_X$ must equal 1; that is, $\int_{R_X} f(x)\,dx = 1$. The probability that the RV X takes a value in an interval [a, b] is defined by:

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

In general, we are interested in the cumulative probabilities of a continuous RV, that is, probabilities of the form P(X < x). In terms of the pdf, with f(x) taken to be zero outside the range of X, this is obtained as:

$$F(x) = P(X < x) = \int_{-\infty}^{x} f(u)\,du$$

A special name for this probability is the cumulative distribution function, abbreviated as cdf. The mean and variance of a continuous RV are then defined by:

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx, \qquad Var(X) = \int_{-\infty}^{\infty} (x - E(X))^2 f(x)\,dx$$
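The definitions above can be checked numerically. The following is a minimal Python sketch, not taken from the book: the helper `integrate` (a plain midpoint rule) and the density f(x) = 2x on [0, 1] are hypothetical choices made only for illustration.

```python
from typing import Callable


def integrate(g: Callable[[float], float], lo: float, hi: float, n: int = 10_000) -> float:
    """Approximate the integral of g over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))


def f(x: float) -> float:
    """Hypothetical pdf used only for illustration: f(x) = 2x on [0, 1]."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


total = integrate(f, 0.0, 1.0)                                # a pdf must integrate to 1
prob = integrate(f, 0.25, 0.5)                                # P(0.25 <= X <= 0.5)
mean = integrate(lambda x: x * f(x), 0.0, 1.0)                # E(X)
var = integrate(lambda x: (x - mean) ** 2 * f(x), 0.0, 1.0)   # Var(X)

print(total, prob, mean, var)
```

For this particular density the exact values are 1, 0.1875, 2/3, and 1/18, so the printed numbers should agree with them up to a small numerical error.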

As in the previous section, we will begin with the simplest case, the uniform distribution.

Uniform distribution

An RV is said to have a uniform distribution over the interval $[0, \theta]$, $\theta > 0$, if its probability density function is given by:

$$f(x; \theta) = \frac{1}{\theta}, \quad 0 \le x \le \theta$$

In fact, it is not necessary to restrict our focus to the positive real line. For any two real numbers a and b, with b > a, the pdf of the uniform RV can be defined by:

$$f(x; a, b) = \frac{1}{b - a}, \quad a \le x \le b$$

The uniform distribution has a very important role to play in simulation, as will be seen in Chapter 6, Simulation. As with the discrete counterpart, in the continuous case any two intervals of the same length within [a, b] have an equal probability of occurring. The mean and variance of a uniform RV over the interval [a, b] are respectively given by:

$$E(X) = \frac{a + b}{2}, \qquad Var(X) = \frac{(b - a)^2}{12}$$

Example 1.4.1. Horgan’s (2008), Example 15.3: The International Journal of Circuit Theory and Applications reported in 1990 that researchers at the University of California, Berkeley, had designed a switched capacitor circuit for generating random signals whose trajectory is uniformly distributed over the unit interval [0, 1]. Suppose that we are interested in calculating the probability that the trajectory falls in the interval [0.35, 0.58]. Though the answer is straightforward, we will obtain it using the punif function:

> punif(0.58)-punif(0.35)
[1] 0.23

Of course, we don’t need software for such simple integrals. Nevertheless:

$$P(0.35 \le X \le 0.58) = \int_{0.35}^{0.58} 1\,dx = 0.58 - 0.35 = 0.23$$
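The same calculation can be sketched in Python with the standard library only. The function names `uniform_cdf`, `uniform_mean`, and `uniform_var` are hypothetical helpers written for this illustration; `uniform_cdf` is meant to play the role that punif plays in R.

```python
def uniform_cdf(x: float, a: float = 0.0, b: float = 1.0) -> float:
    """Cumulative probability P(X <= x) for a uniform RV over [a, b]."""
    if b <= a:
        raise ValueError("the interval needs b > a")
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def uniform_mean(a: float, b: float) -> float:
    """Mean of a uniform RV over [a, b]."""
    return (a + b) / 2.0


def uniform_var(a: float, b: float) -> float:
    """Variance of a uniform RV over [a, b]."""
    return (b - a) ** 2 / 12.0


# Probability that the trajectory falls in [0.35, 0.58] on the unit interval
p = uniform_cdf(0.58) - uniform_cdf(0.35)
print(round(p, 2))
print(uniform_mean(0.0, 1.0), uniform_var(0.0, 1.0))
```

The difference of two cdf values is the integral of the density over the interval, which is why this mirrors the punif calculation.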

Exponential distribution

The exponential distribution is probably one of the most important probability distributions in statistics, and more so for computer scientists. The time between arrivals in a queuing system, the time between two incoming calls on a mobile, the lifetime of a laptop, and so on, are some of the important applications where this distribution has lasting utility. The pdf of an exponential RV with rate $\lambda$ is specified by:

$$f(x; \lambda) = \lambda e^{-\lambda x}, \quad x \ge 0, \ \lambda > 0$$

The rate parameter $\lambda$ of the exponential pdf is sometimes referred to as the failure rate. The exponential RV enjoys a special property called the memoryless property, which conveys that:

$$P(X \ge t + s \mid X \ge s) = P(X \ge t), \quad s, t \ge 0$$

The mathematical statement translates into the property that if X is an exponential RV, then its failure in the future depends on the present, and the past (age) of the RV does not matter. In simple words, this means that the probability of failure is constant in time and does not depend on the age of the system. Let us obtain the plots of a few exponential distributions:
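The memoryless property, P(X ≥ s + t | X ≥ s) = P(X ≥ t), can be verified numerically. This is a minimal sketch with the standard library; the helper `exp_survival` and the values of `rate`, `s`, and `t` are arbitrary choices for illustration, not values from the book.

```python
import math


def exp_survival(x: float, rate: float) -> float:
    """P(X >= x) for an exponential RV with the given rate."""
    return math.exp(-rate * x) if x > 0 else 1.0


rate, s, t = 0.5, 2.0, 3.0  # illustrative values only

# Conditional probability of surviving a further t units, given survival up to s
conditional = exp_survival(s + t, rate) / exp_survival(s, rate)
# Unconditional probability of surviving t units from a fresh start
unconditional = exp_survival(t, rate)

print(conditional, unconditional)
print(math.isclose(conditional, unconditional))
```

The two probabilities agree because the factor exp(-rate * s) cancels in the ratio, which is exactly why the age s of the system does not matter.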

> par(mfrow=c(1,2))
> # Left panel: exponential densities for rates up to 1
> curve(dexp(x,1),0,10,ylab="f(x)",xlab="x",cex.axis=1.25)
> curve(dexp(x,0.2),add=TRUE,col=2)
> curve(dexp(x,0.5),add=TRUE,col=3)
> curve(dexp(x,0.7),add=TRUE,col=4)
> curve(dexp(x,0.85),add=TRUE,col=5)
> legend(6,1,paste("Rate = ",c(1,0.2,0.5,0.7,0.85)),col=1:5,lty=1)
> # Right panel: exponential densities for larger rates
> curve(dexp(x,50),0,0.5,ylab="f(x)",xlab="x")
> curve(dexp(x,10),add=TRUE,col=2)
> curve(dexp(x,20),add=TRUE,col=3)
> curve(dexp(x,30),add=TRUE,col=4)
> curve(dexp(x,40),add=TRUE,col=5)
> legend(0.3,50,paste("Rate = ",c(50,10,20,30,40)),col=1:5,lty=1)
[Figure: The exponential densities]

The mean and variance of an exponential distribution with rate $\lambda$ are as follows:

$$E(X) = \frac{1}{\lambda}, \qquad Var(X) = \frac{1}{\lambda^2}$$

The complete Python code block is given next:

[Listing: Python code for the exponential densities]
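The book's own Python listing is not recoverable from this page. As a rough stand-in, the following minimal sketch evaluates the exponential density, mean, and variance for the rates plotted in the left panel of the R code, using the standard library only. The function names `exp_pdf`, `exp_mean`, and `exp_var` are hypothetical, and no plotting is attempted.

```python
import math


def exp_pdf(x: float, rate: float) -> float:
    """Density of an exponential RV with the given rate, zero for x < 0."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def exp_mean(rate: float) -> float:
    """Mean of an exponential RV: 1 / rate."""
    return 1.0 / rate


def exp_var(rate: float) -> float:
    """Variance of an exponential RV: 1 / rate^2."""
    return 1.0 / rate ** 2


# Rates used in the left panel of the R plot
rates = [1, 0.2, 0.5, 0.7, 0.85]
for rate in rates:
    # Density evaluated at a few points of the plotted range [0, 10]
    values = [round(exp_pdf(x, rate), 4) for x in (0, 1, 5, 10)]
    print(rate, values, exp_mean(rate), exp_var(rate))
```

A larger rate gives a density that starts higher at zero and decays faster, which matches the smaller mean 1 / rate.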

Normal distribution

The normal distribution is in some sense an all-pervasive distribution that arises sooner or later in almost any statistical discussion. In fact, it is very likely that the reader is already familiar with certain aspects of the normal distribution; for example, the normal density curve is bell-shaped. Its mathematical appeal is perhaps reflected in the fact that, although its density function has a simple expression, it includes three of the most famous irrational numbers: $\pi$, $e$, and $\sqrt{2}$.

Suppose that X is normally distributed with the mean $\mu$ and the variance $\sigma^2$. Then, the probability density function of the normal RV is given by:

$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}, \quad -\infty < x < \infty$$

If the mean is zero and the variance is 1, the normal RV is referred to as the standard normal RV, and the standard is to denote it by Z.
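The normal density with mean mu and standard deviation sigma can be written out directly and compared with `statistics.NormalDist` from the Python standard library (available from Python 3.8). This is a minimal sketch; `normal_pdf` is a hypothetical helper written for this illustration.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Normal density written out from its formula; sigma is the standard deviation."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Standard normal density at zero, computed two ways
by_formula = normal_pdf(0.0)
by_library = NormalDist(mu=0.0, sigma=1.0).pdf(0.0)
print(by_formula, by_library)

# A non-standard example: mean 10, variance 4 (so sigma = 2)
print(normal_pdf(12.0, mu=10.0, sigma=2.0), NormalDist(10.0, 2.0).pdf(12.0))
```

Note that both the helper and `NormalDist` take the standard deviation, not the variance, as their second parameter.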

Example 1.4.2. Shady Normal Curves: We will again consider a standard normal random variable, which is more popularly denoted in statistics by Z. Some of the most needed probabilities are P(Z > 0), P(-1.96 < Z < 1.96), and P(-2.58 < Z < 2.58). These probabilities are now shaded:

> par(mfrow=c(3,1))
> # Probability Z Greater than 0
> curve(dnorm(x,0,1),-4,4,xlab="z",ylab="f(z)")
> z=seq(0,4,0.02)
> lines(z,dnorm(z),type="h",col="grey")
> # 95% Coverage
> curve(dnorm(x,0,1),-4,4,xlab="z",ylab="f(z)")
> z=seq(-1.96,1.96,0.001)
> lines(z,dnorm(z),type="h",col="grey")
> # 99% Coverage
> curve(dnorm(x,0,1),-4,4,xlab="z",ylab="f(z)")
> z=seq(-2.58,2.58,0.001)
> lines(z,dnorm(z),type="h",col="grey")
[Figure: Shady normal curves]

The Python program for the shady normal probabilities is given next:

[Listing: Python program for the shady normal probabilities]
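The book's own Python program is not recoverable from this page. As a rough stand-in, the following minimal sketch computes the three shaded probabilities with `statistics.NormalDist` from the standard library rather than drawing them; the variable names are hypothetical and no plotting is attempted.

```python
from statistics import NormalDist

# Standard normal RV Z: mean 0, standard deviation 1
Z = NormalDist(mu=0.0, sigma=1.0)

# Area shaded in the first panel: P(Z > 0)
p_greater_zero = 1.0 - Z.cdf(0.0)
# Area shaded in the second panel: P(-1.96 < Z < 1.96), roughly 95% coverage
p_95 = Z.cdf(1.96) - Z.cdf(-1.96)
# Area shaded in the third panel: P(-2.58 < Z < 2.58), roughly 99% coverage
p_99 = Z.cdf(2.58) - Z.cdf(-2.58)

print(round(p_greater_zero, 4), round(p_95, 4), round(p_99, 4))
```

Each probability is a difference of cdf values, which is the area under the density curve over the shaded interval.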
