Quantile() function in R - A brief guide | DigitalOcean (2024)

You can generate the sample quantiles using the quantile() function in R.

Hello people, today we will be looking at how to find the quantiles of the values using the quantile() function.

Quantile: In layman’s terms, quantiles are the cut points that divide a sorted sample into groups of equal size. Quantiles are also called fractiles. Among them, the 25th percentile is called the lower quartile, the 50th percentile is called the median, and the 75th percentile is called the upper quartile.
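To make the definition concrete, here is a small sketch. The vector x and the names cuts and groups are made up for illustration; the idea is to use the default cut points from quantile() to split 100 values into four groups and count how many values land in each.

```r
# illustrative data: the integers 1 to 100
x <- 1:100
# default cut points: 0%, 25%, 50%, 75% and 100%
cuts <- quantile(x)
# assign every value to the interval between two neighbouring cut points
groups <- cut(x, breaks = cuts, include.lowest = TRUE)
# count the values in each of the four groups
table(groups)
```

With this input, each of the four groups should hold 25 values, which is what "equal size" means here.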

In the sections below, let’s see how the quantile() function works in R.

quantile() function syntax

The syntax of the quantile() function in R is,

quantile(x, probs = , na.rm = FALSE)

Where,

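  • x = the input vector of numeric values
  • probs = the probabilities of the quantiles you want, each between 0 and 1 (by default 0, 0.25, 0.5, 0.75 and 1)
  • na.rm = a logical value; if TRUE, the NA values are removed before the quantiles are computed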
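As a rough sketch of a call that uses all three arguments together (the vector x and the name res are made up for illustration):

```r
# hypothetical input vector with one missing value
x <- c(5, 1, 9, NA, 3, 7)
# ask for the 10th, 50th and 90th percentiles, dropping the NA first
res <- quantile(x, probs = c(0.1, 0.5, 0.9), na.rm = TRUE)
res
```

The result is a named numeric vector, with one name per requested probability.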

A Simple Implementation of quantile() function in R

Well, I hope you are comfortable with the definition and explanation of the quantile function. Now, let’s see how it works in R with the help of a simple example which returns the quantiles for the input data.

#creates a vector having some values and the quantile function will return the percentiles for the data.
df<-c(12,3,4,56,78,18,46,78,100)
quantile(df)

Output:

  0%  25%  50%  75% 100% 
   3   12   46   78  100 

In the above sample, you can observe that the quantile function arranges the input values in ascending order and then returns the required percentiles of the values.

Note: By default, the quantile function returns five cut points: the minimum (0%), the lower quartile (25%), the median (50%), the upper quartile (75%) and the maximum (100%). The median splits the sorted data into two halves; the lower quartile lies in the lower half and the upper quartile lies in the upper half.
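A quick way to check these two observations, sketched with the same vector as above (the name q is only for illustration): the order of the input should not change the result, and the 0%, 50% and 100% entries should agree with min(), median() and max().

```r
df <- c(12, 3, 4, 56, 78, 18, 46, 78, 100)
q <- quantile(df)
# sorting the input beforehand gives the same quantiles
all.equal(q, quantile(sort(df)))
# the 0%, 50% and 100% entries of the default output
unname(q[c("0%", "50%", "100%")])
# the same three numbers computed directly
c(min(df), median(df), max(df))
```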

Handle the missing values - ‘NA’

Missing values are everywhere. In this data-driven digital world, you will encounter them frequently; in R they show up as NA (and undefined numeric results show up as NaN). If your data has these missing values, you can end up with NA’s or errors in the output.

So, in order to handle these missing values, we are going to use the na.rm argument. Setting it to TRUE removes the NA values from our data before the quantiles are computed.

Let’s see how this works.

#creates a vector having values along with NA's
df<-c(12,3,4,56,78,18,NA,46,78,100,NA)
quantile(df)

Output:

Error in quantile.default(df) : 
  missing values and NaN's not allowed if 'na.rm' is FALSE

Oh, we got an error. If you guessed that the NA values are to blame, you are absolutely right. If NA values are present in our data, the majority of the functions will end up either returning NA or raising an error like the one above.
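The difference between "returns NA" and "raises an error" can be seen in a short sketch. The name msg is made up for illustration, and tryCatch() is used only so that the script keeps running after the error.

```r
df <- c(12, 3, 4, 56, 78, 18, NA, 46, 78, 100, NA)
# count the missing values before computing anything
sum(is.na(df))
# mean() and median() do not fail, they return NA
mean(df)
median(df)
# quantile() stops with an error instead; tryCatch() captures its message
msg <- tryCatch(quantile(df), error = function(e) conditionMessage(e))
msg
```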

Well, let’s remove these missing values using the na.rm argument.

#creates a vector having values along with NA's
df<-c(12,3,4,56,78,18,NA,46,78,100,NA)
#removes the NA values and returns the percentiles
quantile(df,na.rm = TRUE)

Output:

  0%  25%  50%  75% 100% 
   3   12   46   78  100 

In the above sample, you can see the na.rm argument and its impact on the output. With na.rm = TRUE, the function removes the NA’s before computing the quantiles, so the result matches the earlier example that had no missing values.
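If you prefer to clean the data yourself, a minimal sketch of the equivalent manual step looks like this (the names with_na_rm, clean and by_hand are made up for illustration):

```r
df <- c(12, 3, 4, 56, 78, 18, NA, 46, 78, 100, NA)
# let quantile() drop the missing values
with_na_rm <- quantile(df, na.rm = TRUE)
# drop them by hand with is.na() and call quantile() on the clean vector
clean <- df[!is.na(df)]
by_hand <- quantile(clean)
# both routes should give the same quantiles
all.equal(with_na_rm, by_hand)
```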

The ‘probs’ argument in quantile()

You have seen the probs argument in the syntax shown in the very first section of the article, and you may wonder what it means and how it works. Well, the probs argument is passed to the quantile function to get specific or custom percentiles.

Seems to be complicated? Don’t worry, I will break it down into simple terms.

Well, whenever you call the quantile function without probs, it returns the standard 0th, 25th, 50th, 75th and 100th percentiles. But what if you want the 47th percentile or maybe the 88th percentile?

That is where the ‘probs’ argument comes in: you can use it to specify the percentiles you need.

Before going to the example, you should know one thing about the probs argument.

Probs: The values in the probs (probabilities) argument should lie between 0 and 1.

Here is a sample which illustrates the above statement.

#creates the vector of values
df<-c(12,3,4,56,78,18,NA,46,78,100,NA)
#returns the 22nd and 77th percentiles
quantile(df,na.rm = T,probs = c(22,77))

Output:

Error in quantile.default(df, na.rm = T, probs = c(22, 77)) : 
  'probs' outside [0,1]

Oh, it’s an error!

Did you get the idea of what happened?

Well, here the probs rule comes in. Even though we passed the percentiles we wanted, the values 22 and 77 violate the 0-1 condition. The probs argument accepts only values that lie between 0 and 1.

So, we have to convert the probs 22 and 77 to 0.22 and 0.77. Now the input values are between 0 and 1, right? I hope this makes sense.

#creates a vector of values
df<-c(12,3,4,56,78,18,NA,46,78,100,NA)
#returns the 22nd and 77th percentiles of the input values
quantile(df,na.rm = T,probs = c(0.22,0.77))

Output:

  22%   77% 
10.08 78.00 
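Two small variations on the same idea, as a sketch (the names pct and deciles are made up for illustration): percentiles written as whole numbers can be divided by 100 before they are passed to probs, and seq() can build a whole series of probabilities, for example the deciles.

```r
df <- c(12, 3, 4, 56, 78, 18, NA, 46, 78, 100, NA)
# percentiles written as whole numbers, converted to probabilities
pct <- c(22, 77)
quantile(df, probs = pct / 100, na.rm = TRUE)
# deciles: every 10th percentile from 10% to 90%
deciles <- quantile(df, probs = seq(0.1, 0.9, by = 0.1), na.rm = TRUE)
deciles
```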

The ‘unname’ function and its use

Suppose you want your code to return only the percentile values and drop the labels. In these situations, you can make use of the ‘unname’ function.

The ‘unname’ function removes the names attached to the result (labels such as 0%, 25%, 50%, 75% and 100%) and returns only the percentile values.

Let’s see how it works!

#creates a vector of values
df<-c(12,3,4,56,78,18,NA,46,78,100,NA)
#prints the percentiles together with their labels
quantile(df,na.rm = T,probs = c(0.22,0.77))
#drops the labels and returns only the percentiles.
unname(quantile(df,na.rm = T,probs = c(0.22,0.77)))

Output of the unname() call:

10.08 78.00

Now, you can observe that the labels are removed by the unname function, which returns only the percentile values.
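As an alternative sketch, quantile() also has a names argument that can be set to FALSE to get the same unlabelled result in one step. And if you keep the labels, they can be used to pick out a single value. The names a, b and q below are made up for illustration.

```r
df <- c(12, 3, 4, 56, 78, 18, NA, 46, 78, 100, NA)
# drop the labels after the fact with unname()
a <- unname(quantile(df, na.rm = TRUE, probs = c(0.22, 0.77)))
# or ask quantile() not to attach labels in the first place
b <- quantile(df, na.rm = TRUE, probs = c(0.22, 0.77), names = FALSE)
all.equal(a, b)
# with labels kept, a single percentile can be picked by its label
q <- quantile(df, na.rm = TRUE, probs = c(0.22, 0.77))
q[["77%"]]
```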

The ‘round’ function and its use

We have discussed the round function in R in detail in a past article. Now, we are going to use the round function to round off the quantile values.

Let’s see how it works!

#creates a vector of values
df<-c(12,3,4,56,78,18,NA,46,78,100,NA)
#prints the percentiles before rounding
quantile(df,na.rm = T,probs = c(0.22,0.77))
#returns the rounded off values without labels
unname(round(quantile(df,na.rm = T,probs = c(0.22,0.77))))

Output of the round() call:

10 78

As you can see, our output values are rounded off to zero decimal places.
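The round function also takes a digits argument, so you are not limited to whole numbers. A short sketch (the name q is made up for illustration):

```r
df <- c(12, 3, 4, 56, 78, 18, NA, 46, 78, 100, NA)
q <- quantile(df, na.rm = TRUE, probs = c(0.22, 0.77))
# round() on its own keeps the labels
round(q)
# keep one decimal place instead of none
round(q, digits = 1)
```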

Get the quantiles for multiple groups/columns in a data set

Till now, we have discussed the quantile function, its uses and applications, as well as its arguments and how to use them properly.

In this section, we are going to get the quantiles for multiple groups in a data set. Sounds interesting? Follow me!

I am going to use the ‘mtcars’ data set for this purpose. The tapply, do.call and rbind functions used here are all part of base R, so no extra package has to be installed.

#reads the data
data("mtcars")
#returns the top few rows of the data
head(mtcars)
#using tapply, we can apply the function to multiple groups
do.call("rbind",tapply(mtcars$mpg, mtcars$gear, quantile))

Output:

    0%  25%  50%    75% 100%
3 10.4 14.5 15.5 18.400 21.5
4 17.8 21.0 22.8 28.075 33.9
5 15.0 15.8 19.7 26.000 30.4

In the above process, tapply splits the ‘mpg’ values by the number of gears and applies quantile to each group, and then rbind stacks the results into one table with a row per group.

In the above section, we computed the quantiles of the ‘mpg’ column for each value of the ‘gear’ column (3, 4 and 5) in the mtcars data set. Like this, we can compute the quantiles for multiple groups in a data set.
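Here is another way to sketch the same idea with split() and sapply(), plus a variant that computes the quantiles of several columns at once. The names by_gear and by_column, and the choice of the mpg, hp and wt columns, are made up for illustration.

```r
data("mtcars")
# quantiles of mpg within each gear group, one column per group
by_gear <- sapply(split(mtcars$mpg, mtcars$gear), quantile)
# transpose to get one row per group
t(by_gear)
# quantiles of several columns at once, one column per variable
by_column <- sapply(mtcars[, c("mpg", "hp", "wt")], quantile)
by_column
```

Since sapply() returns a matrix here, the results can be indexed like any other matrix, for example by_column["50%", "hp"] for the median horsepower.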

Can we visualise the percentiles?

My answer is a big YES! The best plot for this is a box plot. Let me take the iris data set and visualize it with a box plot, which will showcase the percentiles as well.

Let’s roll!

#loads the iris data and returns the top few rows
data(iris)
head(iris)

[Figure: output of head(iris)]

These are the first 6 rows of the iris data set.

Let’s explore the data with the function named ‘summary’.

summary(iris)

[Figure: output of summary(iris)]

In the above image, you can see the mean, the median, the 25th percentile (1st quartile), the 75th percentile (3rd quartile) and the min and max values as well. Let’s plot this information through a box plot.

Let’s do it!

#plots a boxplot with labels
boxplot(iris$Sepal.Length,
        main='The boxplot showing the percentiles',
        col='Orange',
        ylab='Values',
        xlab='Sepal Length',
        border = 'brown',
        horizontal = T)

[Figure: horizontal box plot of iris$Sepal.Length]

A box plot can show many aspects of the data. In the figure below, I have marked the particular values represented by the box plot. This will save you some time and make the plot easier to understand.

[Figure: box plot annotated with the values it represents]
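If you want the numbers behind the box plot rather than the picture, the following sketch compares three base R functions on the same column (the names x, five and q are made up for illustration). fivenum() returns the minimum, lower hinge, median, upper hinge and maximum, and boxplot.stats() returns the values that boxplot() draws.

```r
data(iris)
x <- iris$Sepal.Length
# minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
# the default quantiles: 0%, 25%, 50%, 75%, 100%
q <- quantile(x)
five
q
# the whisker ends, box edges and median line used by boxplot()
boxplot.stats(x)$stats
```

For this column all three should agree. On other data the hinges can differ slightly from the 25% and 75% quantiles, and the whiskers stop short of the minimum or maximum when the data contains outliers.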

Quantile() function in R - Wrapping up

Well, it’s a longer article, I reckon. I tried my best to explain and explore the quantile() function in R from multiple angles through various examples and illustrations. The quantile function is one of the most useful functions in data analysis, as it efficiently reveals more information about the given data.

I hope you got a good understanding of the buzz around the quantile() function in R. That’s all for now. We will be back with more beautiful functions and topics in R programming. Till then, take care and happy data analyzing!

More study: R documentation.
