About the independent samples t-test

When would I use an independent samples t-test?

When is it useful?

An independent samples t-test is most useful when you are comparing two separate groups (based on something you've measured about them). Here are some examples of research designs for which an independent samples t-test might be appropriate:

  • You want to know which of two potting soils results in taller tomato plants. You do an experiment, planting tomato seeds in each kind of soil and measuring the heights of the resulting plants.
  • You want to compare the average family size of U.S. Catholics and U.S. Mormons. You survey a random sample of Catholics and Mormons, and ask them the size of their families.
  • You want to compare the average income of men and women employed by small businesses in the intermountain west. You survey a random sample of men and women in your target population and ask their income.

The design of an independent samples t-test includes one categorical independent variable with two levels (that is, two samples distinguished by something like gender, treatment group, etc.), and one continuous dependent variable (that is, something measured about each sample, such as test scores, number of sick days, plant height, etc.).

If this doesn't sound like the design you have, a different statistical test may suit your needs better. Some options might include:

  • Paired samples t-test (often used for pre-post comparisons)
  • One-way ANOVA (for when you have more than two groups)

Return to the home page for more options.

How does an independent samples t-test work?

The independent samples t-test is a hypothesis test. The null hypothesis is usually that the population means are the same — that is, that the true difference between means is zero. The independent samples t-test gives you the likelihood of getting sample means as different as yours if this hypothesis is true.

In other words, when we perform an independent samples t-test, we are asking: "How likely are we to observe the difference we see between Sample A and Sample B, if the true means of Population A and Population B are really the same?" Another way of putting it is, "What is the likelihood that the differences between my sample means are due purely to random sampling variation, rather than due to differences in the underlying population they represent?"

If the likelihood is very small, you can make an argument that the means of the two populations are probably different (this is what it means to "reject the null hypothesis"). This does not mean that we know for certain that the means of the two underlying populations are different — it just means that if they were the same, our data would be very rare or anomalous.
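To build intuition for what "due purely to random sampling variation" means, you can simulate it yourself. This sketch (with made-up numbers, not the tutorial's data) draws two samples from the same population and t-tests them; because the null hypothesis is true here by construction, the p-value will usually be large:

# Two samples drawn from the SAME population (true difference = 0).
set.seed(42)
group_a <- rnorm(30, mean = 50, sd = 5)
group_b <- rnorm(30, mean = 50, sd = 5)

# Any difference between these sample means is pure sampling variation,
# so the t-test will usually return a large p-value.
t.test(group_a, group_b, var.equal = TRUE)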

Here are some resources that might help:

  • Resource #1
  • Resource #2

Setting up your R script and data

Setting up your R script file

In RStudio, create a new R Script file (“File” -> “New File” -> “R Script”). First, set up your working directory. The working directory simply tells R where it should look for and save files. On your computer, ensure that your data file (which should be saved as a .csv) is in the folder that you want to use as your working directory. Then use the code below to set your working directory:

setwd("~/Dropbox/Research")

If you aren't sure how to identify your working directory, you can go to “Session” -> “Set Working Directory” -> “Choose Directory”, and then navigate to your folder. A line of code that includes the directory path to your folder will then run in your “Console” window. You can copy and paste this line of code into your R script, so that if you need to run this analysis again, you already have your code.
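If you want to double-check where R is currently looking, you can print the working directory at any time (the path shown is just an example):

getwd()
# e.g. [1] "/Users/you/Dropbox/Research"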

Having trouble setting your working directory?

Here are some resources that might help:

  • Resource #1
  • Resource #2

Importing your data into R

Next, you will need to import your data into R. We are assuming that your data is in a .csv file. If not, you can open it in Excel and “save as” a .csv file. In the example we are using, our data is stored in a .csv file called “plant_data.csv”. The following line of code shows how we would import “plant_data.csv” into R.

my_data <- read.csv("plant_data.csv")

This line of code creates a new data frame called “my_data”, then looks for the .csv file named “plant_data.csv” in the working directory, and assigns the contents to the new data frame. In RStudio, this data frame can be seen in the “Environment” tab of your workspace. To view your data, click on “my_data” from the “Environment” tab, or simply type "my_data" in the console and press enter.
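Once the data is imported, a quick sanity check is worthwhile. These commands (using the same "my_data" frame as above) print the first few rows, the column types, and the number of observations:

head(my_data)  # first six rows of the data frame
str(my_data)   # each column's name and type
nrow(my_data)  # total number of observations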

Having trouble importing your data?

If your data is not in a .csv format, there are other ways to import the data that we do not cover here. Check out some of these tutorials for help importing other formats:

  • Resource #1
  • Resource #2

Otherwise, we recommend that you re-save your data into a .csv format. You can usually do this by opening the file in a program like Microsoft Excel, and then doing a "save as", and specifically selecting "comma-separated values" or ".csv" as the file format.

Tidying the Data

It is important that your data is tidy (a term with a specific meaning in data analysis). This means that each column represents a single variable. For example, look at the data set on the left, which lists the height in inches of 5 tomato plants from each of our two samples. Sample A was grown in one kind of soil (“Soil A”), while Sample B was grown in a different kind of soil (“Soil B”). Notice that each sample has its own column. This is non-tidy data.

Untidy Data

Sample A   Sample B
45.72      21.53
38.94      19.84
43.87      21.42
54.62      23.31
41.24      24.78

Tidy Data

Observation   Soil Type   Plant Height
1             Soil A      45.72
2             Soil A      38.94
3             Soil A      43.87
4             Soil A      54.62
5             Soil A      41.24
6             Soil B      21.53
7             Soil B      19.84
8             Soil B      21.42
9             Soil B      23.31
10            Soil B      24.78

In contrast, take a look at the "tidy" data on the right. Notice that each column represents a distinct variable (instead of a sample): the first column represents observations (that is, each time a plant's height was measured), the second column represents soil type, and the third column represents plant height. If your data is already “tidy," feel free to move on to the next section.

Tidying your data in R

If your data looks like the first dataset above (where each sample has its own column), here’s some code that will help. There is a function called melt() that will transform your two sample columns into two variable columns. The function is in a package called reshape, so if you have never installed that package on your computer, you’ll need to install it first. Install and load it with these two lines of code:

install.packages("reshape")
library(reshape)

The first line of code needs to run only once on any computer on which you use R — once installed, the "reshape" package stays installed. For this reason, the first line does not need to be saved in your R script, unless you plan to run the code on multiple computers and do not know whether the package will already be installed. Once the package is installed, you can load it using the second line of code. That line should be in your R script file, because you will need to run it every time you use R for this analysis (if you need to reshape your data). Once the "reshape" package has been installed and loaded, the following line of code can help transform the first table above (untidy data) into something that looks more like the second table above (tidy data).

my_data_tidy <- melt(my_data[,c("Soil_A", "Soil_B")])

This will result in a dataset that has only two columns: one labeled "variable" and another labeled "value". We can give these columns more descriptive identifiers using the line of code below:

names(my_data_tidy) <- c("Soil_Type", "Plant_Height")

So far, we have taken the two sample columns in “my_data” and “melted” them into two variable columns, assigned the results to “my_data_tidy”, and given the new dataframe more sensible column names. Note that the above lines of code work even if you have other columns in your data besides the two depicted in the dataset above; those columns are simply not included in the new dataframe. There are ways to melt data and preserve other columns in your dataset, but none of those other columns are essential to performing an independent samples t-test. (If you reshaped your data this way, use “my_data_tidy” wherever the code below refers to “my_data”.)
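Putting the reshaping steps together, here is the whole sequence as one sketch. It assumes your untidy sample columns are named "Soil_A" and "Soil_B", as in the code above; substitute your own column names:

# install.packages("reshape")  # only needed once per computer
library(reshape)

# Melt the two sample columns into (variable, value) pairs,
# then give the new columns descriptive names.
my_data_tidy <- melt(my_data[, c("Soil_A", "Soil_B")])
names(my_data_tidy) <- c("Soil_Type", "Plant_Height")

head(my_data_tidy)  # check the result: one row per observation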

If your data still isn't tidy:

There are ways to do data analysis with untidy data, but it takes a little bit more experience with R programming than we include in this tutorial. If your data is arranged anything like the "untidy" data above, we can walk you through how to do this in R. If your data is arranged like neither of the tables above, you may need to learn a bit more R programming to tidy your data in R, or rearrange the data in some other program (like Excel).


Testing the assumptions

The independent samples t-test makes several important assumptions. As you review this portion of the tutorial, it is helpful to note whether your data passes each assumption. Among other things, this will influence how you write your final writeup.

(1) Your samples should be randomly selected.

This is vitally important. If your samples are not randomly selected from the population of interest (or at least sampled in a way that approximates random sampling), then you will not be able to draw any conclusions about your population of interest, and your results will be unreliable. If you are dividing participants into treatment groups, it should be done in a random way.

Despite this, researchers often use t-tests with samples of convenience, because this assumption is hard to meet — for example, a lot of social science research is conducted using undergraduate student volunteers (rather than participants randomly sampled from the general population). In these cases, the groups should be divided randomly. Further, good researchers always identify their sampling methods and, if the sample is not a true random sample, they will report this as a potential weakness of the study.

If your samples are not random:

This will be a weakness of your study that will need to be reported if you continue to perform an independent samples t-test. Samples of convenience are often used in social science research, but in these cases, the sampling methods should be reported as a potential weakness of the study.

(2) Your observations should be independent of each other.

This is another way of saying that your samples should be random – observations in your samples should not depend on each other. This is what we refer to as independence of observations. For example, if you plant tomato plants using two different kinds of soil, the height of each plant may depend on the soil used, but it will likely not depend on the height of the other plants.

If you are using a pre-post test design, this violates independence of observations — your post-test scores depend at least in part on your pretest scores (that is, they are influenced by other observations). In other words, your observations in one sample (the post-test scores) depend highly on your observations in the other sample (the pretest scores). Time-ordered data usually does not consist of independent observations.

If your observations are not independent:

Other statistical tests may be useful, but without independence of observations, the independent samples t-test is not the right test. It's in the name of the test, after all!

(3) Your samples should be (approximately) normally distributed.

The values of the dependent variable should approximate a normal curve for each sample. This assumption is vital if you have a small sample size. If you have a fairly large sample size (and if your samples have approximately equal variances), this assumption becomes less important, but it's still important to report violations of the assumption anyway.

There are three ways to test this assumption in R: a histogram, a Q-Q plot, and a Shapiro-Wilk hypothesis test. We recommend that you do all three. In practiced hands, the Q-Q plot is often the most sensitive to problematic violations of normality. The Shapiro-Wilk test is easy to report, but is sometimes criticized as arbitrary.


To create a histogram of your data, simply use the following code (for both samples):

hist(my_data$Plant_Height[my_data$Soil_Type == "Soil_A"])

hist(my_data$Plant_Height[my_data$Soil_Type == "Soil_B"])

This will create two histograms, one for each sample in your data -- for us, a histogram of plant heights for Soil A and another for Soil B.

These plots will show up in the "Plots" tab of your workspace, and you can use the arrow buttons on the tab to cycle through plots that you have made. To check if your data is normal, look and see if the data roughly resembles a "bell curve." If it does, then your data may be normally distributed.
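If you find the raw histogram hard to judge, one optional trick (not required for the tutorial) is to draw the histogram on a density scale and overlay a normal curve built from that sample's own mean and standard deviation:

# One sample's heights (Soil A), drawn as a density histogram
soil_a_heights <- my_data$Plant_Height[my_data$Soil_Type == "Soil_A"]
hist(soil_a_heights, freq = FALSE)

# Overlay the normal curve with the same mean and SD for comparison
curve(dnorm(x, mean = mean(soil_a_heights), sd = sd(soil_a_heights)),
      add = TRUE)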

Q-Q Plot

A Q-Q plot is short for quantile-quantile plot. It takes each data point in your data and plots it against what it would be if your data were perfectly normally distributed. For example, if you have 50 data points, it calculates 50 "quantiles" from a normal curve, and plots your data against those normal quantiles. If your data is normally distributed, this should result in a roughly straight line. To create a Q-Q plot in R, you will use the R functions qqnorm() and qqline(), as below:

qqnorm(my_data$Plant_Height[my_data$Soil_Type == "Soil_A"])
qqline(my_data$Plant_Height[my_data$Soil_Type == "Soil_A"])

qqnorm(my_data$Plant_Height[my_data$Soil_Type == "Soil_B"])
qqline(my_data$Plant_Height[my_data$Soil_Type == "Soil_B"])

This will create two Q-Q plots, one for each sample in your data.

If the data points are roughly in a straight line, this means your data may be normally distributed. How "non-straight" can the line be before you get concerned? This takes some practice and human judgment. We've included below a way to "practice" and see how different distributions look on a Q-Q plot, so that you familiarize yourself with how to tell if your data is probably normally distributed.
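You can also generate your own practice data right in the console. This sketch (arbitrary sample sizes and distributions) draws one normal and one skewed sample and Q-Q plots both, so you can see what "straight" and "non-straight" look like:

set.seed(1)
normal_sample <- rnorm(50)  # drawn from a normal distribution
skewed_sample <- rexp(50)   # drawn from a right-skewed distribution

qqnorm(normal_sample)  # points should hug a straight line
qqline(normal_sample)

qqnorm(skewed_sample)  # points will bend away from the line
qqline(skewed_sample)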

Shapiro-Wilk Test

The Shapiro-Wilk test checks for non-normality using a hypothesis test. It treats your data as a sample from a normally distributed population and asks, "Assuming that the data was pulled from a normally distributed population, what is the likelihood of getting data as non-normal as this?" If the likelihood is large, we have no evidence against normality; if the likelihood is small, we might conclude that our sample is fairly non-normal.

We can perform the Shapiro-Wilk test using the code below:

shapiro.test(my_data$Plant_Height[my_data$Soil_Type == "Soil_A"])
shapiro.test(my_data$Plant_Height[my_data$Soil_Type == "Soil_B"])

The test will produce two values: W, and p, which is the probability of obtaining that value of W if we assume our data was pulled from a normal distribution.

If the p-value is less than .05, then we can make an argument that our data is unlikely to have come from a normal distribution. Note, however, that this cutoff is pretty arbitrary — data with a p-value of .06 "passes" while data with a p-value of .04 "fails," even though the two are barely different. For this reason, the Shapiro-Wilk test should be used together with visual checks such as histograms and Q-Q plots.

Detecting Non-normal Data

Below are three different ways to detect non-normal data. The first is a histogram, the second is a Q-Q plot, and the third is the Shapiro-Wilk test. Click on the buttons below to generate random distributions of different kinds (clicking on the same button will generate a new random sample). Observe how the various plots look, and the results of the Shapiro-Wilk test. The purpose of this demonstration is to help you become familiar with the strengths and weaknesses of each approach.

Warning: Each "normal" sample is randomly selected from a normal distribution, and each "skewed" sample is randomly selected from a skewed distribution. Due to random sampling variation, this does not guarantee that any particular sample will actually look normal or skewed.

If your data is not normally distributed:

If your sample size is large, this may not be an issue (but you will still want to report it). If your sample size is small, however, it may be an issue. Here are some ways to deal with this:

  • Resource #1
  • Resource #2

(4) The variances of your two samples should be similar.

If you have one group where the measurements are all alike, and another group where the measurements are all over the chart, the t-test is not as reliable a test to run. This is in part because the t-test makes inferences about the variances of the underlying populations based on the variances of your samples. Without homogeneity ("sameness") of variances, the results of your traditional independent samples t-test cannot be trusted.

Levene's F test

This assumption is most often tested using Levene's F test for equality of variances. Like the Shapiro-Wilk test, this is also a hypothesis test. Levene's F test is contained in a package called car. If you have not installed it, install it first (just as with the reshape package above), and load the library each session before performing the test. To perform the Levene test, simply use the following code:

leveneTest(Plant_Height ~ Soil_Type, my_data)

This code performs Levene's F test on your two samples.
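If R instead reports that it could not find the function leveneTest(), the car package probably isn't installed or loaded on your computer. The fix mirrors what we did with reshape:

install.packages("car")  # only needed once per computer
library(car)             # run each session before leveneTest()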

If the p-value is less than .05, then we can make an argument that our variances are unlikely to be equal. That's a weird way to say it, but it's also the most precise. We could also say that our variances are likely to be different. Note, however, that this .05 cutoff is just as arbitrary as it was for the Shapiro-Wilk test.

For this reason, many people supplement Levene's F test with visual checks such as box plots. To do this, you can use this code:

boxplot(my_data$Plant_Height ~ my_data$Soil_Type)

This produces a box plot for each sample.

Notice that for us, the box plots look fairly similar -- one does not look substantially taller than the other. This won't always be the case: if your data fails this assumption, one box plot may look much "taller" than the other. We've provided a tool below that will display a number of randomly generated data sets to illustrate this, for practice.
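You can also just compute the two sample variances directly and eyeball how different they are. This is a quick informal check, not a formal test:

# One variance per soil type; as a rough rule of thumb, start
# worrying when one variance is several times the other.
tapply(my_data$Plant_Height, my_data$Soil_Type, var)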

Detecting Unequal Variances

Below, we have a box plot of two randomly generated samples, along with the results of the Levene test. The purpose of this demonstration is to visually depict the relationship between the variances of the samples and the results of the Levene test.

If your variances are not similar:

When your variances are not equal, there is a variation of the t-test that can be performed: Welch's independent samples t-test. So don't fear! We'll show you below how to do this. It's important to note the violation and use the alternative test when necessary. In fact, some statisticians argue that Welch's t-test can be used by default, even when the variances are equal.

Performing the test and reporting the results

Performing the t-test

Once you've tested your assumptions, you can perform the t-test. The code for performing the t-test is fairly simple, but depends on whether your samples have similar variances or not.

When your samples have equal or similar variances, as checked with Levene's test or the box plots, you can use this code to perform your t-test:

t.test(my_data$Plant_Height ~ my_data$Soil_Type, var.equal = TRUE)

This will run a traditional Student's t-test.

When your samples have unequal variances, as checked with Levene's test or the box plots, you can use this code to perform your t-test:

t.test(my_data$Plant_Height ~ my_data$Soil_Type, var.equal = FALSE)

This will run a Welch's independent samples t-test.

If the test didn't run, we're not sure what might have gone wrong at this point -- send us a message describing the problem, and we'll include troubleshooting tips here in the future. Include as much detail as you can. We won't be able to respond to you individually, but your report will help us make the site better.


Interpreting the results

To be perfectly honest, the result of the t-test (t) and the degrees of freedom (df) aren't very important to you. What's important is the p-value, which tells you the probability of getting your result if the true difference in means in your underlying population is zero. In our case, our p-value is less than "2.2e-16", which is the same thing as .00000000000000022. That's very, very small -- it means that if the true difference in means of our underlying population is zero, there's only an infinitesimally small chance of getting our two very different samples by random chance alone.

Your p-value probably won't be that small. A p-value of .20 represents a 20% chance of getting your results from random sampling variation alone, if the true difference in means is zero. A p-value of .04 represents a 4% chance, and a p-value of .003 represents a 0.3% chance. The smaller the value, the more confident you can be that the difference between your samples is not due to chance, but due to a difference in the underlying populations.

At this point, you compare your p-value with the alpha level of your study. Various disciplines have various conventions for what constitutes a reasonable alpha level, and you should be familiar with yours. In the social sciences, this is usually p < .05. This means that when there is less than a 5% chance of getting our results due to random sampling variation, we will consider the results "statistically significant." That is, the difference between the groups is big enough (relative to their respective variances) that we consider it unlikely to be due to chance. We then make an inference about the underlying populations, concluding that their means are likely different. This is why the independent samples t-test is often called an inferential test.

Further, the confidence interval provided is also interesting. For us, the 95% confidence interval is -18.99 to -14.38. This means that we are 95% confident that the true difference between the population means lies between these two values. As long as zero doesn't lie between these values, our results are statistically significant. In other words, we are 95% confident that the true difference of means is something other than zero -- in fact, we are 95% confident that it is quite some distance from zero.
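If you'd rather pull these numbers out of the output programmatically, you can store the test result and access its standard components (this sketch assumes the equal-variances version from above):

result <- t.test(my_data$Plant_Height ~ my_data$Soil_Type, var.equal = TRUE)

result$statistic  # the t value
result$parameter  # the degrees of freedom (df)
result$p.value    # the p-value
result$conf.int   # the 95% confidence interval for the difference in means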

Still not sure what the results mean?

Here are some resources that might help:

  • Resource #1
  • Resource #2

Reporting the results

We are going to use APA format. We plan to include more formats in the future, but we don't have them available at the moment. This section also covers what you might need based on your answers above: if your data fails an assumption of the t-test, we indicate where to mention that.

To test whether there is a statistically significant difference between the plant heights of the two samples, we performed an independent samples t-test. The samples were approximately normally distributed, as indicated visually using a histogram and a Q-Q plot, and statistically using the Shapiro-Wilk significance test, W = 0.96, p = .29 and W = 0.97, p = .65 for soil type A and soil type B, respectively. The variances of the two samples were very similar, as indicated using box plots and Levene's significance test (F = 0.01, p = .917). There was a significant effect for soil type, t(58) = -14.45, p < .001, with soil type B (M = 45.96, SD = 4.52) resulting in taller plants than soil type A (M = 29.27, SD = 4.43).

Often, when the assumptions of the test have been met, they are not mentioned in detail. For example, in many publications in the social sciences, we might just report it this way:

To test whether there is a statistically significant difference between the plant heights of the two samples, we performed an independent samples t-test (after checking to ensure that the assumptions of the test were met). There was a significant effect for soil type, t(58) = -14.45, p < .001, with soil type B (M = 45.96, SD = 4.52) resulting in taller plants than soil type A (M = 29.27, SD = 4.43).

Still not sure how to write up your results?

Here are some resources that might help:

  • Resource #1
  • Resource #2

Practice examples

Here are a couple of datasets you can use to practice independent samples t-tests. The first one is extremely large, and the second one is fairly small.

Income Demographics

This dataset includes the age, marital status, gender, income level, and other information of a random sample of 50,000 U.S. residents in the year 2000. Here are three questions you can ask:

  • Do men make more than women? 
  • Do those who are born in the USA make more than immigrants?
  • Do men have more years of education than women?

It goes without saying that whatever inferences you make based on the data are applicable only in the year 2000 — things may have changed significantly since then. But this data set offers a few opportunities to practice performing an independent samples t-test. You may need to do some cleaning up and simplifying of the data, since there are more columns than you will need.


Turtle Dimensions

This dataset includes the height, width, length, and gender of 48 turtles. Don't ask us why — we're sure someone had good reason to make these measurements. Here are three questions you can ask:

  • Does gender matter in the height of turtles?
  • Does gender matter in the length of turtles?
  • Does gender matter in the width of turtles?

It goes without saying that whatever inferences you make based on the data cannot be used to draw conclusions about any particular turtle population, since we don't even know what species of turtle this was sampled from. But again, this data set offers a few opportunities to practice performing an independent samples t-test.