
Sampling bias and how to avoid it

Learn more about sampling bias and why it’s a common problem. Avoid it completely by using SurveyMonkey Audience.

Sampling bias is a common problem that occurs in studies run by newcomers to research as well as by seasoned researchers. To avoid it in your own research, it is important to understand what sampling bias is and how it happens. In this article, we'll explain both and help you prevent sampling bias in your market research to ensure honest, accurate results.

Sampling bias is a type of survey bias that occurs when a research study does not use a representative sample of a target population. In other words, you gather data from a group in which some members of the intended population have a higher or lower sampling probability than others. 

This unbalanced sample can affect the validity of the research data and results, and it can also limit the extent to which conclusions can be generalised to a larger population.

Sampling bias usually happens unintentionally and is commonly caused by using convenience or purposive sampling strategies.

There are two common causes of sampling bias:

  1. Poor methodology: the most accurate sampling method is simple random sampling, in which respondents are chosen completely at random. When other selection parameters are introduced, researchers may unintentionally insert their own selection bias into the process of choosing respondents.
  2. Poor execution: this occurs when the researcher has set out an accurate methodology but those implementing the sampling cut corners. If they resort to convenience sampling or don’t follow up with those participants who do not respond, they abandon the carefully formulated methodology and risk invalid results due to sampling bias.

A famous example of sampling bias occurred in the 1948 US presidential election. A telephone survey conducted during the race implied a landslide win for Thomas E. Dewey over Harry S. Truman. The researchers did not take into account that telephones were still relatively new and expensive, so the people they reached tended to be wealthier. They made no effort to survey citizens in the lower-middle or lower classes, who were more likely to vote for Truman.

Because the sample was not representative of the entire US population, the results were inaccurate and failed to predict the winner of the race (Truman). The Chicago Tribune, however, trusted the survey results for its early edition and ran the incorrect front-page headline "Dewey Defeats Truman" the next morning. It was an embarrassing lesson for the Tribune to learn about sampling bias.

There are several types of sampling bias. Let’s look at some of the most common types:

Undercoverage bias

This is also called exclusion bias and occurs when a portion of the population of interest is not accurately represented in the sample. This was the case in our presidential election example above: US citizens who did not own telephones were excluded from the sample. In today's world, a similar scenario could take place if a national internet survey were conducted and researchers did not find a way to include the elderly and those with limited or no internet access.

Undercoverage bias can also occur when convenience sampling is used. Convenience sampling relies on participants who are easy to reach. For example, you may have seen people conducting surveys in high-traffic areas in a large city. Those surveys will probably undercover people who don't live in the city or who drive instead of walking.

Self-selection bias

This type of bias takes place when respondents with specific characteristics are more willing to take part in research. In this case, participants volunteer to participate in the study. People who volunteer are more likely to have an opinion on the topic being studied. Conversely, some people will not volunteer to participate because they prefer not to discuss the topic. This leaves the sample with an abundance of people with strong opinions and not enough people who don't have strong feelings or don't want to discuss the topic at hand.

An example of self-selection bias would be a product evaluation survey where participants can choose to participate. Those who have had a strong emotional experience (either positive or negative) are more likely to participate in the study. This skews data by excluding a full range of customer experiences. You can see this bias in effect in customer reviews.

Survivorship bias

In survivorship bias, the sample is focused on those who pass the selection criteria. Those who do not pass are ignored and therefore underrepresented.

For example, if your survey only includes current customers, their feedback is more likely to be positively skewed than if you included those who have stopped shopping with you. They have chosen to continue a relationship with your brand, so it's likely that they are feeling positive about their experiences. Customers who no longer purchase your products will have different insights that should be included in your survey for accuracy.

Non-response bias

Non-response, or participation bias, occurs when a group of respondents refuses to participate in a study or drops out during the study period. This could be due to the length of the survey, the structure of the questions or sensitive topics at hand.

Frequently, non-response bias occurs because people do not feel comfortable providing information regarding income, gender identity, age, marital status and other personal details. Other reasons for non-response issues include lack of interest, lack of time or simply not wanting to share their feelings about the topic.

An example of non-response bias could be a study into drug use. Questions about how frequently certain drugs are used or which drugs are used most often may cause participants to drop out if they are embarrassed to talk about the subject or afraid that they will be exposed as engaging in illegal drug use. 

Recall bias

Memory is imperfect, and when your survey participants can't remember correctly, it results in recall bias. You may be able to reduce recall bias by collecting responses soon after the occurrence you are studying. However, in many cases, you simply can't do anything to mitigate recall bias. Certain respondents may not recall certain experiences as well as others.

For example, if you are studying risk factors for a specific disease, people who have had the disease are more likely to recall, and to make more effort to remember, their past exposures than respondents who have been unaffected by the disease.

Observer bias

When researchers consciously or subconsciously influence the interpretation of the data, it results in observer bias. This may take the form of focusing only on a certain dataset or influencing participants during data collection.

For instance, researchers are sometimes present during participants' interviews. If a researcher inadvertently displays enthusiasm for a certain type of response, the participants may notice and change their responses to please the researcher. In another example, a researcher might unintentionally misinterpret the data so that the study results fit their hypothesis or expectations.

Exclusion bias

Exclusion bias is the result of intentionally excluding specific subgroups from your study. This affects the validity of the study.

For example, excluding a group that has recently moved into the study area could potentially result in false connections between research variables, which could have an impact on your research outcomes.

Healthy user bias

This type of sampling bias is most frequently observed in medical studies. Healthy user bias involves a higher focus on participants who are more active, healthy and fit than most of the general population. People who are not healthy enough to participate are omitted.

For example, in a trial of a cholesterol-lowering medication, the results may reflect the above-average health of the participants rather than the effects of the medication itself, making the drug appear more beneficial than it really is.

Berkson's fallacy

Berkson's fallacy is the opposite of healthy user bias. It involves researchers only studying participants who are very ill, causing an underrepresentation of healthy people. This results in a false finding of a correlation between variables.

For example, in 1946, Joseph Berkson studied patients in a hospital and found an apparent association between diabetes and gallbladder disease. The two diseases are actually independent, yet they appeared to be related in his patient sample. Berkson concluded that the misleading correlation arose because people who are hospitalised are more likely to have several diseases at once. Since the data came only from inpatients, the two diseases appeared to be correlated when they were not.

To avoid sampling bias, you need to look carefully at your survey methodology and design. Clearly define your survey goals and define your target audience. Ensure that your process allows an equal opportunity for each member of the target population to be part of your sample group. 

To reassure your participants, always include a statement at the beginning of your survey confirming that their answers will be anonymous and will only be used for the purposes of your study.

Here's an example statement: "This survey is anonymous. No one will be able to identify you or your answers, and no one will know whether or not you participated in the study."

Let’s look at some additional ways to avoid survey sampling bias:

Clearly define the groups in your target study population and then make sure sufficient data is collected from each group. Provide training to those conducting your study to prevent them from resorting to convenience sampling.

A simple way to avoid convenience sampling is to use SurveyMonkey Audience to reach your target population. You can pick your target audience, send your survey, collect feedback and analyse your results.

Why are people not responding to your survey? Follow up with non-responders to find out whether you’re asking the wrong questions, requesting the wrong information or targeting the wrong audience. Use the follow-up information to gain actionable insights for your next study.

Create a survey that is brief and easy to understand. Surveys with complicated questions, or simply too many of them, have lower completion rates.

Clearly define your target audience, parameters for sample selection and the sampling frame of your study to ensure that relevant, accurate data can be collected.

Establish what you want to accomplish with your survey first. With that in mind, you can determine which sample methodology and survey structure will work best. You’ll have a better understanding of who should participate, the necessary sample size and how to communicate with your target respondents.

Two sampling methods are particularly effective at keeping your study free of sampling bias: simple random sampling and stratified random sampling.

Simple random sampling

In this sampling method, participants are chosen completely at random, so every member of the target population has equal odds of being selected for the study. One easy way to do this is with an Excel spreadsheet: add the formula "=RAND()" to every row of your master list of participants to produce a random decimal value for each one, then sort the list by that column. Because the order is now random, you can select any contiguous group in the sorted list (e.g. the top 100 or the bottom 100) as your sample. This method is particularly useful in large studies.
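If you prefer to work in code rather than a spreadsheet, here is a minimal Python sketch of the same idea; the participant list is a made-up placeholder, not real data:

```python
import random

# Placeholder master list of participants; in practice you would load
# your own list (e.g. from a spreadsheet export).
participants = [f"participant_{i}" for i in range(1, 1001)]

# Simple random sample of 100: every participant has an equal chance of
# being selected, and no one is selected twice.
sample = random.sample(participants, k=100)

print(sample[:5])  # preview a few of the selected participants
```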

Stratified random sampling

In stratified random sampling, researchers examine the population they are studying and put together an accurately representative sample. For example, suppose the target population is 1,000 estate agents and 10 individuals are required for the study. If there are 500 female agents and 500 male agents in the population, the researcher should ensure that the sample includes five female and five male agents.
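As a rough illustration of the estate-agent example above, here is a minimal Python sketch of proportional stratified random sampling; the population list and gender labels are made-up placeholders:

```python
import random
from collections import defaultdict

# Placeholder population of 1,000 estate agents: 500 female and 500 male.
population = [(f"agent_{i}", "female" if i <= 500 else "male")
              for i in range(1, 1001)]

# Group the population into strata by gender.
strata = defaultdict(list)
for name, gender in population:
    strata[gender].append(name)

# Draw a proportional simple random sample from each stratum:
# 10 participants in total, so 5 female and 5 male agents.
total_sample_size = 10
sample = []
for gender, members in strata.items():
    stratum_size = round(total_sample_size * len(members) / len(population))
    sample.extend(random.sample(members, k=stratum_size))

print(sample)
```

Sampling at random within each stratum keeps the overall sample representative of the known split in the population.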

Both of these methods are effective at reducing the risk of sampling bias.

The first step to take to avoid sampling bias is to understand what it is, what causes it and the types of sampling bias. Armed with this information, you can use our tips for avoiding sampling bias and keep your study results valid and accurate.

Remember, even the most experienced research professionals can inadvertently commit sampling bias. Refer back to this article and double-check your methodology to ensure that your study is free of bias.

The best way to eliminate the risk of sampling bias occurring is to use SurveyMonkey Audience. You’ll receive responses from your ideal audience for high-quality, bias-free data. Find out more about our survey response tool today!
