Learning Objectives
By the end of this section, you will be able to:
- Understand the logic of sampling
- Differentiate between a sample and a population
- Identify the difference between probabilistic and non-probabilistic sampling
While we have provided you with major designs, it is important to be able to understand additional components of research design. Understanding these components will help you to build on existing designs so you can create blueprints that are specific to your research. Sampling is an important component to consider because it can be difficult to obtain data on every single case in the population. How your sample is created and who is part of your sample have implications for the conclusions you can make about your results.
An important component of research design is determining who will be part of the study, the number of cases, and how cases will be selected into the study. The first step is to determine what population you are interested in. The population refers to all cases that could be a part of the study. For instance, if you are interested in why people vote for certain candidates, the population of interest is all adults who are 18 years or older and are registered to vote.
It would not make sense for your population to include those who are not registered to vote because your question is specific to voter behavior. A case is a single unit of the population identified, here an adult who is 18 years or older and is registered to vote. If everyone who is part of this population could also be a part of the study, the evidence for the theory put forward would be quite convincing; however, such data would be difficult to obtain. Not only would it be very costly, it might not be feasible due to time constraints because there are more than 130 million voters in the United States. While there is a temptation to try to include every possible case, one thing to consider is that this is still just a snapshot in time. What do we mean by snapshot in time? All cases might be included for one election year, but there are several elections each year and many election years! In the end, the population might really be just a sample in the context of time.
The next step is to figure out the number of cases to include that will still provide a convincing argument to support the theory. According to the law of large numbers, we do not necessarily need to include every single case to provide a convincing argument. Rather than the entire population, the study will likely be based on a sample. We need to provide a sample, or a selection of cases from the population, that is large enough that we can approximate the population values we are seeking. The law of large numbers tells us that when we draw a large enough sample that is also representative of the population, it will produce results that are close to the results we would obtain if we collected data on all the cases in the population.
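The idea behind the law of large numbers can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population of 100,000 voters, the 55 percent support figure, and the function name average_sampling_error are all hypothetical and chosen for illustration.

```python
import random

def average_sampling_error(population, sample_size, trials, rng):
    """Average absolute gap between a random sample's mean and the population mean."""
    population_mean = sum(population) / len(population)
    total_gap = 0.0
    for _ in range(trials):
        # Draw cases at random, without replacement.
        sample = rng.sample(population, sample_size)
        total_gap += abs(sum(sample) / sample_size - population_mean)
    return total_gap / trials

rng = random.Random(42)
# Hypothetical population: 1 = supports candidate A, 0 = does not.
population = [1 if rng.random() < 0.55 else 0 for _ in range(100_000)]

for n in (10, 100, 1_000, 10_000):
    # Averaging over repeated samples keeps one lucky draw from misleading us.
    print(n, round(average_sampling_error(population, n, 100, rng), 4))
```

In a run like this, the average gap should shrink as the sample size grows, even though the largest sample here covers only a tenth of the population.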
When sampling, another characteristic that we may be looking for is representativeness. It can be argued that the value of the sample is only meaningful in that it can help us draw conclusions about the population we want to know more about. To figure out representativeness, we need a sampling frame. The complete sampling frame is a list of all those in the population. This list might contain information about the characteristics of the population we are interested in. For our sample to provide us with results that can tell us about the population, we need the sample to be representative, or to be similar to that of the population. If you are interested in learning about voters in the United States, only including voters from California will not be very helpful. This sample can provide you with information about voters in California, but not necessarily about voters in the United States. To ensure representativeness, you can select from the sampling frame who it is that should be included in your sample.
There are two ways to sample cases: one that, if done properly, tends to produce representative samples, and one that offers no such assurance. Probability sampling will produce samples that are more likely to be representative of the population as opposed to nonprobability sampling. Probability sampling requires the use of random selection to place cases into a sample. Examples of probability sampling are simple random sampling, stratified sampling, and clustered sampling. Nonprobability sampling uses nonrandom processes to select cases to be part of the sample. Examples of nonprobability sampling include convenience sampling, quota sampling, and snowball sampling.
Probability Sampling
Simple random sampling is often argued to be the best approach in selecting a sample. In a simple random sample, each case has an equal chance of being selected to be part of the study. Through simple random sampling, your sample is much more likely to be reflective of your population. A simple way to think of random sampling is putting names in a hat and drawing names out of the hat. This means that if you were interested in studying political science students and there were 1,000 political science students, each student would have the same 1 in 1,000 chance of being the first name drawn for your study.
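The names-in-a-hat procedure can be sketched in a few lines of Python. The student labels, the sample size of 50, and the function name simple_random_sample are assumptions made for illustration. With 50 names drawn from 1,000, each student has a 50 in 1,000 chance of ending up in the sample.

```python
import random

def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw sample_size cases without replacement; every case is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(sampling_frame, sample_size)

# Hypothetical sampling frame: 1,000 political science students.
students = [f"student_{i:04d}" for i in range(1, 1001)]

sample = simple_random_sample(students, 50, seed=7)
print(len(sample), sample[:3])
```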
Stratified sampling is similar to random sampling, but it addresses a concern over what the sample looks like. There may be a concern about the inclusion or exclusion of certain characteristics. To ensure proportional representation, meaning that the sample has characteristics similar to those of the population, stratified sampling takes such characteristics into consideration and ensures the sample looks like the population. Therefore, we need to know how these characteristics are distributed in the population before selecting the sample.
For instance, if not having enough people who are racially representative of the population is a concern, when sampling you will ensure that twenty percent of the sample is African American and twenty percent of the sample identifies as Latinx because that is the proportion they make up in the population of interest. This is known as a proportionate stratified sample. A disproportionate stratified sample oversamples certain groups that otherwise make up a smaller portion of the population. Oversampling allows researchers to provide greater insight into these groups, which they might not be able to do if only a few members of these groups are part of the sample.
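A proportionate stratified sample can be sketched as follows. The sampling frame, the group labels, and the function name proportionate_stratified_sample are hypothetical; the twenty percent shares mirror the example above.

```python
import random
from collections import defaultdict

def proportionate_stratified_sample(frame, stratum_of, sample_size, seed=None):
    """Randomly sample within each stratum in proportion to its population share."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in frame:
        strata[stratum_of(case)].append(case)
    sample = []
    for members in strata.values():
        # Rounding may shift the total by a case or two when shares are uneven.
        quota = round(sample_size * len(members) / len(frame))
        sample.extend(rng.sample(members, quota))
    return sample

# Hypothetical frame of 1,000 cases stored as (case_id, group) pairs.
frame = ([(i, "African American") for i in range(200)]
         + [(i, "Latinx") for i in range(200, 400)]
         + [(i, "Other") for i in range(400, 1000)])

sample = proportionate_stratified_sample(frame, lambda case: case[1], 100, seed=1)
counts = defaultdict(int)
for _, group in sample:
    counts[group] += 1
print(dict(counts))
```

A disproportionate design would replace the proportional quota with a researcher-chosen number for each group.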
A clustered sample takes into consideration that a simple random sample may not be feasible because the population may be quite dispersed. If your population is all U.S. adults who are registered to vote, it might be difficult to acquire a list of every registered voter and then randomly select individuals to be part of your survey. If administered in person, imagine how difficult it would be to fly from one part of the country to the next all for an interview! Instead, the researcher will narrow it down by selecting areas or clusters and then randomly sampling from these areas. For example, a researcher may randomly select states and from within those states, select counties, then cities, and then precincts. Once precincts have been randomly selected, all those who are in those precincts will be measured.
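The final stage of the procedure described above, randomly selecting precincts and then measuring everyone in them, might look like the sketch below. The precinct names, voter labels, and the function name cluster_sample are assumptions for illustration; earlier stages (states, counties, cities) would repeat the same random selection one level up.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then include every case inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [case for name in chosen for case in clusters[name]]
    return chosen, sample

# Hypothetical precincts, each holding a list of registered voters.
precincts = {
    f"precinct_{p}": [f"voter_{p}_{v}" for v in range(5)]
    for p in range(1, 11)
}

chosen, sample = cluster_sample(precincts, 3, seed=3)
print(chosen)
print(len(sample))
```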
Nonprobability Sampling
While random sampling was noted earlier as the ideal way to create a sample, nonprobability samples also serve a purpose. Nonprobability sampling may be chosen due to the small number of cases available. Nonprobability sampling includes convenience, quota, and snowball sampling. Convenience sampling refers to selecting cases that are available. It is almost like not sampling at all because there are no criteria to be part of the sample other than being part of the selected population and a willingness to be part of the sample. An example of selecting cases to be part of a convenience sample is asking individuals who are walking out of a polling place to answer questions.
Quota sampling refers to selecting cases according to a quota, or a set number of cases. Researchers may set a fixed number and go about creating a sample that will meet that number. Quota sampling can also be similar to stratified sampling when the researcher is trying to ensure the sample looks similar to the population, meaning that those in the sample share the characteristics of the population.
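One way to picture quota sampling is as filling fixed slots in whatever order cases become available, with no random selection involved. The sketch below assumes a hypothetical stream of arrivals, hypothetical age groups, and a made-up function name quota_sample.

```python
def quota_sample(arrivals, group_of, quotas):
    """Accept cases in arrival order until every group's quota is filled."""
    remaining = dict(quotas)
    sample = []
    for case in arrivals:
        group = group_of(case)
        if remaining.get(group, 0) > 0:
            sample.append(case)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas met; stop recruiting
    return sample

# Hypothetical people leaving a polling place, as (name, age_group) pairs.
arrivals = [("Ana", "under_40"), ("Ben", "under_40"), ("Cy", "under_40"),
            ("Dee", "40_plus"), ("Eli", "40_plus"), ("Flo", "40_plus")]

sample = quota_sample(arrivals, lambda case: case[1], {"under_40": 2, "40_plus": 2})
print(sample)
```

Because cases are taken in the order they appear rather than at random, the result meets the quotas without being a probability sample.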
In a snowball sample, initial cases are identified to be a part of the sample. It can be one case, or it can be more. These initial individuals will then provide you with referrals to other individuals who could be a part of the sample. Eventually, the number of cases you have will increase through these referrals. Like a snowball rolling downhill, the sample picks up momentum as you accumulate more referrals, gaining more mass and picking up more cases along the way. This sampling method is especially useful when working with a hard-to-reach population. For instance, if you wanted to understand the circumstances in which individuals become homeless, a snowball sample would be helpful, especially because a list of homeless individuals does not exist.
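The referral process can be sketched as a walk through a referral network, starting from the initial cases. The referral map, the single-letter names, and the function name snowball_sample are hypothetical, and the sketch assumes the initial cases are distinct and that every referred person agrees to take part.

```python
from collections import deque

def snowball_sample(seeds, referrals, target_size):
    """Grow a sample by following referrals outward from the initial cases."""
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)
        # Each participant refers others who have not been contacted yet.
        for referred in referrals.get(person, []):
            if referred not in seen:
                seen.add(referred)
                queue.append(referred)
    return sample

# Hypothetical referral network: who each participant can refer.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"]}

print(snowball_sample(["A"], referrals, 4))
```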
Probability and nonprobability sampling are methods for choosing cases to be part of a study. We generally utilize samples because trying to collect information from the entire population can be difficult. The law of large numbers tells us that we do not necessarily need to include every case to obtain the data we are looking for when our sample is sufficiently large. In creating our sample, there are additional rules of thumb to follow. One general rule is that if the population being studied is small (equal to or less than 100), the best strategy is to include all the cases. Another general rule is to always aim for a larger sample because nonresponse, or not receiving a reply from a case, is a likely possibility.
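These two rules of thumb can be combined in a short planning calculation. The adjustment used here, dividing the number of completed responses you want by the response rate you expect, is a common convention assumed for this sketch rather than something stated above; the function name cases_to_contact and the example figures are hypothetical.

```python
import math

def cases_to_contact(population_size, target_completes, expected_response_rate):
    """Estimate how many cases to contact so that enough of them respond."""
    if not 0 < expected_response_rate <= 1:
        raise ValueError("expected_response_rate must be in (0, 1]")
    if population_size <= 100:
        # Rule of thumb: with a small population, include every case.
        return population_size
    # Assumed adjustment: inflate the target to offset expected nonresponse.
    needed = math.ceil(target_completes / expected_response_rate)
    return min(needed, population_size)

print(cases_to_contact(80, 50, 0.5))       # small population: include all 80
print(cases_to_contact(1_000, 400, 0.5))   # half respond, so contact 800
```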