Sampling
Sampling is the activity of choosing or selecting a subset of individuals from within a statistical population in order to estimate characteristics of the whole population. Each observation measures one or more properties (such as weight, location, color) of observable bodies distinguished as independent objects or individuals.
In survey sampling, weights can be applied to the data to adjust for the sample design, particularly stratified sampling. Results from probability theory and statistical theory are employed to guide the practice.
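As a minimal sketch of how such weights work: if each sampled unit's inclusion probability is known, its design weight is the reciprocal of that probability, and a weighted mean corrects for unequal selection. The strata, sizes, and values below are hypothetical:

```python
import random

random.seed(0)

# Hypothetical population in two strata of known sizes.
strata = {"urban": [random.gauss(50, 5) for _ in range(800)],
          "rural": [random.gauss(40, 5) for _ in range(200)]}

sample_sizes = {"urban": 40, "rural": 40}  # deliberately disproportionate design

weighted_sum = 0.0
weight_total = 0.0
for name, units in strata.items():
    n, N = sample_sizes[name], len(units)
    drawn = random.sample(units, n)
    weight = N / n  # design weight = 1 / inclusion probability
    for y in drawn:
        weighted_sum += weight * y
        weight_total += weight

# The weighted mean corrects for the over-representation of the rural stratum.
estimate = weighted_sum / weight_total
```

An unweighted mean of the same sample would sit halfway between the two stratum means; the design weights pull the estimate back toward the true population mean.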
The sampling process comprises several stages:
- Defining the population of concern
- Specifying a sampling frame, the set of items or events that can actually be measured
- Specifying a sampling method for selecting items or events from the frame
- Determining the sample size
- Implementing the sampling plan
- Sampling and collecting the data
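The stages above can be sketched end to end; the population, the gap between population and frame, and the sample size here are all hypothetical:

```python
import random

random.seed(1)

# 1. Population of concern (hypothetical: household ids 0..9999).
population = range(10_000)

# 2. Sampling frame: the subset we can actually reach
#    (here, suppose every 20th household is unreachable).
frame = [h for h in population if h % 20 != 0]

# 3. Sampling method: simple random sampling without replacement.
# 4. Sample size.
n = 100

# 5.-6. Implement the plan and draw the sample.
sample = random.sample(frame, n)
```

The distinction between population and frame matters: units outside the frame have zero chance of selection, which is a source of coverage error no sampling method can repair.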
Types of sampling
With respect to the use of randomization, sampling methods fall into two broad types:
- Probability sampling: any sampling method in which elements are selected by some random mechanism, so that each element of the sample can be assigned a known probability of having been selected.
- Nonprobability sampling: any sampling method in which some elements of the population have no chance of selection, or in which the probability of selection cannot be accurately determined. It involves the selection of elements based on assumptions regarding the population of interest, which form the criteria for selection. Because the selection of elements is nonrandom, nonprobability sampling does not allow the estimation of sampling errors. These conditions give rise to exclusion bias, placing limits on how much information a sample can provide about the population. Information about the relationship between sample and population is limited, making it difficult to extrapolate from the sample to the population.
This section presents different categorizations of the possible sampling methods. The main factors considered are:
- Nature and quality of the frame
- Availability of auxiliary information about units on the frame
- Accuracy requirements, and the need to measure accuracy
- Whether detailed analysis of the sample is expected
- Cost/operational concerns
Most common methods:
- Simple random sampling: every subset of the frame of the given size has the same probability of selection, and hence so does every individual element. This minimises bias and simplifies analysis of results. The variance between individual results within the sample is a good indicator of variance in the overall population, which makes it relatively easy to estimate the accuracy of results. However, SRS can be vulnerable to sampling error, because the randomness of the selection may result in a sample that doesn't reflect the makeup of the population.
- Systematic sampling: systematic sampling (also known as interval sampling) relies on arranging the study population according to some ordering scheme and then selecting elements at regular intervals through that ordered list. Systematic sampling involves a random start and then proceeds with the selection of every kth element from then onwards. As long as the starting point is randomized, systematic sampling is a type of probability sampling. It is easy to implement, and the stratification it induces can make it efficient if the variable by which the list is ordered is correlated with the variable of interest. However, systematic sampling is especially vulnerable to periodicities in the list. Another drawback is that its theoretical properties make the accuracy of estimates difficult to quantify.
- Stratified sampling: a method of sampling from a population in which the sample preserves the representation of the strata. When the population embraces a number of distinct categories, the frame can be organized by these categories into separate "strata." Each stratum is then sampled as an independent sub-population, out of which individual elements can be randomly selected. Stratified sampling is also a method of variance reduction when Monte Carlo methods are used to estimate population statistics from a known population.
- Probability-proportional-to-size sampling: in probability-proportional-to-size ('PPS') sampling, the selection probability for each element is set to be proportional to its size measure, up to a maximum of 1. In a simple PPS design, these selection probabilities can then be used as the basis for Poisson sampling. However, this has the drawback of variable sample size, and different portions of the population may still be over- or under-represented due to chance variation in selections. The PPS approach can improve accuracy for a given sample size by concentrating the sample on large elements, which have the greatest impact on population estimates.
- Cluster sampling: clusters are sampled first from a cluster-level sampling frame, and elements are then sampled within the selected clusters. Sometimes it is more cost-effective to select respondents in groups ('clusters'), e.g. in survey sampling (to travel to fewer places) or in distributed big data. Cluster sampling (also known as clustered sampling) generally increases the variability of sample estimates above that of simple random sampling, depending on how much the clusters differ between themselves compared with the within-cluster variation. For this reason, cluster sampling requires a larger sample than SRS to achieve the same level of accuracy - but the cost savings from clustering might still make this the cheaper option. Cluster sampling is commonly implemented as multistage sampling. Multistage sampling can substantially reduce sampling costs in settings where the complete population list would otherwise need to be constructed before other sampling methods could be applied. By eliminating the work involved in describing clusters that are not selected, multistage sampling can reduce the large costs associated with traditional cluster sampling. However, each sample may not be fully representative of the whole population.
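The probability designs above can be sketched side by side. The frame, the size measures, the strata, and the cluster layout below are all hypothetical, and the PPS draw is implemented as Poisson sampling as described above:

```python
import random

random.seed(2)

N = 1_000
frame = list(range(N))  # hypothetical frame of unit ids

# Simple random sampling: every size-50 subset of the frame is equally likely.
srs = random.sample(frame, 50)

# Systematic sampling: random start, then every k-th unit through the list.
k = N // 50
start = random.randrange(k)
systematic = frame[start::k]

# Stratified sampling: sample each stratum independently,
# here with proportional allocation.
strata = {"small": frame[:700], "large": frame[700:]}
stratified = []
for units in strata.values():
    n_h = round(50 * len(units) / N)
    stratified.extend(random.sample(units, n_h))

# PPS via Poisson sampling: include unit i independently with probability
# proportional to its size measure; the realized sample size is variable.
sizes = [random.randint(1, 10) for _ in frame]
total = sum(sizes)
pps = [i for i in frame if random.random() < 50 * sizes[i] / total]

# Cluster sampling, implemented as multistage sampling: draw whole
# clusters first, then subsample units within the chosen clusters.
clusters = [frame[i:i + 100] for i in range(0, N, 100)]
chosen_clusters = random.sample(clusters, 2)
cluster_sample = [u for c in chosen_clusters for u in random.sample(c, 25)]
```

Note that only the Poisson/PPS draw has a random sample size; the other designs return exactly 50 units, which is one practical reason fixed-size designs are often preferred.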
Other common methods:
- Quota sampling: the population is first segmented into mutually exclusive sub-groups, just as in stratified sampling. Then judgement is used to select the subjects or units from each segment based on a specified proportion.
- Minimax sampling: a scheme for imbalanced datasets in which the class-sampling ratio is chosen to minimise the worst-case (minimax) risk rather than to follow the population proportions.
- Accidental sampling: a type of nonprobability sampling which involves the sample being drawn from that part of the population which is close to hand.
- Line-intercept sampling: is a method of sampling elements in a region whereby an element is sampled if a chosen line segment, called a “transect”, intersects the element.
- Panel sampling: longitudinal-sampling method.
- Snowball sampling: involves finding a small group of initial respondents and using them to recruit more respondents. It is particularly useful in cases where the population is hidden or difficult to enumerate.
- Theoretical sampling: occurs when samples are selected on the basis of the results of the data collected so far with a goal of developing a deeper understanding of the area or develop theory.
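Snowball sampling, in particular, lends itself to a short sketch: start from a few seed respondents and recruit through their referrals, wave by wave. The contact graph, seed ids, and referral counts below are hypothetical:

```python
import random
from collections import deque

random.seed(3)

# Hypothetical hidden population: each person's contacts are known
# only once that person has been recruited.
contacts = {p: random.sample(range(200), 3) for p in range(200)}

def snowball_sample(seeds, waves, referrals_per_person=2):
    """Recruit breadth-first: each respondent refers a few contacts."""
    sampled = set(seeds)
    frontier = deque(seeds)
    for _ in range(waves):
        next_frontier = deque()
        while frontier:
            person = frontier.popleft()
            referrals = random.sample(
                contacts[person],
                min(referrals_per_person, len(contacts[person])))
            for friend in referrals:
                if friend not in sampled:
                    sampled.add(friend)
                    next_frontier.append(friend)
        frontier = next_frontier
    return sampled

sample = snowball_sample(seeds=[0, 1, 2], waves=3)
```

Because recruitment follows the contact graph, inclusion probabilities depend on connectivity and are generally unknown - which is exactly why snowball sampling is classed as a nonprobability method.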
Papers
- Altmann, J. (1974). Observational study of behavior: sampling methods. Behaviour, 49(3), 227-266.
- Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 97-109.
- Candès, E. J., & Wakin, M. B. (2008). An introduction to compressive sampling. IEEE Signal Processing Magazine, 25(2), 21-30.
Books
- Lohr, S. (2009). Sampling: design and analysis. Nelson Education.
- Cochran, W. G. (2007). Sampling techniques. John Wiley & Sons.
- Lehmann, E. L. (1999). Elements of large-sample theory. Springer Science & Business Media.
- Levy, P. S., & Lemeshow, S. (2013). Sampling of populations: methods and applications. John Wiley & Sons.