What is a sampling distribution?

Sampling error and sampling distribution


If you randomly select people from the population and determine their average age, there is no guarantee that this value matches the average age of the population as a whole. The "arithmetic mean" in this "random sample" (and in every other possible one) can deviate by chance from the population mean. Such deviations from the population parameters are referred to as sampling error. If you drew every possible random sample of the size under consideration and recorded the arithmetic mean of each, you would get an overview of the distribution of the arithmetic mean, which can also be displayed graphically. This representation shows how much the arithmetic mean can fluctuate from sample to sample, i.e. how large the sampling error is. The spread of the arithmetic means across the various samples is measured by the "standard error". The same approach works for other sample statistics, e.g. the standard deviation or the median: here, too, the value in a sample can be expected to deviate from the corresponding value in the population. The distribution of a sample statistic over all possible samples is called its sampling distribution.
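The idea of "all possible random samples" can be made concrete for a tiny population. The following is a minimal sketch, assuming a hypothetical population of five ages (the values are illustrative, not from the text) and samples of size 2 drawn without replacement. It enumerates every sample, tabulates the resulting sample means (the sampling distribution of the mean), and takes the standard deviation of those means as the standard error.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical population of N = 5 ages (illustrative values only)
population = [20, 22, 25, 30, 33]
n = 2  # sample size

# Every possible sample of size n, drawn without replacement (order ignored)
samples = list(combinations(population, n))
sample_means = [mean(s) for s in samples]

# Sampling distribution of the mean: how often each mean value occurs
distribution = Counter(sample_means)

# Standard error = standard deviation of the sample means over all samples
standard_error = pstdev(sample_means)

print(len(samples), "possible samples")
print("population mean:", mean(population))
print("mean of the sample means:", mean(sample_means))
print("standard error:", round(standard_error, 3))
for value, count in sorted(distribution.items()):
    print(value, count)
```

The individual sample means range from 21 to 31.5, while their average equals the population mean of 26: single samples miss the population value, but the sampling distribution is centred on it.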

If the distribution of a characteristic in the population is known, a suitable sampling distribution can be used to work out how particular statistics of this characteristic will be distributed across samples of a given size. This inference from a known population to a sample is also known as inclusion inference (direct inference). It does not yield exact predictions, however, only statements that hold with a certain probability. An "expectation interval" is therefore specified, within which the sample statistic is expected to fall with a given probability.
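For the arithmetic mean, such an expectation interval can be sketched as follows, assuming the population mean and standard deviation are known, the sampling distribution of the mean is normal, and no finite population correction is needed. The function name `expectation_interval` and the example figures (mean age 40, standard deviation 12, n = 36) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def expectation_interval(mu, sigma, n, prob=0.95):
    """Interval in which the sample mean is expected to fall with probability
    `prob`, given the known population mean `mu` and standard deviation `sigma`.
    Assumes a normal sampling distribution and no finite population correction."""
    standard_error = sigma / sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for prob = 0.95
    z = NormalDist().inv_cdf(0.5 + prob / 2)
    return mu - z * standard_error, mu + z * standard_error


# Hypothetical population: mean age 40, standard deviation 12; samples of n = 36
low, high = expectation_interval(40, 12, 36)
print(f"{low:.2f} to {high:.2f}")
```

Here the standard error is 12 / √36 = 2, so about 95% of samples of size 36 should have a mean age between roughly 36.08 and 43.92.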

To calculate the expectation interval, the relevant parameters of the characteristic in the population must be known: location and dispersion for continuous variables, and the proportions of the individual categories for categorical variables. The shape of the distribution (relevant only for continuous characteristics) can be ignored for sufficiently large samples (rule of thumb: ), by appeal to the central limit theorem. Finally, you need information about the "standard error" of the sample statistic. Which distribution model describes the sampling distribution depends on the application. For frequencies or proportions of categorical variables, depending on the »selection technique«, either the »binomial distribution« (sampling with replacement) or the »hypergeometric distribution« (sampling without replacement) is the appropriate sampling distribution. The "normal distribution" is used as the sampling distribution for the arithmetic mean of continuous variables; with very small samples (), however, it is only an approximation. In addition, if the sampling fraction exceeds 5%, a correction factor for finite populations is needed (see »Selection Technique«).
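The choice between the two models for categorical variables, and the effect of the finite population correction, can be illustrated with a short sketch. It assumes a hypothetical population of N = 50 people, K = 20 of whom belong to the category of interest, and samples of n = 10 (a sampling fraction of 20%). The correction is written in its usual form √((N − n) / (N − 1)); all function names are illustrative.

```python
from math import comb, sqrt


def binomial_pmf(k, n, p):
    """P(k members of the category in a sample of n), sampling with replacement."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeometric_pmf(k, n, N, K):
    """P(k members of the category in a sample of n), sampling without
    replacement from N people of whom K belong to the category."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def standard_error_mean(sigma, n, N=None):
    """Standard error of the mean; if a population size N is given, the finite
    population correction sqrt((N - n) / (N - 1)) is applied."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se


# Hypothetical population: N = 50, K = 20 in the category (p = 0.4), samples of n = 10
N, K, n = 50, 20, 10
for k in range(n + 1):
    print(k, round(binomial_pmf(k, n, K / N), 4), round(hypergeometric_pmf(k, n, N, K), 4))

# Sampling fraction n / N = 20 % (> 5 %): the corrected standard error is smaller
print(standard_error_mean(12, n), standard_error_mean(12, n, N))
```

Both distributions have the same expected count (n · K / N = 4), but the hypergeometric one is narrower because each person can be drawn only once; the same reasoning explains why the correction factor shrinks the standard error when a large share of the population is sampled.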

HJA 2001-10-01