Survey analysis tool

Most statistical analyses of surveys, such as opinion polls or market research, rely on the assumption that a sample of 1000 gives an error margin of ±3% at the 95% confidence level. This assumption is valid in most cases but not in all of them, and it may be of vital importance to pollsters to know in which cases it is wrong, and by how much. Furthermore, a serious pollster should be aware that the minimum error margin and the maximum confidence level cannot both be chosen in advance: one of the two can be fixed before the survey, while the other can only be computed from the outcome of the survey.
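The ±3% rule of thumb can be checked with the usual Gaussian (normal) approximation, in which the margin is z·sqrt(p(1−p)/n). The sketch below is only an illustration under that approximation and under the assumption of a population much larger than the sample; it is not the method used by this tool, and the function name margin_of_error is made up for the example.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of the normal-approximation interval for a proportion.

    Assumes simple random sampling from a population much larger than n.
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    for p in (0.5, 0.2, 0.05):
        for confidence in (0.95, 0.99):
            moe = margin_of_error(1000, p, confidence)
            print(f"n=1000 p={p:.2f} conf={confidence:.0%}: +/-{moe:.2%}")
```

Under this approximation the margin for n=1000 is about ±3.1% at p=0.5 and 95% confidence, becomes smaller as p moves away from 0.5, and exceeds ±4% at 99% confidence.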

This page is dedicated to the analysis of survey results by means of the discrete hypergeometric distribution. The main limitation the user may face is the time required to compute the results. Although one of the fastest algorithms is used here, the computations become time-consuming (on the order of tens of seconds) for population sizes N above ten million. The upper limit on computation time on this server is 30 seconds per case.
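For reference, the hypergeometric distribution gives the probability of finding m marked items in a sample of n drawn without replacement from a population of N that contains M marked items: P(m | M) = C(M, m)·C(N−M, n−m) / C(N, n). A minimal sketch of this probability, evaluated through log-gamma so that large N does not overflow, might look as follows. The function names and the log-gamma approach are illustrative assumptions, not the algorithm running on this server.

```python
from math import exp, lgamma


def log_binom(a, b):
    """Natural log of the binomial coefficient C(a, b), for 0 <= b <= a."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def hypergeom_pmf(m, n, M, N):
    """P(m marked in a sample of n | M marked among N), without replacement."""
    # Outside the support the probability is exactly zero.
    if m < max(0, n - (N - M)) or m > min(n, M):
        return 0.0
    log_p = log_binom(M, m) + log_binom(N - M, n - m) - log_binom(N, n)
    return exp(log_p)
```

Working with logarithms keeps every intermediate value moderate in size, at the cost of a small rounding error in the final probability.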

Input

sample size: n =
sampling result: m =
population size: N =
confidence level in %: r =

Output

case         Mmin  Mmax  pmin   pmax   - perc.  + perc.  elapsed
realistic    0     0     0.00%  0.00%  +0.00%   +0.00%   0.000 s
symmetric    0     0     0.00%  0.00%  +0.00%   +0.00%   0.000 s
optimistic   0     0     0.00%  0.00%  +0.00%   +0.00%   0.000 s
pessimistic  0     0     0.00%  0.00%  +0.00%   +0.00%   0.000 s
Gaussian     0     0     0.00%  0.00%  +0.00%   +0.00%   0.000 s



Description of input data

types: n, m, N = integers, r = float
requirements: m≥0, m≤n≤N, 0≤r≤100%
test analysis: n=500, m=100, N=123456, r=99%
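
The requirements above can be expressed as a small check. This is a sketch only; the function name validate_input and the choice to raise ValueError are assumptions made for illustration, not part of the tool.

```python
def validate_input(n, m, N, r):
    """Raise ValueError unless the inputs meet the stated requirements."""
    for name, value in (("n", n), ("m", m), ("N", N)):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer, got {value!r}")
    if not (0 <= m <= n <= N):
        raise ValueError("required: 0 <= m <= n <= N")
    if not (0.0 <= float(r) <= 100.0):
        raise ValueError("required: 0 <= r <= 100 (percent)")
    return n, m, N, float(r)
```

The test analysis n=500, m=100, N=123456, r=99 passes this check, while for example m=11 with n=10 is rejected.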

Description of output data

case: Five different confidence intervals that can be claimed with the prescribed confidence level: realistic (tails cut off), symmetric (+/- percentage with respect to the expected value mN/n), optimistic/pessimistic (extremes of biased interpretations), and an approximation by the Gaussian distribution.
Mmin, Mmax: Lower and upper value of confidence interval according to the case.
pmin, pmax: Confidence interval in fractions of population size, i.e. values Mmin and Mmax divided by the population size N.
+/- perc.: Deviations of Mmin and Mmax from the expected value mN/n, expressed as a percentage of the population size N (equivalently, pmin and pmax minus the expected share m/n).
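
The relation between the output columns can be sketched in a few lines. The function name derived_columns is hypothetical, and the input values are the ones quoted for the realistic case in the opinion poll example on this page.

```python
def derived_columns(M_min, M_max, n, m, N):
    """Return (pmin, pmax, minus, plus), all in percent of N."""
    expected = 100.0 * m / n   # expected share m/n, in percent
    p_min = 100.0 * M_min / N
    p_max = 100.0 * M_max / N
    return p_min, p_max, p_min - expected, p_max - expected


if __name__ == "__main__":
    cols = derived_columns(261744, 386102, n=700, m=140, N=1600000)
    print(" ".join(f"{c:+.2f}%" for c in cols))
```

For these inputs the four values round to 16.36%, 24.13%, -3.64% and +4.13%.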

Examples

Opinion poll analysis
Imagine that 700 people have been randomly chosen and asked about their preferences among candidates A, B, and C. Candidate A gets 20% of the votes. What can we deduce from the poll results if the population size is 1.6 million and we want to be 99% sure? We feed the tool with the input data n=700, m=140, N=1600000, and r=99%. The output gives five cases. Let us consider the realistic case: there is a 99% probability that candidate A has between 261744 and 386102 votes, which is between 16.36% and 24.13% of the total population. The deviations from the expected value of 20% are -3.64% and +4.13%.

Prevalence of a disease
Around Y2K an outbreak of mad cow disease occurs. The authorities in many countries want to know what portion of their animals is infected. Let us assume that in a population of 200000 animals they test a random sample of 3000 and find zero positive cases. The input data are n=3000, m=0, and N=200000, and we choose a confidence level of 95%. The authorities are interested in the pessimistic case of the results: the worst-case estimate at 95% confidence. From the output table we get Mmin=0 and Mmax=198, which means that, with 95% confidence, no more than 198 animals in the whole population are infected.
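
One way to arrive at a bound of this kind is to look for the largest number M of infected animals for which a sample with zero positives would still occur with probability at least 1 − r under the hypergeometric distribution. The sketch below follows that one-sided reading; the function name is hypothetical, and the tool's own algorithm is not described on this page and may differ.

```python
def upper_bound_zero_positives(n, N, confidence=0.95):
    """Largest M for which a sample of n with zero positives is still plausible.

    Returns the largest number M of infected animals such that the
    hypergeometric probability of seeing 0 positives in a random sample
    of n out of N is at least 1 - confidence.
    """
    alpha = 1.0 - confidence
    prob_zero = 1.0   # P(0 positives | M = 0)
    M = 0
    while M < N - n:
        # P(0 | M + 1) = P(0 | M) * (N - n - M) / (N - M)
        next_prob = prob_zero * (N - n - M) / (N - M)
        if next_prob < alpha:
            break
        prob_zero = next_prob
        M += 1
    return M
```

With n=3000, N=200000 and a 95% confidence level this sketch returns 198, the same value as the Mmax quoted above.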