Parametric and nonparametric statistics

Parametric statistics is a branch of statistics that assumes sample data come from a population following a probability distribution defined by a fixed set of parameters. Most well-known elementary statistical methods are parametric. Conversely, a non-parametric model differs precisely in that its parameter set (or feature set, in machine learning) is not fixed and can increase, or even decrease, as new relevant information is collected.

Because it relies on a fixed parameter set, a parametric model assumes more about a given population than non-parametric methods do. When those assumptions are correct, parametric methods produce more accurate and precise estimates than non-parametric methods, i.e. they have more statistical power. Because more is assumed, parametric methods have a greater chance of failing when the assumptions are not correct, and for this reason they are not robust statistical methods. On the other hand, parametric formulae are often simpler to write down and faster to compute, and this simplicity can make up for the lack of robustness, especially if care is taken to examine diagnostic statistics.

Nonparametric statistics are statistics not based on parameterized families of probability distributions. They include both descriptive and inferential statistics. The typical parameters are the mean, variance, etc. Unlike parametric statistics, nonparametric statistics make no assumptions about the probability distributions of the variables being assessed. The difference between parametric and non-parametric models is that the former have a fixed number of parameters, while the latter grow the number of parameters with the amount of training data. Note that "non-parametric" does not mean the model has no parameters; rather, the number and nature of the parameters are flexible and determined by the training data rather than fixed in advance.
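
To make the contrast concrete, the following Python sketch (assuming NumPy and SciPy are installed; the sample is made up) fits a parametric model, a normal distribution summarized by two parameters, alongside a nonparametric kernel density estimate whose complexity grows with the number of observations:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    # Hypothetical skewed sample whose true distribution the analyst does not know.
    sample = rng.gamma(shape=2.0, scale=1.5, size=500)

    # Parametric: assume a normal population, so the whole model is a fixed
    # set of two parameters (mean and standard deviation).
    mu, sigma = sample.mean(), sample.std(ddof=1)
    parametric = stats.norm(loc=mu, scale=sigma)

    # Nonparametric: a kernel density estimate keeps one kernel per observation,
    # so its effective parameter set grows with the data.
    kde = stats.gaussian_kde(sample)

    print("parametric density at x = 2:", parametric.pdf(2.0))
    print("KDE density at x = 2:       ", kde(2.0)[0])

Because the sample is skewed, the normal fit places probability mass at negative values, while the kernel estimate adapts to the shape of the data at the cost of more computation.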

Properties of non-parametric testing in comparison with parametric testing:

  • Wider range of application.
  • More robust.
  • Simplicity (model structure is not specified a priori but is instead determined from data).
  • A larger sample size may be required to draw conclusions with the same degree of confidence.
  • Less powerful than the applicable parametric test, if one exists (see the simulation sketch after this list).
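
A minimal Python simulation (assuming NumPy and SciPy; the sample sizes and effect size are illustrative assumptions) shows this trade-off: on normal data the parametric two-sample t-test rejects a false null hypothesis slightly more often than the Mann-Whitney U test, while on heavy-tailed data the t-test loses much of its advantage and the rank-based test tends to do better:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    def rejection_rate(sampler, n_sims=2000, alpha=0.05):
        # Fraction of simulated data sets in which each test detects the group difference.
        t_hits = u_hits = 0
        for _ in range(n_sims):
            a, b = sampler()
            t_hits += stats.ttest_ind(a, b).pvalue < alpha
            u_hits += stats.mannwhitneyu(a, b).pvalue < alpha
        return t_hits / n_sims, u_hits / n_sims

    # Normal data: t-test assumptions hold, so it should be (slightly) more powerful.
    normal = lambda: (rng.normal(0.0, 1.0, 30), rng.normal(0.5, 1.0, 30))
    # Heavy-tailed (Cauchy) data: t-test assumptions fail, ranks are more robust.
    cauchy = lambda: (rng.standard_cauchy(30), rng.standard_cauchy(30) + 0.5)

    print("normal data (t-test, Mann-Whitney):", rejection_rate(normal))
    print("Cauchy data (t-test, Mann-Whitney):", rejection_rate(cauchy))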

Applications

Applications of non-parametric methods:

  • Studying populations that take on a ranked order (such as movie reviews receiving one to four stars; see the example after this list)
  • The use of non-parametric methods may be necessary when data have a ranking but no clear numerical interpretation, such as when assessing preferences. In terms of levels of measurement, non-parametric methods result in “ordinal” data.
  • Situations where less is known about the application in question
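
For example (the ratings below are made up), one-to-four-star reviews can be compared with rank-based tools in Python without assuming the stars behave like interval-scale numbers:

    from scipy import stats

    # Hypothetical star ratings (1-4) given to the same ten films by two review sites.
    site_a = [4, 3, 3, 2, 4, 1, 3, 2, 4, 3]
    site_b = [3, 3, 2, 2, 4, 1, 2, 1, 4, 2]

    # Spearman's rank correlation uses only the ordering of the ratings, so it does
    # not require "4 stars" to mean twice as much as "2 stars".
    rho, p_value = stats.spearmanr(site_a, site_b)
    print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")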

Models

Non-parametric models:

  • A histogram is a simple nonparametric estimate of a probability distribution.
  • Kernel density estimation provides better estimates of the density than histograms.
  • Nonparametric regression and semiparametric regression methods have been developed based on kernels, splines, and wavelets.
  • Data envelopment analysis provides efficiency coefficients similar to those obtained by multivariate analysis without any distributional assumption.
  • A k-nearest-neighbours (KNN) classifier labels an unseen instance based on the k points in the training set that are nearest to it (a short sketch follows this list).
  • A support vector machine (with a Gaussian kernel) is a nonparametric large-margin classifier.
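
As a sketch of the nearest-neighbour idea (the training data are synthetic), a KNN classifier can be written in a few lines of Python with NumPy; its "parameters" are simply the stored training points, so the model grows with the data:

    import numpy as np

    def knn_predict(train_X, train_y, query, k=3):
        # Classify one query point by majority vote among its k nearest training points.
        distances = np.linalg.norm(train_X - query, axis=1)
        nearest = np.argsort(distances)[:k]
        return np.bincount(train_y[nearest]).argmax()

    # Synthetic 2-D training data: class 0 clustered near (0, 0), class 1 near (3, 3).
    rng = np.random.default_rng(7)
    train_X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(3.0, 1.0, (20, 2))])
    train_y = np.array([0] * 20 + [1] * 20)

    print(knn_predict(train_X, train_y, np.array([2.5, 2.8])))  # expected: class 1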

Methods

The best-known non-parametric methods include the following (a few of them are demonstrated in a short Python sketch after the list):

  • Analysis of similarities
  • Anderson-Darling test: tests whether a sample is drawn from a given distribution
  • Statistical bootstrap methods: estimate the accuracy/sampling distribution of a statistic
  • Cochran’s Q: tests whether k treatments in randomized block designs with 0/1 outcomes have identical effects
  • Cohen’s kappa: measures inter-rater agreement for categorical items
  • Friedman two-way analysis of variance by ranks: tests whether k treatments in randomized block designs have identical effects
  • Kaplan-Meier: estimates the survival function from lifetime data, modeling censoring
  • Kendall’s tau: measures statistical dependence between two variables
  • Kendall’s W: a measure between 0 and 1 of inter-rater agreement
  • Kolmogorov-Smirnov test: tests whether a sample is drawn from a given distribution, or whether two samples are drawn from the same distribution
  • Kruskal-Wallis one-way analysis of variance by ranks: tests whether > 2 independent samples are drawn from the same distribution
  • Kuiper’s test: tests whether a sample is drawn from a given distribution, sensitive to cyclic variations such as day of the week
  • Logrank test: compares survival distributions of two right-skewed, censored samples
  • Mann-Whitney U or Wilcoxon rank sum test: tests whether two samples are drawn from the same distribution, as compared to a given alternative hypothesis.
  • McNemar’s test: tests whether, in 2 × 2 contingency tables with a dichotomous trait and matched pairs of subjects, row and column marginal frequencies are equal
  • Median test: tests whether two samples are drawn from distributions with equal medians
  • Pitman’s permutation test: a statistical significance test that yields exact p values by examining all possible rearrangements of labels
  • Rank products: detects differentially expressed genes in replicated microarray experiments
  • Siegel-Tukey test: tests for differences in scale between two groups
  • Sign test: tests whether matched pair samples are drawn from distributions with equal medians
  • Spearman’s rank correlation coefficient: measures statistical dependence between two variables using a monotonic function
  • Squared ranks test: tests equality of variances in two or more samples
  • Tukey-Duckworth test: tests equality of two distributions by using ranks
  • Wald-Wolfowitz runs test: tests whether the elements of a sequence are mutually independent/random
  • Wilcoxon signed-rank test: tests whether matched pair samples are drawn from populations with different mean ranks
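
Many of the tests above are available directly in SciPy. The short Python sketch below (with made-up samples; the bootstrap call needs SciPy 1.7 or newer) runs three of them: the Kolmogorov-Smirnov test, the Kruskal-Wallis test, and a bootstrap confidence interval for a mean:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    a = rng.exponential(scale=1.0, size=40)   # hypothetical sample 1
    b = rng.exponential(scale=1.3, size=40)   # hypothetical sample 2
    c = rng.exponential(scale=2.0, size=40)   # hypothetical sample 3

    # Kolmogorov-Smirnov: is sample `a` drawn from a standard exponential distribution?
    print(stats.kstest(a, stats.expon.cdf))

    # Kruskal-Wallis: are the three independent samples drawn from the same distribution?
    print(stats.kruskal(a, b, c))

    # Bootstrap: nonparametric 95% confidence interval for the mean of `a`.
    print(stats.bootstrap((a,), np.mean, confidence_level=0.95).confidence_interval)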

See also

Statistical testing