P Value Calculator
Calculate p-value from test statistic for z-test, t-test, chi-square, and F-test. One- or
What Is a P-Value?
A p-value is the probability of obtaining results at least as extreme as the observed data, assuming the null hypothesis is true. If you test whether a coin is fair and get 60 heads out of 100 flips, the p-value tells you how likely it is to get 60 or more heads with a truly fair coin. A small p-value (typically below 0.05) suggests the observed result is unlikely under the null hypothesis, providing evidence to reject it. Enter your test statistic in the calculator above to find the p-value for z-tests, t-tests, chi-square tests, and F-tests.
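The coin example can be checked directly with an exact binomial sum. This is a minimal sketch using only the Python standard library; the function name `binomial_upper_tail` is illustrative and not part of the calculator.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# One-tailed p-value for 60 or more heads in 100 flips of a fair coin.
p_value = binomial_upper_tail(60, 100)
print(f"P(60 or more heads | fair coin) = {p_value:.4f}")
```

The result is about 0.028, so 60 or more heads would occur in roughly 3 of every 100 such experiments with a fair coin. This is the one-tailed probability described in the text; a two-tailed version would also count 40 or fewer heads.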
How to Interpret P-Values?
A p-value of 0.03 means there is a 3% probability of seeing results at least this extreme if the null hypothesis were true. Since 3% is below the conventional 5% threshold (significance level alpha = 0.05), you would reject the null hypothesis. A p-value of 0.15 means a 15% probability, which is above 5%, so you would not reject the null hypothesis. Critically, "not rejecting" is not the same as "accepting" the null hypothesis. It means the evidence is insufficient to conclude an effect exists, not that no effect exists. The p-value does not tell you the probability that the null hypothesis is true.
Significance Levels (Alpha)
The significance level alpha is the threshold below which a p-value is considered statistically significant. The most common choices are 0.05 (5%), 0.01 (1%), and 0.001 (0.1%). Conventions differ by discipline: psychology typically uses 0.05, while fields where false positives are costly or where many tests are run at once use far stricter thresholds. Particle physics requires about 0.0000003 (5-sigma, one-sided) to claim a discovery, and genome-wide association studies conventionally use 0.00000005 to account for the very large number of variants tested. The choice of alpha should be made before collecting data, not after seeing results. If the p-value is 0.04, choosing alpha = 0.05 after the fact to make the result significant is a form of p-hacking, a questionable research practice.
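The link between a "sigma" threshold and a p-value can be sketched as the upper tail area of a standard normal distribution. The helper name `sigma_to_one_sided_p` is hypothetical, and the one-sided convention is an assumption that matches the figure quoted for particle physics.

```python
from math import erfc, sqrt


def sigma_to_one_sided_p(n_sigma: float) -> float:
    """Upper-tail area of a standard normal beyond n_sigma standard deviations."""
    return 0.5 * erfc(n_sigma / sqrt(2))


for n_sigma in (1.645, 2.326, 5.0):
    print(f"{n_sigma} sigma -> one-sided p = {sigma_to_one_sided_p(n_sigma):.3g}")
```

Five sigma corresponds to a one-sided p-value of roughly 2.9e-7, which rounds to the 0.0000003 figure, while 1.645 sigma corresponds to the familiar one-sided 0.05.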
One-Tailed vs Two-Tailed Tests
A two-tailed test checks whether a parameter is different from the null value in either direction. A one-tailed test checks only one direction (greater than or less than). If you test whether a new drug is different from placebo, use two-tailed. If you test whether it is better than placebo (only interested in improvement), use one-tailed. For symmetric distributions such as z and t, the one-tailed p-value is half the two-tailed p-value for the same test statistic, provided the observed effect lies in the hypothesized direction; if it lies in the opposite direction, the one-tailed p-value is greater than 0.5. The choice between one and two tails should be decided before data collection based on the research question, not chosen afterward to achieve significance.
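For a z statistic, the relationship between the tails can be sketched with the standard library's `statistics.NormalDist`. The function name `z_p_values` and the dictionary keys are illustrative choices, not the calculator's own interface.

```python
from statistics import NormalDist


def z_p_values(z: float) -> dict:
    """Return upper-tail, lower-tail, and two-tailed p-values for a z statistic."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    upper = 1 - std_normal.cdf(z)  # H1: parameter is greater than the null value
    lower = std_normal.cdf(z)      # H1: parameter is less than the null value
    two_tailed = 2 * min(upper, lower)
    return {"upper": upper, "lower": lower, "two_tailed": two_tailed}


result = z_p_values(1.96)
print({name: round(p, 4) for name, p in result.items()})
```

With z = 1.96, the upper-tail p-value is about 0.025 and the two-tailed p-value is about 0.05. The lower-tail p-value is about 0.975, which shows what happens when a one-tailed test is aimed in the wrong direction.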
Common Statistical Tests and Their P-Values
The z-test compares a sample mean to a population mean when the population standard deviation is known. The t-test does the same when the standard deviation is estimated from the sample (the most common case in practice). The chi-square test compares observed frequencies to expected frequencies (goodness of fit, independence). The F-test compares variances between groups, as in ANOVA, or assesses the overall significance of a regression model. Each test produces a test statistic that the calculator converts to a p-value using the appropriate probability distribution.
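As a rough illustration of how a test statistic becomes a p-value, the sketch below computes a one-sample z-test and a two-category chi-square goodness-of-fit test with the standard library. The function names and sample numbers are made up for illustration. The chi-square helper is deliberately limited to two categories (1 degree of freedom), where the tail probability has a closed form; other degrees of freedom, the t-test, and the F-test need a statistics library and are not shown.

```python
from math import erfc, sqrt
from statistics import NormalDist


def z_test_two_tailed(sample_mean, null_mean, population_sd, n):
    """One-sample z-test: return (z statistic, two-tailed p-value)."""
    z = (sample_mean - null_mean) / (population_sd / sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


def chi_square_two_categories(observed, expected):
    """Goodness of fit for exactly two categories (1 degree of freedom)."""
    if len(observed) != 2 or len(expected) != 2:
        raise ValueError("this sketch only handles two categories (df = 1)")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For df = 1, P(chi-square >= x) equals erfc(sqrt(x / 2)).
    p = erfc(sqrt(statistic / 2))
    return statistic, p


# Hypothetical data: sample mean 103 vs null mean 100, known SD 15, n = 100.
z, p_z = z_test_two_tailed(103, 100, 15, 100)
# The coin example: 60 heads and 40 tails against an expected 50/50 split.
chi2, p_chi2 = chi_square_two_categories([60, 40], [50, 50])
print(f"z = {z:.2f}, p = {p_z:.4f}")
print(f"chi-square = {chi2:.2f}, p = {p_chi2:.4f}")
```

Both examples give a p-value of about 0.0455. That is not a coincidence: a chi-square statistic with 1 degree of freedom is the square of a z statistic, and here 4.0 is the square of 2.0.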
P-Value Misconceptions
The p-value is not the probability that the null hypothesis is true. It is the probability of data at least as extreme as those observed, given the null hypothesis, not the probability of the hypothesis given the data. A statistically significant result (p less than 0.05) does not mean the effect is practically important. A drug that lowers blood pressure by 0.1 mmHg might achieve p = 0.001 with a large enough sample, but the effect size is clinically meaningless. Conversely, a non-significant result does not prove no effect exists. Small samples can produce large p-values even when real effects are present. Effect sizes and confidence intervals provide information that p-values alone cannot.
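The blood pressure example can be made concrete with a one-sample z-test. The standard deviation of 10 mmHg, the sample sizes, and the function names below are assumptions chosen for illustration; only the 0.1 mmHg effect and the p = 0.001 target come from the text.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_tailed_p(effect: float, sd: float, n: int) -> float:
    """Two-tailed p-value of a one-sample z-test for an observed mean shift."""
    z = effect / (sd / sqrt(n))
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def sample_size_for_p(effect: float, sd: float, target_p: float) -> int:
    """Smallest n at which the observed shift reaches the target two-tailed p."""
    z_needed = STD_NORMAL.inv_cdf(1 - target_p / 2)
    return ceil((z_needed * sd / effect) ** 2)


# Tiny effect (0.1 mmHg), assumed SD of 10 mmHg.
print(f"n = 100:  p = {two_tailed_p(0.1, 10, 100):.3f}")
n_required = sample_size_for_p(0.1, 10, 0.001)
print(f"n needed for p = 0.001: {n_required}")

# Sizeable effect (5 mmHg) but only 5 participants.
print(f"5 mmHg shift, n = 5: p = {two_tailed_p(5, 10, 5):.3f}")
```

Under these assumptions, the 0.1 mmHg shift needs a sample of just over 100,000 to reach p = 0.001, while a 5 mmHg shift measured on 5 people is not significant at 0.05. The p-value reflects sample size as much as effect size.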
The Replication Crisis and P-Values
Many scientific fields have experienced a replication crisis where published findings with p less than 0.05 fail to replicate in subsequent studies. Contributing factors include publication bias (journals prefer significant results), p-hacking (testing many hypotheses until one reaches significance), small sample sizes, and misunderstanding what p-values actually mean. In response, many journals now require pre-registration of hypotheses, reporting of effect sizes and confidence intervals alongside p-values, and some have moved away from strict significance thresholds entirely. The American Statistical Association released a statement in 2016 emphasizing that p-values should not be used as the sole basis for scientific conclusions.
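One way to see why testing many hypotheses inflates false positives is to compute the chance that at least one test comes out significant when every null hypothesis is true. The sketch below assumes the tests are independent, which real analyses often are not, and the function name `familywise_error_rate` is illustrative.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) across independent tests of true nulls."""
    return 1 - (1 - alpha) ** num_tests


for num_tests in (1, 5, 20, 100):
    rate = familywise_error_rate(num_tests)
    print(f"{num_tests:>3} tests -> P(at least one p < 0.05) = {rate:.3f}")
```

With 20 independent tests at alpha = 0.05, the chance of at least one spurious "significant" result is about 64%, even though no real effect exists in any of them.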
Frequently asked questions
What is a p-value?
What does p < 0.05 mean?
What is the difference between one-tailed and two-tailed?
Does p < 0.05 prove my hypothesis?
What is p-hacking?
What should I report alongside p-values?