Comparisons of sample means are a fundamental and ubiquitous approach to interpreting experimental psychological data. Yet we argue that the sample and effect sizes in published psychological research are frequently so small that sample means are insufficiently accurate to determine whether treatment effects have occurred. Generally, an estimator should be more accurate than any benchmark that systematically ignores information about the relations among experimental conditions. We consider two such benchmark estimators: one that randomizes the relations among conditions and another that always assumes no treatment effects. We show conditions under which these benchmark estimators estimate the true parameters more accurately than sample means. This perverse situation can occur even when effects are statistically significant at traditional levels. Our argument motivates the need for regularized estimates, such as those produced by lasso, ridge, and hierarchical Bayes techniques.
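The core claim can be illustrated with a small simulation. The sketch below (a hypothetical setup, not the paper's actual design or numbers) compares the mean squared error of per-condition sample means against the "no treatment effects" benchmark, which estimates every condition by the grand mean. With small effects and small cells, the benchmark's bias is outweighed by its much lower variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design (illustrative, not from the paper): 4 conditions
# with small true effects (in SD units) and 10 observations per cell.
true_means = np.array([0.0, 0.1, 0.2, 0.3])
n_per_cell, sigma, n_reps = 10, 1.0, 20_000

sse_sample = np.zeros(n_reps)  # error of per-condition sample means
sse_null = np.zeros(n_reps)    # error of the "no effects" benchmark

for r in range(n_reps):
    data = rng.normal(true_means, sigma, size=(n_per_cell, 4))
    cell_means = data.mean(axis=0)   # usual estimator: one mean per cell
    grand_mean = data.mean()         # benchmark: one mean for everything
    sse_sample[r] = ((cell_means - true_means) ** 2).sum()
    sse_null[r] = ((grand_mean - true_means) ** 2).sum()

# Sample means: MSE ~ 4 * sigma^2 / n = 0.40
# Benchmark: squared bias (~0.05) + variance (~0.10) = ~0.15
print(f"sample-means MSE:       {sse_sample.mean():.3f}")
print(f"no-effects benchmark MSE: {sse_null.mean():.3f}")
```

Here the biased benchmark beats the sample means despite ignoring the conditions entirely; shrinking the cell means toward the grand mean (as ridge or hierarchical Bayes estimators do) splits the difference between the two.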