Abstract:
Data reduction methods are needed to address the challenges posed by big data. In regression analysis, the correlation between two variables may be obscured when data are organised at a disaggregate level. In this study, we apply data aggregation to regression analysis, using a study that forecasts the impact of computerisation on jobs and wages as the context. We show that data grouped by the ranked independent variable reveal a clearer pattern of the employment impacts of computerisation probability across job categories than data grouped randomly or by other schemes. Coefficient estimates are more consistent for groupings based on the ranked independent variable than for random groupings of the same variable. The improved estimates can have positive policy implications.

Byline: James Otto, Chaodong Han
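The contrast between ranked and random grouping can be illustrated with a minimal sketch. It uses synthetic data rather than the study's data, and assumes that aggregation means replacing each group with its mean before fitting a simple linear regression. The function name `grouped_slope`, the group count, and the simulated coefficients are hypothetical choices made for illustration only.

```python
import numpy as np

def grouped_slope(x, y, n_groups, rng=None):
    """Aggregate (x, y) into group means and return the OLS slope on those means.

    If rng is None, groups are formed by ranking x (sorted order);
    otherwise rows are shuffled first, giving random groups.
    """
    order = np.argsort(x) if rng is None else rng.permutation(len(x))
    x_means = np.array([g.mean() for g in np.array_split(x[order], n_groups)])
    y_means = np.array([g.mean() for g in np.array_split(y[order], n_groups)])
    slope, _intercept = np.polyfit(x_means, y_means, 1)
    return slope

# Synthetic data (an assumption, not the study's data): x plays the role of a
# computerisation probability, y a noisy outcome with a true slope of -2.
rng = np.random.default_rng(0)
n = 700
x = rng.uniform(0.0, 1.0, n)
y = 1.0 - 2.0 * x + rng.normal(0.0, 1.0, n)

# One slope from ranked grouping, many slopes from repeated random grouping.
ranked = grouped_slope(x, y, n_groups=20)
random_slopes = np.array(
    [grouped_slope(x, y, n_groups=20, rng=rng) for _ in range(200)]
)

print("slope, grouped by ranked x:", ranked)
print("slopes, random grouping: mean", random_slopes.mean(),
      "std", random_slopes.std())
```

In this sketch, ranked grouping preserves the spread of the independent variable across groups, while random grouping pulls every group mean towards the overall mean, so the slope fitted to random groups varies widely from one grouping to the next.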