Feedbacks between the climate system and the carbon cycle represent a key source of uncertainty in model projections of Earth's climate, in part due to our inability to directly measure large-scale biosphere-atmosphere carbon fluxes. In situ measurements of the CO2 mole fraction from surface flasks, towers, and aircraft are used in inverse models to infer fluxes, but measurement networks remain sparse, with limited or no coverage over large parts of the planet. Satellite retrievals of total column CO2 (XCO2), such as those from NASA's Orbiting Carbon Observatory-2 (OCO-2), can potentially provide unprecedented global information about CO2 spatiotemporal variability. However, for use in inverse modeling, data need to be extremely stable, highly precise, and unbiased to distinguish abundance changes emanating from surface fluxes from those associated with variability in weather. Systematic errors in XCO2 have been identified and, while bias correction algorithms are applied globally, inconsistencies persist at regional and smaller scales that may complicate or confound flux estimation. To evaluate XCO2 retrievals and assess potential biases, we compare OCO-2 v10 retrievals with in situ data-constrained XCO2 simulations over North America estimated using surface fluxes and boundary conditions optimized with observations that are rigorously calibrated relative to the World Meteorological Organization X2007 CO2 scale. Systematic errors in simulated atmospheric transport are independently evaluated using unassimilated aircraft and AirCore profiles. We find that the global OCO-2 v10 bias correction shifts the distribution of retrievals closer to the simulated XCO2, as intended. Comparisons between bias-corrected and simulated XCO2 reveal differences that vary seasonally. Importantly, the difference between simulations and retrievals is of the same magnitude as the imprint of recent surface flux in the total column.
This work demonstrates that systematic errors in OCO-2 v10 retrievals of XCO2 over land can be large enough to confound reliable surface flux estimation and that further improvements in retrieval and bias correction techniques are essential. Finally, we show that independent observations, especially vertical profile data such as those from the National Oceanic and Atmospheric Administration aircraft and AirCore programs, are critical for evaluating errors in both satellite retrievals and carbon cycle models.