Multivariate Modeling and Feature Attribution in Consumer Product Ratings: A Case Study on Nutritional Profiles of Breakfast Cereals
Keywords: consumer ratings, nutritional profiling, multivariate modeling
Abstract
Consumer product ratings play a central role in shaping food choices, yet the relationship between nutritional attributes and perceived product quality remains under-quantified. This paper examines how nutritional components (sugar, fiber, protein, and calories, among others) influence consumer ratings, combining a literature review with an empirical case study of breakfast cereals, and shows how interpretable predictive models reveal the most important drivers. Using a publicly available cereal dataset of approximately 77–80 products with detailed nutritional content and rating variables, we integrate descriptive statistics, linear regression, regularized regression, and ensemble tree models, and we apply both model-based and model-agnostic feature attribution methods. Linear models explain nearly all out-of-sample rating variance, indicating a near-deterministic mapping between nutritional components and ratings. Random forest models perform robustly, but their accuracy is comparatively lower. Cross-model analysis identifies sugar (negative effect) as the most important variable, followed by calories and sodium (both negative) and dietary fiber (positive). Finally, we translate these insights into a health-oriented recommendation strategy that prioritizes products with low sugar, high fiber/protein, and moderate calories, and we report the 10 cereals with the highest predicted ratings under these constraints. The results are consistent with current guidelines that limit free/added sugar intake and encourage higher dietary fiber intake, and they illustrate how explainable modeling can inform healthier product recommendations and formulation improvements.
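The constrained recommendation step described above can be illustrated with a minimal sketch. It is not the study's implementation: the thresholds, the `Cereal` record, the reading of "high fiber/protein" as "fiber or protein", and the five toy products are all hypothetical values chosen only to show the filter-then-rank logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cereal:
    """One product with per-serving nutrition and a model-predicted rating."""
    name: str
    sugar_g: float
    fiber_g: float
    protein_g: float
    calories: float
    predicted_rating: float


# Hypothetical thresholds; the abstract does not state the actual cut-offs.
MAX_SUGAR_G = 6.0
MIN_FIBER_G = 3.0
MIN_PROTEIN_G = 3.0
CALORIE_RANGE = (50.0, 120.0)


def meets_constraints(c: Cereal) -> bool:
    """Low sugar, high fiber or protein (assumed reading), moderate calories."""
    low_sugar = c.sugar_g <= MAX_SUGAR_G
    high_fiber_or_protein = c.fiber_g >= MIN_FIBER_G or c.protein_g >= MIN_PROTEIN_G
    moderate_calories = CALORIE_RANGE[0] <= c.calories <= CALORIE_RANGE[1]
    return low_sugar and high_fiber_or_protein and moderate_calories


def recommend(cereals: list[Cereal], k: int = 10) -> list[Cereal]:
    """Keep eligible products, then return the k with the highest predicted rating."""
    eligible = [c for c in cereals if meets_constraints(c)]
    return sorted(eligible, key=lambda c: c.predicted_rating, reverse=True)[:k]


# Toy data for illustration only; not taken from the cereal dataset.
products = [
    Cereal("Product A", 0.0, 10.0, 4.0, 70.0, 80.0),
    Cereal("Product B", 12.0, 0.0, 1.0, 110.0, 30.0),   # too much sugar
    Cereal("Product C", 3.0, 3.0, 3.0, 100.0, 60.0),
    Cereal("Product D", 5.0, 1.0, 2.0, 110.0, 45.0),    # low fiber and protein
    Cereal("Product E", 2.0, 4.0, 3.0, 150.0, 65.0),    # calories out of range
]

top = recommend(products, k=10)
for rank, c in enumerate(top, start=1):
    print(rank, c.name, c.predicted_rating)
```

In practice the `predicted_rating` field would come from whichever fitted model is used for scoring, and the thresholds would be set from the dietary guidelines the study relies on.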
