By Dave DeFusco
In today’s fast-moving markets, knowing which customers to focus on is critical for businesses aiming to maximize their returns. Traditional marketing strategies often struggle to keep up with changing customer behaviors, leaving money on the table. To solve this problem, researchers in the Katz School’s Graduate Department of Computer Science and Engineering are turning to advanced tools like reinforcement learning, a type of machine learning that mimics decision-making in dynamic, uncertain environments.
In the paper “Optimizing Customer Targeting Using Reinforcement Learning and Neural Networks for Adaptive Marketing Strategies,” which they will present at the 2025 IEEE International Conference on Consumer Electronics in January, Katz School researchers introduce an innovative reinforcement learning (RL) framework for customer targeting that offers a fresh approach to understanding and responding to customer spending patterns.
“At the heart of the framework are two breakthroughs: the mean-stat strategy and a neural network-based approach using the cross-entropy method,” said Dr. David Li, senior author of the study and program director of the M.S. in Data Analytics and Visualization. “Together, these techniques enable businesses to adapt their strategies more effectively and efficiently than ever before.”
Reinforcement learning works by simulating interactions between an agent (in this case, the business) and its environment (the customers). The goal is to maximize rewards; in other words, to find and target the most valuable customers. Each customer’s spending potential is modeled as a normal distribution, capturing both their average spending and the variability around it.
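To make the setup concrete, here is a minimal Python sketch of such an environment. It is an illustration only, not code from the paper: the class name `CustomerEnvironment`, the method `target`, and the three spending profiles are all hypothetical.

```python
import random


class CustomerEnvironment:
    """Hypothetical environment: each customer's spend follows a normal distribution."""

    def __init__(self, means, std_devs, seed=0):
        if len(means) != len(std_devs):
            raise ValueError("means and std_devs must have the same length")
        self.means = list(means)
        self.std_devs = list(std_devs)
        self.rng = random.Random(seed)

    @property
    def n_customers(self):
        return len(self.means)

    def target(self, customer):
        """Return the spend (the reward) observed when the agent targets one customer."""
        return self.rng.gauss(self.means[customer], self.std_devs[customer])


# Made-up profiles: (average spend, variability) for three customers.
env = CustomerEnvironment(means=[20.0, 50.0, 35.0], std_devs=[5.0, 15.0, 2.0])

# Repeatedly targeting customer 1 yields rewards that average out near 50.
samples = [env.target(1) for _ in range(2000)]
average_spend = sum(samples) / len(samples)
```

The agent never sees the means and standard deviations directly. It only sees the rewards returned by `target`, which is why it has to learn them through repeated interaction.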
The RL agent interacts with this environment, learning over time which customers are likely to generate the highest returns. The challenge lies in balancing exploration (trying new strategies to learn more about customers) and exploitation (focusing on customers already identified as high value). The mean-stat strategy offers a groundbreaking way to strike this balance.
Traditional RL methods like ε-greedy and its variant, decaying ε-greedy, choose at random between exploring and exploiting according to a preset probability ε, which stays fixed in the basic method and shrinks on a schedule in the decaying variant. While effective, these methods can miss opportunities, particularly when dealing with diverse customer behaviors.
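For readers who want to see the baseline, the following Python sketch shows one common way to write ε-greedy and a decaying variant. The exponential decay schedule, its constants, and the customer profiles are assumptions chosen for illustration; the paper may use different settings.

```python
import random


def epsilon_greedy_choice(estimates, epsilon, rng):
    """With probability epsilon pick a random customer, otherwise the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))  # explore
    return max(range(len(estimates)), key=lambda i: estimates[i])  # exploit


def decayed_epsilon(step, start=1.0, decay=0.99, floor=0.01):
    """Assumed schedule: shrink epsilon exponentially, never below the floor."""
    return max(floor, start * decay ** step)


def run_epsilon_greedy(true_means, true_stds, steps=3000, decaying=True,
                       epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n        # how often each customer was targeted
    estimates = [0.0] * n   # running average spend per customer
    total = 0.0
    for step in range(steps):
        eps = decayed_epsilon(step) if decaying else epsilon
        c = epsilon_greedy_choice(estimates, eps, rng)
        reward = rng.gauss(true_means[c], true_stds[c])
        counts[c] += 1
        estimates[c] += (reward - estimates[c]) / counts[c]  # incremental mean
        total += reward
    return counts, estimates, total


# Made-up customer profiles, for illustration only.
counts, estimates, total = run_epsilon_greedy([20.0, 50.0, 35.0], [5.0, 15.0, 2.0])
```

Note that the exploration rate here depends only on the step count, not on how much the agent has actually learned about each customer.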
The mean-stat strategy improves on this by using statistical confidence intervals to guide decisions dynamically. Instead of relying on fixed probabilities, the strategy adjusts based on the level of certainty about a customer’s spending habits. For instance:
- If the model is highly confident about a customer’s high value, it focuses on them (exploitation).
- If uncertainty remains, it continues to explore other customers to ensure no lucrative opportunities are missed.
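The article does not give the exact mean-stat rule, so the Python sketch below is only one plausible reading of the two cases above, not the authors' algorithm. It assumes a 95% normal-approximation interval, a 30-observation warm-up, and a simple overlap test; the names `SpendStats` and `choose_customer` are hypothetical.

```python
import math
import random

Z_95 = 1.96        # normal-approximation critical value for a 95% interval
MIN_SAMPLES = 30   # assumed warm-up before an interval is trusted


class SpendStats:
    """Running mean and variance of one customer's observed spend (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, spend):
        self.n += 1
        delta = spend - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (spend - self.mean)

    def interval(self):
        """Return (lower, upper) bounds on mean spend, unbounded during warm-up."""
        if self.n < MIN_SAMPLES:
            return (-math.inf, math.inf)
        std = math.sqrt(self.m2 / (self.n - 1))
        half_width = Z_95 * std / math.sqrt(self.n)
        return (self.mean - half_width, self.mean + half_width)


def choose_customer(stats, rng):
    """Exploit when the leader's interval clears all others, otherwise explore."""
    intervals = [s.interval() for s in stats]
    leader = max(range(len(stats)), key=lambda i: stats[i].mean)
    leader_low = intervals[leader][0]
    # Rivals: customers whose upper bound still reaches the leader's lower bound.
    rivals = [i for i, (_, high) in enumerate(intervals)
              if i != leader and high >= leader_low]
    if not rivals:
        return leader                     # confident: exploit
    return rng.choice(rivals + [leader])  # uncertain: keep sampling the contenders


def run(true_means, true_stds, steps=2000, seed=0):
    rng = random.Random(seed)
    stats = [SpendStats() for _ in true_means]
    for _ in range(steps):
        c = choose_customer(stats, rng)
        stats[c].update(rng.gauss(true_means[c], true_stds[c]))
    return stats


# Made-up customer profiles, for illustration only.
stats = run([20.0, 50.0, 35.0], [5.0, 15.0, 2.0])
counts = [s.n for s in stats]
```

In this sketch the amount of exploration is driven by the data: a customer stops receiving exploratory contacts only once its interval falls clearly below the leader's.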
“This data-driven approach ensures better targeting while avoiding wasted effort on low-value customers,” said Zubair Khan, lead author of the study and a student in the M.S. in Artificial Intelligence.
Some customer behaviors are too complex for simple statistical models. That’s where neural networks come in. These models excel at identifying intricate patterns by analyzing large amounts of data. Using the cross-entropy method, the neural network learns to fine-tune targeting strategies by identifying hidden trends in spending behavior.
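As a rough illustration of the cross-entropy method itself, the Python sketch below applies the generic version (sample candidates, keep the best-scoring elite, refit the sampling distribution) to a tiny targeting policy. It makes strong simplifying assumptions: a three-value logit vector stands in for neural network weights, and candidates are scored against made-up average spends that a real system would have to estimate from observed purchases. None of the names or settings come from the paper.

```python
import math
import random
import statistics


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_return(logits, mean_spend):
    """Expected spend per contact when customers are targeted with softmax(logits)."""
    return sum(p * m for p, m in zip(softmax(logits), mean_spend))


def cross_entropy_method(score, dim, iterations=30, population=50,
                         elite_frac=0.2, seed=0):
    """Generic CEM: sample parameters, keep the elite, refit a diagonal Gaussian."""
    rng = random.Random(seed)
    mu = [0.0] * dim
    sigma = [1.0] * dim
    n_elite = max(2, int(population * elite_frac))
    for _ in range(iterations):
        candidates = [[rng.gauss(m, s) for m, s in zip(mu, sigma)]
                      for _ in range(population)]
        candidates.sort(key=score, reverse=True)
        elite = candidates[:n_elite]
        columns = list(zip(*elite))
        mu = [statistics.fmean(col) for col in columns]
        # A small floor keeps the sampling spread from collapsing to exactly zero.
        sigma = [statistics.pstdev(col) + 1e-3 for col in columns]
    return mu


# Made-up average spends, for illustration only.
mean_spend = [20.0, 50.0, 35.0]
best_logits = cross_entropy_method(lambda w: expected_return(w, mean_spend), dim=3)
policy = softmax(best_logits)  # targeting probabilities per customer
```

After a few iterations the policy places most of its probability on the customer with the highest average spend. With a real neural network, the same loop would score candidate behaviors using features of each customer rather than a fixed table of averages.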
For example, if a business notices that high-value customers tend to react positively to limited-time promotions, the neural network incorporates this pattern into its strategy, refining the targeting process further. Or, consider a restaurant that wants to maximize returns from promotions. Using the mean-stat strategy, it can offer initial discounts to a wide range of customers and monitor their spending. Over time, the strategy identifies high spenders and shifts the focus toward them while continuing to test new customer segments. This ensures promotions reach the right audience without neglecting untapped opportunities. Similarly, in e-commerce, the combination of reinforcement learning and neural networks could help identify the most profitable customers for personalized ads, discounts, or loyalty programs.
“Simulations conducted with this framework showed impressive results,” said Dr. Li. “Both the mean-stat strategy and the neural network-based approach outperformed traditional methods like ε-greedy in identifying high-value customers. They achieved higher cumulative rewards and proved more adaptable to changing customer behaviors.”
The mean-stat strategy’s ability to dynamically adjust exploration based on statistical confidence was particularly effective in balancing short-term gains with long-term learning. Meanwhile, the cross-entropy method allowed neural networks to handle complex, real-world scenarios, providing businesses with a refined and scalable solution for customer targeting.
“Our study sets a strong foundation for the future of dynamic marketing,” said Khan. “By combining advanced statistical techniques with machine learning, businesses can stay ahead of the curve in understanding and adapting to customer behaviors.”