Blind Spots in Event Logs: How Generative AI Improves Process Prediction

Incompleteness is not the exception in data science; it is the rule. Our data almost always lags behind the signals that truly matter. In process analytics, this gap shows up as blind spots: patterns of behavior that are simply not represented in our event logs.

We set out to address exactly this problem. Our approach leverages generative AI to systematically fill these gaps and, in doing so, elevate the quality of predictive process monitoring.

The idea is straightforward yet powerful. In predictive process monitoring, we build models that learn from historical cases (say, 10,000 invoicing instances) to predict what will happen in a new, running case. Which activity comes next? How long will it take from issuing an invoice to receiving payment? The challenge is that the running case may belong to a variant that never appears in the training data.
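
To make the blind spot concrete, here is a minimal Python sketch of next-activity prediction. It is a deliberately simple frequency-based baseline, not the model used in the paper, and the invoicing log and activity names are hypothetical. The predictor counts which activity followed each observed prefix and returns the most frequent successor.

```python
from collections import Counter, defaultdict

def train_next_activity(log):
    """Toy baseline: for every observed prefix, count which activity came next."""
    table = defaultdict(Counter)
    for trace in log:
        for i in range(1, len(trace)):
            table[tuple(trace[:i])][trace[i]] += 1
    return table

def predict_next(table, prefix):
    """Return the most frequent successor of `prefix`, or None if it was never observed."""
    counts = table.get(tuple(prefix))  # .get does not insert a key into the defaultdict
    if not counts:
        return None  # blind spot: no historical case started this way
    return counts.most_common(1)[0][0]

# Hypothetical invoicing log: each trace is one case's sequence of activities
log = [
    ["Create invoice", "Send invoice", "Receive payment", "Close case"],
    ["Create invoice", "Send invoice", "Receive payment", "Close case"],
    ["Create invoice", "Send invoice", "Send reminder", "Receive payment", "Close case"],
]

model = train_next_activity(log)
print(predict_next(model, ["Create invoice", "Send invoice"]))     # Receive payment
print(predict_next(model, ["Create invoice", "Correct invoice"]))  # None
```

The first prefix occurs in the history, so a prediction is available. The second returns None because no historical case ever contained a correction step: the behavior is plausible, but the log is blind to it.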

This is where large language models come into play. We use them not as predictors, but as informed critics of the data itself. Drawing on their extensive “reading” of how invoicing works, LLMs assess the training data and identify patterns and behaviors that would plausibly be expected, yet are missing. These insights are then used to augment the data in a targeted way.
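
The augmentation step can be sketched with a toy frequency-based predictor. This is an illustration under stated assumptions, not the authors' pipeline: the LLM call is replaced by a hard-coded, hypothetical list of suggested traces, and the hypothetical helper `augment_log` simply appends each suggested variant that is not already in the log before the predictor is retrained.

```python
from collections import Counter, defaultdict

def train_next_activity(log):
    """Toy frequency baseline: count successors of every observed prefix."""
    table = defaultdict(Counter)
    for trace in log:
        for i in range(1, len(trace)):
            table[tuple(trace[:i])][trace[i]] += 1
    return table

def augment_log(log, suggested_traces, copies=1):
    """Return a new log with each unseen suggested variant appended `copies` times."""
    seen = {tuple(trace) for trace in log}
    augmented = [list(trace) for trace in log]
    for trace in suggested_traces:
        key = tuple(trace)
        if key in seen:
            continue  # variant already observed, nothing to fill in
        augmented.extend(list(trace) for _ in range(copies))
        seen.add(key)
    return augmented

# Hypothetical invoicing log
log = [
    ["Create invoice", "Send invoice", "Receive payment", "Close case"],
    ["Create invoice", "Send invoice", "Receive payment", "Close case"],
    ["Create invoice", "Send invoice", "Send reminder", "Receive payment", "Close case"],
]

# Hypothetical stand-in for LLM output; no LLM is called here
llm_suggestions = [
    ["Create invoice", "Correct invoice", "Send invoice", "Receive payment", "Close case"],
    ["Create invoice", "Send invoice", "Receive payment", "Close case"],  # already in the log
]

augmented = augment_log(log, llm_suggestions)
model = train_next_activity(augmented)
print(len(augmented))  # 4
print(model.get(("Create invoice", "Correct invoice")).most_common(1)[0][0])  # Send invoice
```

After augmentation, the previously unseen prefix has a prediction. How the paper prompts the LLM and selects, validates, and weights generated traces is not shown here.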

The results are compelling. Across multiple real-world datasets, our method delivers significant improvements over state-of-the-art techniques for next activity prediction in predictive business process monitoring.

Our paper, “Improving next process activity prediction with scarce event log data using data augmentation with large language models,” has been accepted for publication in Information Systems and is available open access as an in-press version.

Author team: Martin Käppel, Sven Weinzierl, Lars Ackermann-Igl, Stefan Jablonski, Martin Matzner