In business analytics, calculating the Net Promoter Score (NPS) typically requires employees to annotate data by hand. It is tempting to label the data with a machine learning model instead, but machine labels do not carry the theoretical guarantees we get from human-labeled data. Enter Prediction-Powered Inference (PPI), a new statistical technique that combines human- and machine-labeled data to create confidence intervals that are both data-efficient and theoretically guaranteed.
This article explores the intuition behind PPI and emphasizes why you would want to use it. We then jump into a code walkthrough of how to use it for two metrics: NPS and customer recommendations.
PPI is a statistical technique proposed by Angelopoulos et al. [1]. The goal is to tighten confidence intervals, without giving up their validity, by combining human- and machine-labeled data. Let’s walk through some steps to motivate its usefulness.
In our use case, we want to estimate the true NPS from a set of customer reviews. Typically, an employee manually reads each review and assigns a score from 1 to 10, which is reliable but time-consuming. With a large number of reviews, a more automatic method would be convenient.
To address this, we can leverage a machine learning model. A Large Language Model (LLM) is a good candidate because LLMs generalize well to new tasks. The model is prompted to read the review and output a score. This is convenient, but the model makes errors, and its scores can be systematically biased. Before we base a decision on the results, we need to make sure they are aligned with human judgement.
Considering the limitations of both approaches, what if we could combine them? We can with Prediction-Powered Inference (PPI)! PPI is a framework that leverages the theoretical guarantees of human-labeled data for confidence intervals and the efficiency of machine-labeled data. With PPI, we aim to benefit from the strengths of both techniques.
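To make the idea concrete, here is a minimal sketch of the PPI confidence interval for a simple mean, following the estimator in [1]: average the model's predictions over the large machine-labeled set, then correct that average with the mean gap between human and model scores on the small human-labeled set. The function names, the sample sizes, and the simulated scores are all assumptions made for illustration; the data is synthetic, and the average score is used here as a simpler stand-in for NPS.

```python
import random
from statistics import NormalDist, fmean, variance


def ppi_mean_ci(y_labeled, yhat_labeled, yhat_unlabeled, alpha=0.05):
    """PPI confidence interval for the mean of the human score.

    y_labeled:      human scores on the small labeled set
    yhat_labeled:   model scores on that same labeled set
    yhat_unlabeled: model scores on the large unlabeled set
    """
    n, N = len(y_labeled), len(yhat_unlabeled)
    # Rectifier: per-review gap between the human and the model.
    rectifier = [y - f for y, f in zip(y_labeled, yhat_labeled)]
    # Model average on the big set, corrected by the average gap.
    point = fmean(yhat_unlabeled) + fmean(rectifier)
    # The two terms come from independent samples, so variances add.
    se = (variance(yhat_unlabeled) / N + variance(rectifier) / n) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return point - z * se, point + z * se


def classical_mean_ci(y_labeled, alpha=0.05):
    """Standard normal-approximation interval from human labels only."""
    se = (variance(y_labeled) / len(y_labeled)) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    center = fmean(y_labeled)
    return center - z * se, center + z * se


# --- Synthetic data (assumption: invented purely for illustration) ---
rng = random.Random(0)


def simulate(num_reviews):
    """Return (human_scores, model_scores) for simulated reviews."""
    human = [rng.randint(1, 10) for _ in range(num_reviews)]
    # The simulated model is usually right and sometimes off by one.
    model = [min(10, max(1, s + rng.choice([-1, 0, 0, 0, 1]))) for s in human]
    return human, model


y_lab, yhat_lab = simulate(200)    # small human-labeled set
_, yhat_unlab = simulate(5000)     # large set; human scores never observed

ppi_ci = ppi_mean_ci(y_lab, yhat_lab, yhat_unlab)
classical_ci = classical_mean_ci(y_lab)
print("PPI interval:      ", ppi_ci)
print("Classical interval:", classical_ci)
```

In this simulation the model tracks the human scores closely, so the rectifier has low variance and the PPI interval comes out narrower than the interval built from the 200 human labels alone. With a poor model the rectifier variance grows and that advantage shrinks.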