Do You Really Need All That Data?
Operations Jan 9, 2026

Not always. An algorithm helps decision-makers figure out precisely which data they need to find an optimal solution.

Yifan Wu

Based on the research of

Omar Bennouna

Amine Bennouna

Saurabh Amin

Asuman Ozdaglar

Summary: Data-driven decision-making is used by nearly every business, government, and organization today. But the costs of gathering and analyzing large datasets continue to grow. A new algorithm co-created by Kellogg’s Amine Bennouna identifies the crucial data that decision-makers need to reach optimal solutions, which can guide more-efficient data collection and computation, saving organizations money and time.

We live in an age of big and bigger data. To train the newest large language models, computers mine trillions of words from the internet and books, while other AI applications digest millions of images and videos.

But not every task requires this deep reservoir of information. Many business and government decisions can be made with smaller amounts of data, provided the right data is available. The big question, then, is which data you need to make the best decision.

A new algorithm co-created by Amine Bennouna, an assistant professor of operations at the Kellogg School, guides decision-makers to this crucial information.

Developed with collaborators Omar Bennouna, Saurabh Amin, and Asuman Ozdaglar of MIT, the team’s algorithmic method identifies the critical data that decision-makers need to ensure they land on the optimal solution given the specific problem at hand, from hiring to supply-chain optimization to large public-works projects. As a result, the algorithm can help decision-makers reach the best solution while minimizing their investments in money and time.

It flips the script on data-driven decision-making, where the answer isn’t found by merely throwing more and more data at a problem, but instead by being smart about which data to gather.

“It’s not about the size [of the data] itself; it’s about what data matters,” Bennouna says. “Instead of scaling and scaling, it’s more strategic to target where to study your system or where to get data.”

Optimizing under uncertainty

You may not realize it, but the mathematical method of linear optimization is ever-present in the modern world. From package shipping to energy grids to portfolio balancing, linear optimization uses algorithms to compute the best—or the lowest-cost—solution from a universe of possibilities, based on the available data.
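To make this concrete, here is a minimal illustrative sketch (not the researchers’ code, and with invented numbers): a tiny shipping problem where 100 units must be routed across two carriers with different per-unit costs and capacities. Because a linear program’s optimum always sits at a corner of the feasible region, this toy instance can be solved by simply checking the corner points.

```python
# Toy linear program (illustrative numbers, not from the research):
# route 100 units via two carriers with per-unit costs 4 and 3 and
# capacities 70 and 80. Minimize 4x + 3y subject to
#   x + y >= 100,  0 <= x <= 70,  0 <= y <= 80.
# An LP's optimum lies at a corner of the feasible region, so for a
# problem this small we can enumerate the corners directly.

def cost(x, y):
    return 4 * x + 3 * y

def feasible(x, y):
    return x + y >= 100 and 0 <= x <= 70 and 0 <= y <= 80

# Corner points where pairs of constraints intersect.
corners = [(20, 80), (70, 30), (70, 80)]
best = min((p for p in corners if feasible(*p)), key=lambda p: cost(*p))
print(best, cost(*best))  # (20, 80) 320
```

Real applications have thousands of variables and use solvers rather than enumeration, but the structure is the same: a linear cost, linear constraints, and an optimum determined by the data that defines them.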

That said, there are limits to how accurately linear optimization can prescribe optimal decisions. The biggest is uncertainty: some inputs can only be estimated as a range, not an exact number. The more estimated inputs a model contains, the less exact its result, which is a problem for some applications.

“Linear optimization is a beautiful discovery. It has allowed us to solve many very important decision-making problems that we were not able to before,” Bennouna says. “But if you just assume this is the perfect model of your problem, and you solve it, you’ll likely be disappointed. Reality is not exactly your model, so things will deviate.”

Decision-makers can reduce uncertainty and get clearer answers by conducting more studies and adding more and better data to their models. But this process can quickly get costly.

For example, imagine you are the lead engineer for the construction of a new subway line through a major city. While you know the start and end points of the line, you need to determine the route that best minimizes construction costs. Many cost-determining factors here are highly uncertain and can only be determined after extensive field studies.

In a perfect world, you would conduct one study after another across the city to determine the exact cost of building every possible route. But in the real world, that isn’t financially feasible, and it’s probably wasteful too: some studies will give you more-useful information than others.

“A million data points can be equivalent to two data points depending on how relevant they are for what we’re trying to do with them,” Bennouna says. “We want to reduce the uncertainty that matters most for the decision—determining precisely the data that enables you to find the optimal decision.”
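The intuition that some data simply doesn’t matter can be seen in the toy routing problem above (again, an illustrative sketch with invented numbers, not the authors’ algorithm): sweep one carrier’s uncertain per-unit cost across its entire estimated range and check whether the optimal decision ever changes.

```python
# Illustrative sketch: in a toy routing LP (min c1*x + c2*y with
# x + y >= 100, x <= 70, y <= 80), the optimal corner can be completely
# insensitive to some uncertain inputs. We sweep carrier 1's uncertain
# per-unit cost over its estimated range and record the chosen decision.

corners = [(20, 80), (70, 30), (70, 80)]  # corners of the feasible region
c2 = 3.0  # carrier 2's per-unit cost, assumed known

decisions = set()
for c1 in [3.5, 4.0, 4.5, 5.0]:  # uncertainty range for carrier 1's cost
    best = min(corners, key=lambda p: c1 * p[0] + c2 * p[1])
    decisions.add(best)

print(decisions)  # {(20, 80)} -- the decision never changes, so a costly
                  # study pinning down c1 exactly would add no value here
```

Identifying which uncertain inputs the optimal decision actually depends on, before any studies are run, is the kind of question the researchers’ algorithm answers exactly for linear optimization.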

A more-practical solution

Previous mathematical attempts to solve this problem have focused on settings where the decision-maker collects some data, runs their model, then uses the results to decide where to look next.

A classic example is the “secretary problem,” a scenario where an employer interviews applicants one at a time until they find the best candidate. Mathematicians have created algorithms to calculate how many interviews an employer needs to conduct to find that dream hire.
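The best-known answer to the secretary problem is the 1/e stopping rule: interview and pass on roughly the first 37 percent of candidates, then hire the first one who beats everyone seen so far. A short simulation (a sketch of the classic rule, not the researchers’ method) shows it finds the single best candidate about 37 percent of the time:

```python
import math
import random

# The classic "secretary" stopping rule: reject the first n/e candidates,
# then hire the first candidate better than everyone seen so far.

def hire(ranks, cutoff):
    """Return the rank of the candidate hired under the stopping rule."""
    best_seen = max(ranks[:cutoff], default=-1)
    for r in ranks[cutoff:]:
        if r > best_seen:
            return r
    return ranks[-1]  # no one beat the benchmark; forced to take the last

def success_rate(n=100, trials=20000, seed=0):
    """Estimate how often the rule hires the single best candidate."""
    rng = random.Random(seed)
    cutoff = round(n / math.e)  # skip ~37% of candidates
    wins = 0
    for _ in range(trials):
        ranks = list(range(n))   # candidate quality as distinct ranks
        rng.shuffle(ranks)       # candidates arrive in random order
        if hire(ranks, cutoff) == n - 1:  # hired the best one?
            wins += 1
    return wins / trials

print(round(success_rate(), 2))  # close to the theoretical 1/e ~= 0.37
```

Note how inherently sequential this is: each interview’s outcome feeds the decision about whether to keep looking.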

But that sequential process doesn’t work in many real-world scenarios. In the subway example, engineers can’t wait for the results of one study to come back before they launch the next one, or else the preparation would take years.

“Sometimes, these experiments take so long that we cannot wait for one to finish before moving on to the next. And in a company, an approach with too much adaptation and change makes it even harder to apply,” Bennouna says.

Bennouna and his colleagues’ algorithm takes a different approach. It calculates the minimal sufficient dataset, or the smallest set of data that can be used to reach an optimal decision. That provides the decision-maker with a more selective and manageable set of factors they need to investigate at once to reduce uncertainty.

In the hiring scenario, for example, that might mean first identifying a subset of applicants who should be moved to the interview stage, rather than selecting the next candidate after each interview outcome. In the subway example, it could translate to finding the set of locations where cost studies should be prioritized.

“We’re thinking of data collection in a more practical way in settings where you need to experiment all at once,” Bennouna says. “The key idea is finding data that informs decisions in the best way possible.”

More-efficient data-driven decisions

But there’s another real-world constraint that interferes with optimal decision-making: budgets. Even if governments and companies would like to select the best possible option, sometimes, there’s only enough money for a “good enough” solution.

Bennouna and his collaborators are now working on an extension of their algorithm that takes this reality into account.

“Maybe you just want to know what’s the best you could do with that budget and how that would change your data requirements,” Bennouna says. “We want to be able to quantify the trade-off of optimality of the decision and type and size of data.”

The team is also looking at how their approach could be applied to different types of decision tasks beyond those modeled by linear optimization, such as the process used by online retailers to optimize their inventory across locations.

The researchers’ concept of “data efficiency” could also extend to other problems. For instance, it could potentially help improve the environmental efficiency of the energy-intensive computation used by large language models by selecting the most-relevant data on which to train them.

“We have these models that take all the data on the internet and extract knowledge, and we’re getting better and better at that. But the more data [there is], the more costly these algorithms are, and we are already kind of nearing the limit of what we can do,” Bennouna says. “So the question will become, really, what specific data do you need and how to be efficient in that sense.”

About the Writer

Rob Mitchum is the editor in chief of Kellogg Insight.

About the Research

Bennouna, Omar, Amine Bennouna, Saurabh Amin, and Asuman Ozdaglar. 2025. “What Data Enables Optimal Decisions? An Exact Characterization for Linear Optimization.” Conference on Neural Information Processing Systems.
