Operations Mar 1, 2023
At Their Best, Self-Learning Algorithms Can Be a “Win-Win-Win”
Lyft is using “reinforcement learning” to match customers to drivers—leading to higher profits for the company, more work for drivers, and happier customers.
Sébastien Martin was working at Lyft as a postdoctoral fellow when Covid hit. Suddenly, there were massive changes in the number of passengers and drivers using the app, and the company tried to quickly adapt.
Lyft had always used an algorithm to match drivers and passengers, so they figured they could tweak it to make their Covid plan work. But it ended up being much harder than expected. “It showed the limit of the system,” says Martin, who is now an assistant professor of operations at the Kellogg School.
The main issue, Martin explains, is that simple algorithms—such as matching the closest driver to a passenger—actually don’t work that well.
It got Martin thinking about how the matching algorithm could be improved, even after rideshares recovered from the pandemic. What if the algorithm could teach itself how to better allocate drivers and then make those adjustments in real time?
He and a team from Lyft have accomplished just that. It took more than a year—an eternity at a tech firm, Martin says—to create an algorithm that could engage in “reinforcement learning.” And while designing the algorithm was difficult, so was getting buy-in across the company to even attempt this.
After all, with reinforcement learning, “you give away a lot of control,” Martin says. “A machine that can make decisions without telling you? Imagine if it’s making those decisions about work that’s your bread and butter.”
But the results were worth it: The company began making more money, drivers had more work, and passengers gave more five-star reviews. Plus, their project was named one of six finalists last month for the Franz Edelman Award, the most prestigious award in the field of analytics and operations research. If you’ve taken a Lyft in the last year or two, then this algorithm has helped you get matched to a driver, and the data from your trip in turn helped the algorithm improve.
Against the backdrop of growing apprehension about self-learning algorithms (think ChatGPT), the Lyft story shows that some of these tools truly do improve everyone’s lives, Martin says.
“It’s not always a zero-sum game” of trade-offs between winners and losers, he says. “Passengers are happier. Drivers are busier. The platform is making more money. There is literally no downside.”
Why closest isn’t always best
For most people, especially those of us who have had to stand on a rainy corner waiting for a rideshare, it seems logical that sending the closest driver makes the most sense. But that is not always the case.
The problem arises when it’s busy and drivers are in limited supply, Martin explains. In that case, the closest driver to a passenger might still be quite far away. Send that driver, and they’ll spend a lot of time “driving empty,” while the passenger is stuck waiting so long they may even cancel the ride before the driver arrives. And, crucially, any new passengers who try to hail a ride will need to wait even longer, because the available drivers are spending so much of their time getting to their next fare, which leaves fewer and fewer drivers free to shuttle people around.
“It’s like a death spiral for platforms,” Martin says.
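The failure mode Martin describes can be sketched with toy numbers. In the hypothetical example below (positions, names, and distances are all invented for illustration, not drawn from Lyft), matching each passenger to the closest free driver one at a time produces more total empty driving than choosing the pairing jointly:

```python
from itertools import permutations

# Hypothetical positions (km along a single road) for two drivers and
# two waiting passengers.
drivers = {"d1": 0.0, "d2": 5.0}
passengers = {"p1": 4.0, "p2": 6.0}

def greedy_match(drivers, passengers):
    """Match each passenger, in order, to the closest still-free driver."""
    free = dict(drivers)
    matches = {}
    for p, pos in passengers.items():
        d = min(free, key=lambda d: abs(free[d] - pos))
        matches[p] = d
        del free[d]
    return matches

def total_deadhead(matches):
    """Total empty-driving distance for a passenger -> driver matching."""
    return sum(abs(drivers[d] - passengers[p]) for p, d in matches.items())

def min_total_deadhead(drivers, passengers):
    """Brute-force the pairing that minimizes total empty driving."""
    best, best_cost = None, float("inf")
    for perm in permutations(drivers):
        candidate = dict(zip(passengers, perm))
        cost = total_deadhead(candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost

greedy = greedy_match(drivers, passengers)
print(greedy, total_deadhead(greedy))          # greedy: 7.0 km driven empty
print(min_total_deadhead(drivers, passengers))  # joint optimum: 5.0 km
```

The greedy rule grabs the nearby driver for the first passenger, then strands the second passenger with a long pickup; the joint assignment spends less total time driving empty, which is the resource the platform is trying to conserve.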
The ideal solution, then, would be a matching algorithm that could forecast what the situation will look like over the next few minutes. Will a new, closer passenger appear? Will traffic clear on a certain road, making the drive faster? If the driver does pick someone up, will there be another passenger near the destination point, making that next transition more efficient?
“The improvement comes from the fact that the drivers are better utilized.”
Essentially, the algorithm would need to be able to predict what will happen next. And that’s what Martin and the team at Lyft were able to teach it to do.
They did this by focusing on the “value” of available drivers at any given time, with that value being an estimate of how much money the driver will earn over the rest of their working day. They then built the algorithm to continuously analyze what was happening in real time so that it could teach itself to anticipate what was most likely to happen next.
It’s similar to reinforcement-learning algorithms that play chess, Martin says. They are trained on millions and millions of actual chess games and are then able to use that knowledge to forecast their opponents’ next move.
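The “value” idea can be illustrated with a textbook reinforcement-learning update. The sketch below is a toy temporal-difference (TD) example with invented zone names and fares, not Lyft’s production model: each trip’s observed fare nudges the estimate of how much a driver starting in that zone will earn going forward.

```python
# Tabular TD(0)-style estimates of a driver's future earnings by zone.
# Zone names, fares, and hyperparameters are hypothetical.
value = {"downtown": 0.0, "airport": 0.0}
alpha, gamma = 0.1, 0.95  # learning rate and discount factor

def td_update(zone, fare, next_zone):
    """Nudge value[zone] toward the observed fare plus the discounted
    value of the zone where the trip leaves the driver."""
    target = fare + gamma * value[next_zone]
    value[zone] += alpha * (target - value[zone])

# Simulated experience: drivers loop between two zones, earning a $20
# fare one way and $15 back.
for _ in range(5000):
    td_update("downtown", 20.0, "airport")
    td_update("airport", 15.0, "downtown")

print(value)  # estimates converge near downtown ~351.3, airport ~348.7
```

A matcher armed with estimates like these can weigh not just pickup distance but where each trip will leave the driver, which is the lookahead the plain closest-driver rule lacks.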
The team tested their algorithm by creating experimental hours, where Lyft matched drivers and passengers using the reinforcement-learning algorithm, and control hours, where matching was done by Lyft’s regular algorithm.
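An experiment of that shape can be sketched in a few lines. The simulation below assumes a switchback-style design in which each hour is randomly assigned to the new or old algorithm; the effect size, metric, and all numbers are simulated for illustration, not Lyft data.

```python
import random

rng = random.Random(42)  # fixed seed for reproducibility

# Randomly label each of 1,000 hours as treatment (new algorithm) or
# control (regular algorithm).
hours = ["treatment" if rng.random() < 0.5 else "control"
         for _ in range(1000)]

def simulate_cancel_rate(arm):
    """Simulated hourly cancellation rate, with an assumed small
    improvement during treatment hours."""
    base = 0.100 if arm == "control" else 0.097  # hypothetical effect
    return base + rng.gauss(0, 0.01)

metrics = {"treatment": [], "control": []}
for arm in hours:
    metrics[arm].append(simulate_cancel_rate(arm))

for arm, xs in metrics.items():
    print(arm, round(sum(xs) / len(xs), 4))
```

Comparing the average metric across the two sets of hours, rather than across different cities or days, helps isolate the algorithm’s effect from everything else changing on the platform.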
After more than a year of refining, they arrived at a new algorithm that bested the old one across all important measures. It generated the equivalent of more than $30 million a year in increased revenue for the company, along with a corresponding increase in drivers’ earnings. Passengers were 3 percent less likely to cancel a ride request, and 13 percent fewer ride requests ended with no driver available. At the same time, passengers’ five-star reviews also increased.
“There weren’t more people using Lyft,” Martin says. “The improvement comes from the fact that the drivers are better utilized.”
Beyond the math
Their success is the first documented case of a rideshare company using reinforcement learning. But designing the algorithm was not the only difficult piece.
“More important than the math is how do you do this within the company,” Martin says.
Reinforcement learning means that the humans involved don’t always know what’s going on. That becomes tricky for an organization in a number of ways, Martin says. For example, say the team that works on pricing wants to run its own experiment. They would want all other factors at the time to be kept constant so that they could understand their data. But if a matching algorithm is changing things on its own at the same time, it’s difficult to know how to interpret the data from the pricing experiment.
“It makes a lot of other things much more complicated,” Martin says.
Additionally, it makes it difficult for the team working on the algorithm to understand how to keep innovating. “If humans lose a sense of what is happening, how can they keep innovating?” Martin asks. He and a PhD student, Yudi Huang, are currently working with Lyft on precisely that question.
Furthermore, at Lyft, the development of this algorithm took more than a year. “A year is a long time for a tech company. Two months is a long time! It’s very rare to spend a year on something that doesn’t work for that long,” he says.
Ultimately, the team kept up its morale and was able to convince the rest of the company to let it keep experimenting. There was no high-tech strategy for this, he says. “It’s the same way you do things anywhere,” he says. “You talk to the right people. You earn the trust of people. You form a team that is excited and then you show proof that it works. It’s common in research to think that the idea itself is enough. But in an organization, it’s the process that leads to something happening.”
The fact that, at least in this case, the process led to a “win–win–win” situation is particularly exciting to Martin.
Each time the team tested a revised algorithm, they would watch a dashboard of important metrics that would turn red if the experiment was worse than the status quo and green if it was better.
The day they landed on their winning algorithm, “the screen was just green,” he says. “That’s really what optimization in operations is all about: finding that fully green thing.”
Emily Stone is senior editor at Kellogg Insight.