Nihar Patel’s Research Redefines How Marketplaces Measure What Actually Works

March 23, 2026
4 min read

The hardest problem in any marketplace is not growth — it is knowing what caused it. Nihar V. Patel, a product data scientist at Etsy, has spent years wrestling with that exact question. His portfolio of six research papers, several already published and the rest nearing completion, tackles the messiest corners of marketplace experimentation: spillovers between buyers and sellers, pricing shocks that ripple across city blocks, driver incentives that work for some and backfire for others, and fairness systems that must hold up not just today but across an entire product lifecycle.

Patel’s path to this work is unusual. He began as a petroleum engineering student, trained to model fluid dynamics and underground reservoir behavior. He then pivoted sharply, earning a Master’s in Engineering Management with a focus on product and data science, winning the Husky Startup Challenge at his university, and eventually landing at Etsy, where he moved through seller onboarding, listing optimization, and emerging agentic commerce initiatives across web and mobile. The analytical instincts stayed the same. The domain changed entirely.

When A/B Tests Break Down

Most product teams default to A/B testing because it is clean. Randomly split your users, apply a treatment to one group, and compare outcomes. Simple. Patel’s research starts from a more uncomfortable premise: in two-sided marketplaces, clean randomization is often impossible, and ignoring that fact produces results that look credible but are not. His paper on propensity frameworks for marketplace selection bias confronts this directly. The paper defines exposure not at the user level but at the edge — the interaction between a shopper and a seller on a bipartite graph. The joint propensity score decomposes into two components: shopper intent and seller reach.

The method enforces overlap on both sides of the market simultaneously, then uses doubly robust estimation to guard against misspecification in either component. The practical payoff is that marketplaces can now measure the true causal lift of interventions like preferred seller badges, cross-selling modules, or targeting rules, even when the platform’s own algorithms have been routing traffic in ways that would otherwise make those measurements deeply unreliable. The paper on the causal effects of pricing interventions extends the same logic to a different setting. Using a combination of short surveys with roughly 300 buyers and sellers, public surge-and-discount information, and time-based proxies such as peaks and holidays, the study models what happens when prices change, not just locally but in neighboring areas at the same time.
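The edge-level idea lends itself to a compact estimator. The sketch below is illustrative, not code from the paper: it simulates shopper-intent and seller-reach propensities on synthetic edges, then applies a doubly robust (AIPW) estimate of the treatment lift. All variable names, distributions, and the true lift of 2.0 are invented for the example, and the known outcome model stands in for fitted regressions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy edge-level data: each row is one shopper-seller interaction (an edge).
n = 5000
shopper_intent = rng.uniform(0.2, 0.8, n)  # P(shopper seeks out the seller)
seller_reach = rng.uniform(0.2, 0.8, n)    # P(algorithm surfaces the seller)

# The joint exposure propensity decomposes into the two components.
propensity = shopper_intent * seller_reach

# Treatment, e.g. a preferred-seller badge shown on this edge.
treated = rng.random(n) < propensity

# Outcome with a confounded baseline and a true lift of 2.0 for treated edges.
baseline = 3.0 * shopper_intent + 1.0 * seller_reach
outcome = baseline + 2.0 * treated + rng.normal(0, 0.5, n)

# Outcome models per arm (here the known truth, standing in for fitted
# regressions mu1_hat and mu0_hat in a real pipeline).
mu1 = baseline + 2.0
mu0 = baseline

# Doubly robust (AIPW) estimate of the average treatment effect: the
# propensity-weighted residuals correct either model if it is misspecified.
ate_dr = np.mean(
    mu1 - mu0
    + treated * (outcome - mu1) / propensity
    - (~treated) * (outcome - mu0) / (1 - propensity)
)
print(round(ate_dr, 2))  # close to the true lift of 2.0
```

Enforcing overlap corresponds to keeping `propensity` bounded away from 0 and 1, which is what makes the inverse weights above stable.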

The findings are precise enough to be operationally useful: a 10% price increase corresponds to roughly a 6% drop in orders and a 5 percentage point rise in users switching to a competing app. When neighboring areas also surge simultaneously, local orders fall an additional 2%, and switching climbs another 2 points. Sellers on the other side see a 4% gain in earnings per hour from a 10% pay increase, but utilization slips, suggesting short-run oversupply. These are the kinds of trade-off numbers pricing teams can use when deciding whether to run a promotion or absorb a competitor’s surge.
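A pricing team could plug those elasticities into a back-of-envelope revenue calculation. The base order volume and average price below are hypothetical; only the percentage effects come from the study.

```python
# Revenue trade-off from a 10% price increase, using the study's elasticities.
base_orders = 10_000   # weekly orders before the change (hypothetical)
base_price = 20.0      # average order value (hypothetical)

price_lift = 0.10        # 10% price increase
order_drop = 0.06        # ~6% fewer orders, per the study
neighbor_penalty = 0.02  # extra 2% drop when neighboring areas also surge

def revenue(orders, price):
    return orders * price

before = revenue(base_orders, base_price)

# Local price increase only.
after_local = revenue(base_orders * (1 - order_drop),
                      base_price * (1 + price_lift))

# Local increase while neighboring areas surge at the same time.
after_spillover = revenue(base_orders * (1 - order_drop - neighbor_penalty),
                          base_price * (1 + price_lift))

print(round(after_local / before - 1, 4))      # +3.4% revenue
print(round(after_spillover / before - 1, 4))  # +1.2% revenue
```

The spillover term matters: nearly two thirds of the revenue gain from a surge evaporates when neighbors surge too, before even counting the users who switch apps.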

Fairness As An Operational Problem

The fourth paper in the series reframes algorithmic fairness in a way that most engineering teams have not yet encountered. Fairness is typically treated as a property of a model — you audit it at deployment, maybe add a constraint, and move on. Patel’s life-cycle framework for algorithmic bias argues that this is the wrong unit of analysis entirely. Fairness in a marketplace is a governance problem that stretches across eight stages: measurement, debiased training data, fair representation learning, constrained model training, fairness-aware ranking and exploration, feedback design, evaluation, and board-level governance. Miss any stage and the system drifts.

The practical mechanisms the paper proposes are concrete. Exposure budgets by segment. Group-aware Plackett-Luce ranking. Pacers to prevent abrupt fairness shifts that would disorient sellers overnight. Bayesian shrinkage of ratings to prevent newer sellers from being buried by stale reviews. Gradient-throttling of reputation features to slow runaway rich-get-richer dynamics. The paper also shows how to meet regulatory expectations while protecting core metrics, with fairness SLOs and incident runbooks that give companies a path forward without sacrificing click-through rates or revenue.
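Bayesian shrinkage of ratings, for instance, is simple to sketch. The prior mean and pseudo-review count below are illustrative tuning knobs, not values from the paper.

```python
def shrunk_rating(avg_rating, n_reviews, prior_mean=4.2, m=25):
    """Pull a seller's mean rating toward the marketplace-wide prior.

    Equivalent to a posterior mean with m pseudo-reviews at the prior mean,
    so sellers with few reviews are not ranked on a noisy average.
    """
    return (n_reviews * avg_rating + m * prior_mean) / (n_reviews + m)

# A new seller with two 5-star reviews is pulled close to the prior...
print(round(shrunk_rating(5.0, 2), 2))    # 4.26
# ...while an established seller keeps most of their observed average.
print(round(shrunk_rating(5.0, 500), 2))  # 4.96
```

The same mechanism cuts the other way for stale reviews: as old ratings are down-weighted, a seller's effective review count shrinks and the estimate drifts back toward the prior rather than freezing in place.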

The sixth paper, on adaptive experimentation, follows a similar logic. Standard A/B testing is static: run the experiment, collect results, make a decision, repeat. Adaptive systems — multi-armed bandits, contextual bandits, reinforcement learning — keep learning while the product runs. The paper formalizes how marketplaces can continuously allocate traffic among pricing, recommendation, and promotion options while maintaining causal validity and fairness constraints. It also catalogues the risks honestly: bias against certain user cohorts, confounding in adaptive designs, and overfitting to historical data. The framework offers mitigation strategies for each.
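A minimal Thompson-sampling loop shows the flavor of learning while the product runs. The three promotion variants and their conversion rates are synthetic, invented for this example; the paper's framework layers causal-validity and fairness constraints on top of loops like this one.

```python
import random

random.seed(7)

true_rates = [0.05, 0.08, 0.11]  # hidden conversion rate of each variant
alpha = [1.0] * 3                # Beta posterior: successes + 1 per arm
beta = [1.0] * 3                 # Beta posterior: failures + 1 per arm

for _ in range(20_000):
    # Sample a plausible rate for each arm from its posterior, play the best.
    samples = [random.betavariate(alpha[i], beta[i]) for i in range(3)]
    arm = samples.index(max(samples))
    reward = random.random() < true_rates[arm]  # simulated conversion
    alpha[arm] += reward
    beta[arm] += 1 - reward

pulls = [int(alpha[i] + beta[i] - 2) for i in range(3)]
best = pulls.index(max(pulls))
print(best)  # traffic concentrates on the highest-converting variant
```

The cohort-bias risk the paper catalogues is visible even here: arms that look weak early get starved of traffic, so their estimates stay noisy unless exploration is explicitly protected.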

From Ride-Hailing To Etsy’s Sellers

Two of the papers move entirely away from e-commerce, examining ride-hailing platforms as a laboratory for causal inference under interference. The synthetic controls paper adapts the method to cities that differ sharply in infrastructure, culture, and regulation. Each city becomes a unit of observation; counterfactual cities are constructed from donor pools to estimate what would have happened in the absence of a policy change. The method extends to staggered adoption and time-varying policy intensity, making it usable for the kind of messy, multi-city rollouts that platforms actually run.
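The core of synthetic controls can be sketched as constrained least squares: find nonnegative donor-city weights that sum to one and reproduce the treated city's pre-period trajectory. The data and the projected-gradient solver below are illustrative choices, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Standardized pre-period outcomes for 5 donor cities over 30 periods.
pre_periods, n_donors = 30, 5
donors = rng.normal(0.0, 1.0, (pre_periods, n_donors))
true_w = np.array([0.5, 0.3, 0.2, 0.0, 0.0])  # hidden ground truth
treated_pre = donors @ true_w + rng.normal(0, 0.05, pre_periods)

# Projected gradient descent: least-squares fit to the treated city's
# pre-period, with nonnegative weights renormalized to sum to one.
w = np.full(n_donors, 1 / n_donors)
for _ in range(10_000):
    grad = donors.T @ (donors @ w - treated_pre) / pre_periods
    w = np.clip(w - 0.05 * grad, 0.0, None)  # gradient step + nonnegativity
    w /= w.sum()                             # keep weights on the simplex

# The counterfactual "synthetic city" in any post period is donors_post @ w.
print(np.round(w, 2))
```

Staggered adoption reuses the same machinery: each adopting city gets its own donor pool restricted to not-yet-treated cities at that date.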

The paper on heterogeneous treatment effects of driver incentives goes further. Rather than asking whether an incentive works on average, it asks: for which drivers, in which neighborhoods, at what times, and at what level of saturation does the incentive actually create value? The answer requires a full pipeline — cluster and saturation designs, staggered rollouts, encouragement designs for partial compliance, causal forests — all connected to operational decisions about budget constraints, fairness guardrails, and service-quality standards. The gap between “does this work?” and “who should we target, and how much should we spend?” is exactly the gap this work aims to close.
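The heterogeneity question is easy to see in miniature. In the sketch below, a stratified difference in means stands in for the causal forests the paper uses; the driver data, the saturation split, and the effect sizes are entirely synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy driver panel: the incentive adds trips only in unsaturated zones.
n = 20_000
saturated = rng.random(n) < 0.5               # zone already has enough drivers
incentive = rng.random(n) < 0.5               # randomized incentive offer
true_effect = np.where(saturated, 0.0, 2.0)   # extra trips/week if incentivized
trips = 10 + true_effect * incentive + rng.normal(0, 1, n)

def cate(mask):
    """Treated-minus-control mean outcome within a driver segment."""
    return trips[mask & incentive].mean() - trips[mask & ~incentive].mean()

print(round(cate(~saturated), 1))  # ~2.0: the incentive creates value here
print(round(cate(saturated), 1))   # ~0.0: spend here is wasted
```

The average effect across all drivers would be about 1.0 extra trip, which hides the fact that half the budget buys nothing; targeting by segment is exactly the decision this estimate supports.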

Across all six papers, the unifying argument is that digital marketplaces are dynamic, interconnected systems where standard causal tools fail in predictable ways. Interference travels across user networks. Algorithms self-select which users see which treatments. Feedback loops amplify early signals into lasting inequities. Patel’s research builds the methods that remain usable under those conditions — and connects them, paper by paper, to the operational decisions that marketplace teams make every day. At Etsy, that work has already surfaced over $60 million in business growth opportunities and moved seller and buyer engagement metrics from 8% to 19% across multiple product iterations. The papers, in one sense, explain how numbers like those are produced.
