How Uber Finds Nearby Drivers

Why You Can't Just Loop Through All Drivers

Here's the first idea every engineer has: when a rider requests a trip, scan all active drivers, compute the distance to each one, sort by distance, return the top 10. Sounds fine in your head. Let's run the numbers and watch it collapse.

THE NAIVE APPROACH IN PRODUCTION

Uber at peak:

~5 million drivers online globally
Location updates every 4 seconds
~100,000 trip requests per minute
Target latency: < 100ms per search

Naive linear scan cost:

5M Haversine calculations per request
Each Haversine ≈ 3 trig calls (expensive)
100K req/min × 5M ops = 500 billion ops/min
You need approximately 10,000 servers just for search

The only sane approach: use a spatial index — a data structure that can answer "find all points within X km of this coordinate" in O(log n) or better, without touching the bulk of the dataset. Uber uses three overlapping systems: Geohash for the simple case, Redis Geo for the hot path, and H3 hexagons for analytics and supply positioning. Let's build up each one.

O(N) IS NOT A PLAN SPATIAL INDEX OR BUST

Geohash: Encoding Space as a String

Geohash is a clever trick: convert a (lat, lng) coordinate into a short string, where two strings that share a prefix are guaranteed to be geographically close. "ttnfv" and "ttnfw" are next-door neighbors. "ttnfv" and "s0" are on different continents. This lets you do proximity queries with string comparisons.

The Encoding Algorithm

Binary-split the bounding box

Start with the whole world. Longitude: -180° → 180°. Latitude: -90° → 90°. Now repeatedly halve each range. If your value is in the right half, encode a 1. Left half → 0. Alternate between longitude bits (even positions) and latitude bits (odd positions).

Interleave lon/lat bits

You end up with a bit string where even positions encode longitude and odd positions encode latitude. For Delhi (77.2090°E, 28.6139°N): the bit string starts 11010011100110011000... The interleaving is key — it creates a Z-order (Morton) space-filling curve that preserves locality.

Encode every 5 bits as Base32

Group 5 bits at a time and map to this alphabet: 0123456789bcdefghjkmnpqrstuvwxyz. Each character covers 5 bits → 2⁵ = 32 possible values (no 'a', 'i', 'l', 'o' to avoid confusion). Result: Delhi → ttnfvk at precision 6.

Longer prefix = smaller cell = more precise

Each extra character divides the cell into 32 sub-cells. Longer geohash → tighter area → more precision → larger index storage.

Precision Cheat Sheet

Precision	Cell Width	Cell Height	Use Case	Delhi Example
`1`	5,009 km	4,992 km	Country	`t`
`2`	1,252 km	624 km	Asia region	`tt`
`3`	156 km	156 km	Metro area	`ttn`
`4`	39 km	20 km	City	`ttnf`
`5`	4.9 km	4.9 km	Neighborhood	`ttnfv`
`6`	1.2 km	0.6 km	Street block	`ttnfvk`
`7`	152 m	152 m	Building	`ttnfvke`
`8`	38 m	19 m	Exact spot	`ttnfvkes`

🗂 LIVE GEOHASH ENCODER — Watch the Bits Get Woven

LAT: LNG: PRECISION:

LONGITUDE BITS (even positions):

LATITUDE BITS (odd positions):

INTERLEAVED (lon at even, lat at odd):

GEOHASH RESULT

——

Each character = 5 bits → Base32 alphabet

Interactive: Geohash Grid Visualizer

Click anywhere on the city grid to drop a pickup point. Watch how the geohash cell is identified, the 8 neighboring cells are highlighted, and drivers in the search zone are found — all before a single distance calculation runs.

GEOHASH CELL EXPLORER — precision 6 click grid to place pickup

PICKUP COORDS Click on the map

GEOHASH (prec 6) —

CELLS SEARCHED —

DRIVERS FOUND —

REDIS COMMAND —

The Boundary Problem: Why One Cell Isn't Enough

Here's the flaw that would sink a naive geohash implementation. A rider and a driver could be 10 meters apart but in completely different geohash cells. Imagine a cell boundary running right down the middle of Outer Ring Road in Delhi. The driver is 9 meters to the left. The rider is 9 meters to the right. Same cell? No. Distance? 18 meters. But if you only search the rider's cell, you completely miss that driver.

THE SOLUTION: ALWAYS SEARCH 9 CELLS

Always search the center cell + all 8 neighbors. A geohash cell at precision 6 is ~1.2km × 0.6km. Its 8 neighbors form a 3×3 grid covering ~3.6km × 1.8km. Any point within that outer ring is at most ~2km away from the center — which is your initial search radius. After the 9-cell sweep, you still run distance calculations, but only on the small set of candidates from those cells (maybe 20–50 drivers), not the millions of drivers total.

❌ Search 1 Cell Only

Miss drivers right across the cell boundary
A driver 15m away may be invisible
Results are incomplete and non-deterministic
Worse near city borders (many cells = many boundaries)
Unacceptable for production

✅ Search Center + 8 Neighbors

Guaranteed coverage of the entire radius zone
9 Redis lookups (still sub-millisecond each)
Candidate set stays small (typically 20–80 drivers)
Then run exact Haversine distance filter on candidates
This is what every production system does

ALSO WATCH OUT FOR: THE WRAPPING EDGE CASE

Geohash at precision 1 shows an ugly discontinuity: cells on the far-left edge of the map (longitude = -180) and cells on the far-right edge (longitude = +180) are adjacent in space but not adjacent in the geohash string. Same for poles. This is a known limitation. In practice, Uber routes around this by using a different spatial index for polar/edge regions (which have essentially zero ridership anyway).

Redis Geo Index: The Production Hot Path

Redis has native geo commands built on top of sorted sets. Internally, Redis encodes each (lat, lng) as a 52-bit geohash stored as the score in a ZSET. This gives you ~0.6mm precision — more than enough. The commands feel like magic but are just sorted set operations under the hood.

The Three Commands You Actually Need

⚡ REDIS GEO LIFECYCLE — 3 Phases Animated

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ PHASE 3: Driver goes offline / starts trip ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ # Remove from available set when trip starts ZREM drivers:available:DEL "d_42" # Add to in-trip set (for tracking, not proximity search) GEOADD drivers:on_trip:DEL \ 77.2090 28.6139 "d_42"

Handling Horizontal Scaling

H3: Uber's Hexagonal Grid (What They Actually Built)

In 2018 Uber open-sourced H3 — a hierarchical hexagonal grid system that covers the entire Earth. Why hexagons? Because in a hex grid, every neighbor is equidistant from the center. In a square grid, diagonal neighbors are √2 ≈ 1.41× farther than edge neighbors. That asymmetry causes systematic bias in any proximity calculation.

Square Grid Problems

Edge neighbors: distance = 1 unit
Corner neighbors: distance = 1.41 units
→ Searching a circle by grid cells is unequal — corner cells cover less of the circle
→ Drivers in corner cells are systematically deprioritized vs edge-cell drivers at same true distance

Hexagonal Grid Advantages

All 6 neighbors: exactly equidistant
→ No systematic bias in proximity search
→ Better approximation of a circle (max error 13% vs 41% for squares)
→ Natural for k-ring expansion (ring of k steps = concentric hexagons)
→ Cleaner for supply/demand density maps

H3 Resolution Table

Resolution	Hex Edge Length	Hex Area	Total Cells (Earth)	Uber Use
`r=0`	1,107 km	4,250,547 km²	122	Continent
`r=4`	22 km	1,770 km²	288,122	Metro region
`r=7`	1.2 km	5.16 km²	98M	Neighborhood
`r=9`	174 m	0.105 km²	2.4B	Driver pickup zone ✦
`r=10`	65.9 m	0.015 km²	16.8B	Precise location
`r=12`	9.4 m	0.0003 km²	822B	Sub-building

⬡ INTERACTIVE H3 K-RING EXPANDER

RING STATS
Ring 0: 1 hex
Ring 1: 6 hexes
Ring 2: 12 hexes
Ring 3: 18 hexes
Formula: k×6 per ring
Total at k: 1+6k(k+1)/2

CURRENT SELECTION
k-ring: 0
Hexes: 1
Drivers found: —
Radius: ~175m

H3 vs GEORADIUS

          GEORADIUS → hot path

          (one call, exact dist)

          H3 → analytics path

          (density maps, surge

          pricing, reposition)

The Full Location Pipeline Architecture

Here's how all the pieces connect in production. Three distinct data flows — GPS ingestion, trip request, driver state change — all hitting different services with different consistency requirements.

GPS INGESTION PATH — Every 4 seconds per driver

📱 Driver App

GPS ping
H3 encoded on device

──▶

⚖️ Load Balancer

HTTPS POST
/location

──▶

🔧 Location Service

Go · 200 instances
validates + routes

──▶

⚡ Redis Cluster

GEOADD
drivers:avail:{city}

📨 Kafka

"driver-locations"
ML · H3 · analytics

TRIP REQUEST PATH — <50ms SLA

🙋 Rider App

POST /request-trip
pickup + dest coords

──▶

📡 Demand Svc

GEORADIUS 2km
top 10 candidates

──▶

⚡ Redis Cluster

returns
10 driver_ids + dist

──▶

🚦 Dispatch Svc

parallel lookup:
ETA · rating · type

──▶

✅ Offer Top 3

simultaneously
first accept wins

Key Design Decisions Explained

Decision	Why	Alternative Considered
H3 encoded on device (driver app)	Saves 1ms of encoding on servers at 1.25M GPS updates/second	Server-side encoding: simpler but doesn't scale
Redis GEORADIUS as hot path (not geohash lookup)	GEORADIUS returns exact distances in one call; geohash requires 9 lookups + manual distance filter	Custom geohash index: ~2× more code, marginal speed gain
Kafka between Location Service and analytics	Decouples high-frequency writes from analytics consumers; replay capability for ML retraining	Direct DB write: couples ingest latency to analytics query latency
Two separate Redis sets (available vs on_trip)	Trip request only searches `available` — set is 10× smaller. No wasted compute on busy drivers.	Status field + filtered scan: requires reading all drivers then filtering
Offer to top 3 simultaneously	If top driver doesn't respond in 10s, next offer is already pending — reduces rider wait time	Sequential offer: simpler, but each rejection adds 10s to rider wait

Active drivers	~5 million globally
GPS updates/second	~1.25 million/s
Trip requests/minute	~100,000
GEORADIUS p99 latency	< 5ms
End-to-end search SLA	< 50ms
Redis sets per city	2 (available + on_trip)
Redis memory per city	~50MB for 50K drivers
Search radius	Dynamic: 0.5–8 km based on supply density