Here's the first idea every engineer has: when a rider requests a trip, scan all active drivers, compute the distance to each one, sort by distance, return the top 10. Sounds fine in your head. Let's run the numbers and watch it collapse.
Uber at peak:
Naive linear scan cost:
The only sane approach: use a spatial index — a data structure that can answer "find all points within X km of this coordinate" in O(log n) or better, without touching the bulk of the dataset. Uber uses three overlapping systems: Geohash for the simple case, Redis Geo for the hot path, and H3 hexagons for analytics and supply positioning. Let's build up each one.
Geohash is a clever trick: convert a (lat, lng) coordinate into a short string, where two strings that share a prefix are guaranteed to be geographically close. "ttnfv" and "ttnfw" are next-door neighbors. "ttnfv" and "s0" are on different continents. This lets you do proximity queries with string comparisons.
Start with the whole world. Longitude: -180° → 180°. Latitude: -90° → 90°. Now repeatedly halve each range. If your value is in the right half, encode a 1. Left half → 0. Alternate between longitude bits (even positions) and latitude bits (odd positions).
You end up with a bit string where even positions encode longitude and odd positions encode latitude. For Delhi (77.2090°E, 28.6139°N): the bit string starts 11010011100110011000... The interleaving is key — it creates a Z-order (Morton) space-filling curve that preserves locality.
Group 5 bits at a time and map to this alphabet: 0123456789bcdefghjkmnpqrstuvwxyz. Each character covers 5 bits → 25 = 32 possible values (no 'a', 'i', 'l', 'o' to avoid confusion). Result: Delhi → ttnfvk at precision 6.
Each extra character divides the cell into 32 sub-cells. Longer geohash → tighter area → more precision → larger index storage.
| Precision | Cell Width | Cell Height | Use Case | Delhi Example |
|---|---|---|---|---|
1 | 5,009 km | 4,992 km | Country | t |
2 | 1,252 km | 624 km | Asia region | tt |
3 | 156 km | 156 km | Metro area | ttn |
4 | 39 km | 20 km | City | ttnf |
5 | 4.9 km | 4.9 km | Neighborhood | ttnfv |
6 | 1.2 km | 0.6 km | Street block | ttnfvk |
7 | 152 m | 152 m | Building | ttnfvke |
8 | 38 m | 19 m | Exact spot | ttnfvkes |
Click anywhere on the city grid to drop a pickup point. Watch how the geohash cell is identified, the 8 neighboring cells are highlighted, and drivers in the search zone are found — all before a single distance calculation runs.
Here's the flaw that would sink a naive geohash implementation. A rider and a driver could be 10 meters apart but in completely different geohash cells. Imagine a cell boundary running right down the middle of Outer Ring Road in Delhi. The driver is 9 meters to the left. The rider is 9 meters to the right. Same cell? No. Distance? 18 meters. But if you only search the rider's cell, you completely miss that driver.
Always search the center cell + all 8 neighbors. A geohash cell at precision 6 is ~1.2km × 0.6km. Its 8 neighbors form a 3×3 grid covering ~3.6km × 1.8km. Any point within that outer ring is at most ~2km away from the center — which is your initial search radius. After the 9-cell sweep, you still run distance calculations, but only on the small set of candidates from those cells (maybe 20–50 drivers), not the millions of drivers total.
Geohash at precision 1 shows an ugly discontinuity: cells on the far-left edge of the map (longitude = -180) and cells on the far-right edge (longitude = +180) are adjacent in space but not adjacent in the geohash string. Same for poles. This is a known limitation. In practice, Uber routes around this by using a different spatial index for polar/edge regions (which have essentially zero ridership anyway).
Redis has native geo commands built on top of sorted sets. Internally, Redis encodes each (lat, lng) as a 52-bit geohash stored as the score in a ZSET. This gives you ~0.6mm precision — more than enough. The commands feel like magic but are just sorted set operations under the hood.
Redis GEOADD doesn't store coordinates directly. It calls ZADD internally with a 52-bit integer geohash as the score. GEORADIUS then computes a bounding box from the radius, converts to geohash score ranges, and runs ZRANGEBYSCORE — which is just a binary search on the sorted set. That's why it's O(N+log(M)) where N is results and M is total members. Extremely fast. The distance filtering is done in-process after the set scan returns candidates.
Watch 30 drivers moving around the city in real time. Click anywhere on the map to drop a pickup point, then hit Search Drivers to see the GEORADIUS query execute — with the Redis commands streaming in the terminal.
GEOADD fires every 4 secondsdriver-locations for analytics, ML feature pipelines, and dispatch optimizationIn 2018 Uber open-sourced H3 — a hierarchical hexagonal grid system that covers the entire Earth. Why hexagons? Because in a hex grid, every neighbor is equidistant from the center. In a square grid, diagonal neighbors are √2 ≈ 1.41× farther than edge neighbors. That asymmetry causes systematic bias in any proximity calculation.
| Resolution | Hex Edge Length | Hex Area | Total Cells (Earth) | Uber Use |
|---|---|---|---|---|
r=0 | 1,107 km | 4,250,547 km² | 122 | Continent |
r=4 | 22 km | 1,770 km² | 288,122 | Metro region |
r=7 | 1.2 km | 5.16 km² | 98M | Neighborhood |
r=9 | 174 m | 0.105 km² | 2.4B | Driver pickup zone ✦ |
r=10 | 65.9 m | 0.015 km² | 16.8B | Precise location |
r=12 | 9.4 m | 0.0003 km² | 822B | Sub-building |
Here's how all the pieces connect in production. Three distinct data flows — GPS ingestion, trip request, driver state change — all hitting different services with different consistency requirements.
| Decision | Why | Alternative Considered |
|---|---|---|
| H3 encoded on device (driver app) | Saves 1ms of encoding on servers at 1.25M GPS updates/second | Server-side encoding: simpler but doesn't scale |
| Redis GEORADIUS as hot path (not geohash lookup) | GEORADIUS returns exact distances in one call; geohash requires 9 lookups + manual distance filter | Custom geohash index: ~2× more code, marginal speed gain |
| Kafka between Location Service and analytics | Decouples high-frequency writes from analytics consumers; replay capability for ML retraining | Direct DB write: couples ingest latency to analytics query latency |
| Two separate Redis sets (available vs on_trip) | Trip request only searches available — set is 10× smaller. No wasted compute on busy drivers. |
Status field + filtered scan: requires reading all drivers then filtering |
| Offer to top 3 simultaneously | If top driver doesn't respond in 10s, next offer is already pending — reduces rider wait time | Sequential offer: simpler, but each rejection adds 10s to rider wait |
| Active drivers | ~5 million globally |
| GPS updates/second | ~1.25 million/s |
| Trip requests/minute | ~100,000 |
| GEORADIUS p99 latency | < 5ms |
| End-to-end search SLA | < 50ms |
| Redis sets per city | 2 (available + on_trip) |
| Redis memory per city | ~50MB for 50K drivers |
| Search radius | Dynamic: 0.5–8 km based on supply density |
GEORADIUS call on a 100K-driver ZSET takes < 1msStart with GEOADD + GEORADIUS. Add city sharding when your Redis memory exceeds 80%. Add H3 analytics when you have enough trip history to optimize driver positioning. Don't build the full architecture on day 1.