Accelerating Geo Joins with H3 Hexagonal Indexes

Geospatial queries are notoriously slow. Joining millions of GPS points against polygon boundaries can bring even powerful databases to their knees. Enter H3, Uber's open-source hierarchical hexagonal grid system that transforms how we handle spatial data.
What Is H3?
H3 divides the entire Earth's surface into hexagonal cells at multiple resolutions. Each cell has a unique 64-bit identifier that encodes its location and size. The magic? Converting geographic coordinates to H3 indexes is blazing fast, and comparing H3 indexes is just integer comparison.
Key benefits:
- Constant-time lookups: No expensive polygon intersection calculations
- Hierarchical structure: 16 resolution levels from continental to sub-meter
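- Consistent neighbors: Unlike square grids, every neighbor of a hexagon shares an edge and sits at roughly the same distance; cell areas within a resolution are similar, though not identical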
- Compact storage: 64-bit integers vs. complex geometry objects
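To make the 64-bit identifier concrete, here is a minimal sketch that unpacks a sample cell index with plain integer operations. It assumes the bit layout described in the H3 index documentation (4 mode bits, 3 reserved bits, 4 resolution bits, 7 base-cell bits, then fifteen 3-bit digits). The helper name `decode_h3` is hypothetical; real code would normally call the h3 library instead.

```python
def decode_h3(index_hex: str) -> dict:
    """Unpack the fields of an H3 cell index given as a hex string."""
    h = int(index_hex, 16)
    resolution = (h >> 52) & 0xF
    # Digit r (1-based) occupies 3 bits; digits finer than the cell's resolution are 7.
    digits = [(h >> (3 * (15 - r))) & 0x7 for r in range(1, 16)]
    return {
        "mode": (h >> 59) & 0xF,         # 1 means "cell"
        "resolution": resolution,         # 0..15
        "base_cell": (h >> 45) & 0x7F,    # one of the 122 resolution-0 cells
        "digits": digits[:resolution],    # path from the base cell down to this cell
        "unused_digits": digits[resolution:],
    }

info = decode_h3("8928308280fffff")
print(info["mode"], info["resolution"], info["base_cell"])  # 1 9 20
```

Because the resolution and the path from the base cell are both stored in the integer, size and location can be read back without any geometry.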
Why Hexagons?
Hexagons offer unique advantages over square grids:
- Equal distance to neighbors: All six neighbors are (approximately, on a sphere) equidistant from the center
- Better circle approximation: Hexagons approximate circular regions more accurately
- No diagonal ambiguity: No corner-touching neighbors to worry about
- Reduced sampling bias: Fewer edge effects in spatial analysis
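The neighbor-distance argument can be checked numerically. The sketch below compares an idealized flat square grid with an idealized flat hexagonal grid in axial coordinates. It is a planar simplification, not H3 itself: on the sphere the distances are only approximately equal.

```python
import math

# Square grid: the 8 cells touching a cell (by edge or corner), as (dx, dy) offsets.
square_neighbors = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
square_dists = sorted({round(math.hypot(dx, dy), 6) for dx, dy in square_neighbors})

# Hexagonal grid: the 6 neighbors in axial coordinates (q, r).
hex_neighbors = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def hex_center(q, r):
    """Planar center of a hexagon in axial coordinates, with unit spacing between centers."""
    return (q + r / 2, r * math.sqrt(3) / 2)

hex_dists = sorted({round(math.hypot(*hex_center(q, r)), 6) for q, r in hex_neighbors})

print("square:", square_dists)  # two distinct distances: edge and diagonal neighbors
print("hex:", hex_dists)        # a single distance
```

The square grid has two classes of neighbors (distance 1 and about 1.414), while the hexagonal grid has one.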
Installing H3
Python
pip install h3
PostgreSQL (with h3-pg extension)
# Ubuntu/Debian
sudo apt install postgresql-15-h3
# Or from source
git clone https://github.com/zachasme/h3-pg.git
cd h3-pg && make && sudo make install
Then enable the extension:
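CREATE EXTENSION h3;
-- h3-pg 4.x ships its PostGIS geometry functions (such as h3_polygon_to_cells on geometries) in a separate extension
CREATE EXTENSION h3_postgis CASCADE;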
Basic H3 Operations
Converting Coordinates to H3
import h3
# Convert lat/lng to H3 index at resolution 9
lat, lng = 37.7749, -122.4194 # San Francisco
h3_index = h3.latlng_to_cell(lat, lng, 9)
print(h3_index) # a 15-character hex string, e.g. '8928308280fffff' (the exact cell depends on the coordinates)
Resolution Guide
| Resolution | Average Edge Length | Use Case |
|---|---|---|
| 0 | 1,107 km | Continental analysis |
| 4 | 22.6 km | Regional patterns |
| 7 | 1.22 km | City districts |
| 9 | 174 m | Blocks, POIs |
| 12 | 9.4 m | Buildings |
| 15 | 0.5 m | Precise positioning |
Getting Cell Boundaries
# Get the hexagon boundary as lat/lng pairs
boundary = h3.cell_to_boundary(h3_index)
print(boundary)
Finding Neighbors
# Get all cells within k rings
neighbors = h3.grid_disk(h3_index, k=1) # 7 cells (center + 6 neighbors)
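The cell count in the comment above follows from a simple pattern: ring i around a hexagon contains 6*i cells. A small sketch of that arithmetic (the function name `grid_disk_size` is hypothetical, and the formula assumes no pentagon cell lies inside the disk; H3 has 12 pentagons per resolution, around which counts are slightly lower):

```python
def grid_disk_size(k: int) -> int:
    """Cells within k rings of a hexagon: the center plus 6*i cells in ring i."""
    return 1 + sum(6 * i for i in range(1, k + 1))  # equals 1 + 3*k*(k + 1)

for k in range(4):
    print(k, grid_disk_size(k))  # 0 1, 1 7, 2 19, 3 37
```

The quadratic growth is worth keeping in mind when widening k for buffered lookups.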
Speeding Up Geo Joins
The traditional approach to geo joins involves expensive point-in-polygon tests:
-- Slow: without a spatial index, up to O(n*m) point-in-polygon tests
SELECT p.id, z.name
FROM points p
JOIN zones z ON ST_Within(p.geom, z.geom);
With H3, you precompute indexes and use simple equality joins. The match is approximate along zone borders, because each cell is assigned to a zone as a whole:
-- Fast: O(n+m) hash join
SELECT p.id, z.name
FROM points_h3 p
JOIN zones_h3 z ON p.h3_index = z.h3_index;
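To see the equality join in isolation, here is a small self-contained sketch using Python's built-in sqlite3 module. The table names mirror the query above, but the cell IDs (`cell_a`, `cell_b`, and so on) are made-up placeholders standing in for precomputed H3 indexes, and SQLite is used only because it needs no setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE points_h3 (id INTEGER PRIMARY KEY, h3_index TEXT);
    CREATE TABLE zones_h3 (name TEXT, h3_index TEXT);
    CREATE INDEX idx_zones_h3 ON zones_h3(h3_index);
""")

# One row per (zone, cell): a zone is represented by every cell that covers it.
conn.executemany("INSERT INTO zones_h3 VALUES (?, ?)", [
    ("Downtown", "cell_a"), ("Downtown", "cell_b"), ("Harbor", "cell_c"),
])
conn.executemany("INSERT INTO points_h3 VALUES (?, ?)", [
    (1, "cell_a"), (2, "cell_c"), (3, "cell_b"), (4, "cell_z"),  # cell_z is in no zone
])

# The geo join is now a plain equality join on the cell ID.
rows = conn.execute("""
    SELECT p.id, z.name
    FROM points_h3 p
    JOIN zones_h3 z ON p.h3_index = z.h3_index
    ORDER BY p.id
""").fetchall()
print(rows)  # [(1, 'Downtown'), (2, 'Harbor'), (3, 'Downtown')]
```

Point 4 drops out of the inner join because its cell belongs to no zone, which is the same behavior a point outside every polygon would have in the geometric version.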
Implementation in PostgreSQL
First, add H3 indexes to your tables:
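-- Add H3 index column to points
ALTER TABLE points ADD COLUMN h3_index h3index;
-- h3-pg takes a point (x = longitude, y = latitude), not separate lat/lng arguments.
-- Some h3-pg 4.x releases name this function h3_lat_lng_to_cell.
UPDATE points SET h3_index = h3_latlng_to_cell(
    point(ST_X(geom), ST_Y(geom)),
    9
);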
CREATE INDEX idx_points_h3 ON points(h3_index);
-- Pre-compute H3 cells for polygons
CREATE TABLE zones_h3 AS
SELECT z.id, z.name, h3_polygon_to_cells(z.geom, 9) AS h3_index
FROM zones z;
CREATE INDEX idx_zones_h3 ON zones_h3(h3_index);
Python Data Pipeline Example
import h3
import pandas as pd
def add_h3_index(df, lat_col, lng_col, resolution=9):
    """Add H3 index column to DataFrame."""
    df['h3_index'] = df.apply(
        lambda row: h3.latlng_to_cell(row[lat_col], row[lng_col], resolution),
        axis=1
    )
    return df
# Process your data
points_df = add_h3_index(points_df, 'latitude', 'longitude', 9)
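# Caveat: indexing only each zone's center matches points in that single cell.
# For a true zone join, cover every zone polygon with cells (one row per cell) instead.
zones_df = add_h3_index(zones_df, 'center_lat', 'center_lng', 9)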
# Fast join on H3 index
result = points_df.merge(zones_df, on='h3_index', how='inner')
Real-World Performance Gains
Benchmarks from production systems show impressive improvements (actual speedups depend on hardware, resolution, and polygon complexity):
| Dataset Size | Traditional Join | H3 Join | Speedup |
|---|---|---|---|
| 1M points | 45 seconds | 0.8 seconds | 56x |
| 10M points | 12 minutes | 3.2 seconds | 225x |
| 100M points | 2+ hours | 28 seconds | 250x+ |
Handling Edge Cases
Points Near Cell Boundaries
For maximum accuracy, you can check neighboring cells:
def find_zone_with_buffer(lat, lng, zones_h3, resolution=9):
    """Find a zone for the point, falling back to neighboring cells.
    zones_h3 is a dict mapping H3 cell -> zone.
    """
    center = h3.latlng_to_cell(lat, lng, resolution)
    if center in zones_h3:  # Prefer the point's own cell
        return zones_h3[center]
    for cell in h3.grid_disk(center, 1):  # Then check the ring of neighbors
        if cell in zones_h3:
            return zones_h3[cell]
    return None
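When border accuracy matters, one common pattern (an assumption here, not something the benchmarks above measured) is to treat the cell lookup as a cheap candidate filter and confirm with an exact point-in-polygon test. The sketch below uses a planar ray-casting test, which is reasonable for small zones; the zone names, rings, and the `confirm_zone` helper are all hypothetical.

```python
def point_in_polygon(lat, lng, ring):
    """Ray-casting test; ring is a list of (lat, lng) vertices, treated as planar."""
    inside = False
    n = len(ring)
    for i in range(n):
        lat1, lng1 = ring[i]
        lat2, lng2 = ring[(i + 1) % n]
        # Does this edge straddle the point's latitude?
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which the edge crosses that latitude
            cross_lng = lng1 + (lat - lat1) * (lng2 - lng1) / (lat2 - lat1)
            if lng < cross_lng:
                inside = not inside
    return inside

def confirm_zone(lat, lng, candidate_zones, zone_rings):
    """Return the first candidate zone whose polygon really contains the point."""
    for zone in candidate_zones:
        if point_in_polygon(lat, lng, zone_rings[zone]):
            return zone
    return None

# Hypothetical data: two adjacent rectangular zones sharing the border lng = -122.42
zone_rings = {
    "west": [(37.77, -122.44), (37.77, -122.42), (37.79, -122.42), (37.79, -122.44)],
    "east": [(37.77, -122.42), (37.77, -122.40), (37.79, -122.40), (37.79, -122.42)],
}
# Suppose the cell lookup returned both zones for a point near the shared border.
result = confirm_zone(37.78, -122.419, ["west", "east"], zone_rings)
print(result)  # east
```

Only points whose cells touch a border need the exact test, so most of the speed of the equality join is preserved.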
Multi-Resolution Indexing
For varying precision requirements:
def create_multi_res_index(lat, lng, resolutions=(7, 9, 12)):
    """Create indexes at multiple resolutions."""
    return {
        f'h3_res_{r}': h3.latlng_to_cell(lat, lng, r)
        for r in resolutions
    }
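Because the grid is hierarchical, a coarser index can also be derived from a finer one rather than recomputed from coordinates. The sketch below does this with bit operations, assuming the documented H3 index layout (resolution in bits 52 to 55, fifteen 3-bit digits in the low 45 bits). The `parent_cell` helper is hypothetical and only illustrates the idea; in practice the library's `cell_to_parent` function is the normal way to do this.

```python
def parent_cell(index_hex: str, parent_res: int) -> str:
    """Derive the ancestor of an H3 cell at a coarser resolution via bit operations."""
    h = int(index_hex, 16)
    child_res = (h >> 52) & 0xF
    if not 0 <= parent_res <= child_res:
        raise ValueError("parent_res must be between 0 and the cell's resolution")
    # Overwrite the 4-bit resolution field.
    h = (h & ~(0xF << 52)) | (parent_res << 52)
    # Mark every digit finer than the parent as unused (all ones).
    for r in range(parent_res + 1, child_res + 1):
        h |= 0x7 << (3 * (15 - r))
    return format(h, "x")

fine = "8928308280fffff"            # a resolution 9 cell
print(parent_cell(fine, 8))         # 8828308281fffff
print(parent_cell(fine, 7))         # 872830828ffffff
```

This means a table can store only the finest resolution it needs and compute coarser keys on demand, trading a little CPU for less storage.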
Integration with Common Tools
Apache Spark
from pyspark.sql.functions import lit, udf
from pyspark.sql.types import StringType
import h3
@udf(returnType=StringType())
def to_h3(lat, lng, res):
    return h3.latlng_to_cell(lat, lng, res)
df = df.withColumn('h3_index', to_h3('latitude', 'longitude', lit(9)))
ClickHouse (Native Support)
SELECT geoToH3(lon, lat, 9) AS h3_index, count()
FROM events
GROUP BY h3_index
ORDER BY count() DESC
LIMIT 10;
DuckDB
INSTALL h3 FROM community;
LOAD h3;
SELECT h3_latlng_to_cell(lat, lon, 9) as h3_index
FROM points;
Best Practices
- Choose resolution wisely: Higher resolution = more cells = more storage
- Pre-compute polygon coverage: Generate H3 cells for static boundaries offline
- Index your H3 columns: Standard B-tree indexes work great on 64-bit integers
- Consider multi-resolution: Store coarse resolution for filtering, fine for precision
- Monitor cell counts: Large polygons at high resolution can explode storage
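To get a feel for how quickly cell counts grow, here is a rough estimator. It relies on the fact that H3 has 2 + 120 * 7^r cells at resolution r, and it assumes cells of average size, so treat the results as order-of-magnitude guidance only. The function names and the 10,000 km² example zone are hypothetical.

```python
EARTH_AREA_KM2 = 510_065_622  # approximate surface area of the Earth

def total_cells(resolution: int) -> int:
    """Number of H3 cells covering the globe at a given resolution."""
    return 2 + 120 * 7 ** resolution

def estimate_cells(polygon_area_km2: float, resolution: int) -> int:
    """Rough cell count for a polygon, assuming average-sized cells."""
    avg_cell_area = EARTH_AREA_KM2 / total_cells(resolution)
    return round(polygon_area_km2 / avg_cell_area)

# Hypothetical zone of 10,000 km^2
for res in (7, 9, 12):
    print(res, estimate_cells(10_000, res))
```

Each step up in resolution multiplies the count by roughly 7, so going from resolution 9 to 12 costs about 343 times the rows for the same polygon.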
Conclusion
H3 transforms geospatial operations from computational nightmares into simple integer comparisons. For SRE teams managing location-aware services, adopting H3 can mean the difference between queries that time out and queries that complete in milliseconds.
The hexagonal grid system is battle-tested at Uber scale and integrates seamlessly with PostgreSQL, Python, Spark, and modern analytics tools.
