07.02.2026

Accelerating Geo Joins with H3 Hexagonal Indexes

Geospatial queries are notoriously slow. Joining millions of GPS points against polygon boundaries can bring even powerful databases to their knees. Enter H3, Uber's open-source hierarchical hexagonal grid system that transforms how we handle spatial data.

What Is H3?

H3 divides the entire Earth's surface into hexagonal cells at multiple resolutions. Each cell has a unique 64-bit identifier that encodes its location and size. The magic? Converting geographic coordinates to H3 indexes is blazing fast, and comparing H3 indexes is just integer comparison.
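To make the "location and size in one integer" idea concrete, here is a minimal sketch that unpacks a few fields from an index with plain integer arithmetic. It assumes the bit layout documented for H3 cell indexes (4 mode bits, 4 resolution bits, 7 base-cell bits, then fifteen 3-bit digits). The helper name describe_h3 is made up for this post; in real code the h3 library's own accessors are the safer choice.

```python
def describe_h3(index_hex: str) -> dict:
    """Unpack mode, resolution, and base cell from an H3 index string."""
    h = int(index_hex, 16)  # the string form is just the integer written in hex
    return {
        "as_int": h,
        "mode": (h >> 59) & 0xF,        # 1 means "cell"
        "resolution": (h >> 52) & 0xF,  # 0 (coarsest) to 15 (finest)
        "base_cell": (h >> 45) & 0x7F,  # index of the resolution-0 ancestor
    }


info = describe_h3("8928308280fffff")
print(info["mode"], info["resolution"], info["base_cell"])
```

Because the resolution sits in fixed bits, two indexes can be compared or grouped without ever touching geometry.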

Key benefits:

  • Constant-time lookups: No expensive polygon intersection calculations
  • Hierarchical structure: 16 resolution levels from continental to sub-meter
  • Near-uniform cells: Hexagons keep neighbor distances consistent, unlike square grids (cell areas vary slightly by location, and every resolution includes 12 pentagons)
  • Compact storage: 64-bit integers vs. complex geometry objects

Why Hexagons?

Hexagons offer unique advantages over square grids:

  1. Equal distance to neighbors: All six neighbors are equidistant from the center
  2. Better circle approximation: Hexagons approximate circular regions more accurately
  3. No diagonal ambiguity: No corner-touching neighbors to worry about
  4. Reduced sampling bias: Fewer edge effects in spatial analysis
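The first and third points can be checked with a few lines of geometry. The sketch below compares center-to-center distances on an idealized flat square grid and an idealized flat hexagonal grid (axial coordinates, pointy-top layout). It is a planar illustration only, not H3 itself, where distances are only approximately equal on the sphere.

```python
import math

# Square grid: 8 touching neighbors (4 share an edge, 4 share only a corner).
square_offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                  if (dx, dy) != (0, 0)]
square_dists = {round(math.hypot(dx, dy), 6) for dx, dy in square_offsets}

# Hexagonal grid: 6 neighbors in axial coordinates (q, r).
hex_offsets = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def hex_center(q, r):
    """Planar center of hexagon (q, r) for a pointy-top hexagon of size 1."""
    return (math.sqrt(3) * (q + r / 2), 1.5 * r)


hex_dists = {round(math.hypot(*hex_center(q, r)), 6) for q, r in hex_offsets}

print("square neighbor distances:", sorted(square_dists))
print("hex neighbor distances:", sorted(hex_dists))
```

The square grid yields two distinct distances (edge and diagonal neighbors), while the hexagonal grid yields one.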

Installing H3

Python

pip install h3

PostgreSQL (with h3-pg extension)

# Ubuntu/Debian
sudo apt install postgresql-15-h3

# Or from source
git clone https://github.com/zachasme/h3-pg.git
cd h3-pg && make && sudo make install

Then enable the extension:

CREATE EXTENSION h3;

Basic H3 Operations

Converting Coordinates to H3

import h3

# Convert lat/lng to an H3 index at resolution 9 (h3-py 4.x API)
lat, lng = 37.7749, -122.4194  # San Francisco
h3_index = h3.latlng_to_cell(lat, lng, 9)
print(h3_index)  # a 15-character hex string such as '8928308280fffff'

Resolution Guide

Resolution   Average Edge Length   Use Case
0            1,107 km              Continental analysis
4            22.6 km               Regional patterns
7            1.22 km               City districts
9            174 m                 Blocks, POIs
12           9.4 m                 Buildings
15           0.5 m                 Precise positioning
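Each step down in resolution multiplies the number of cells by roughly seven, which is where the storage trade-off comes from. A small sketch of the cell-count formula given in the H3 documentation, 2 + 120 × 7^r (the function name total_cells is just for this example):

```python
def total_cells(resolution: int) -> int:
    """Number of H3 cells covering the globe at a given resolution."""
    if not 0 <= resolution <= 15:
        raise ValueError("H3 resolutions range from 0 to 15")
    return 2 + 120 * 7 ** resolution


for res in (0, 4, 7, 9, 12, 15):
    print(f"res {res:>2}: {total_cells(res):,} cells")
```

Resolution 0 has 122 cells; resolution 15 has several hundred trillion, which is why a fine resolution is best reserved for small areas.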

Getting Cell Boundaries

# Get the hexagon boundary as lat/lng pairs
boundary = h3.cell_to_boundary(h3_index)
print(boundary)
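A common next step is drawing the cell on a map. The sketch below converts a boundary into a GeoJSON Polygon, assuming the boundary arrives as (lat, lng) pairs as in h3-py 4.x. The helper name and the sample vertices are placeholders for illustration, not real cell coordinates.

```python
def boundary_to_geojson(boundary):
    """Turn ((lat, lng), ...) vertices into a GeoJSON Polygon dict."""
    ring = [[lng, lat] for lat, lng in boundary]  # GeoJSON orders as [lng, lat]
    if ring and ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # GeoJSON rings must repeat the first vertex
    return {"type": "Polygon", "coordinates": [ring]}


# Placeholder vertices standing in for h3.cell_to_boundary(h3_index)
sample_boundary = ((37.0, -122.0), (37.0, -121.9), (37.1, -121.9))
print(boundary_to_geojson(sample_boundary))
```

The coordinate swap is the easy part to get wrong: H3 speaks (lat, lng) while GeoJSON expects [lng, lat].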

Finding Neighbors

# Get all cells within k rings
neighbors = h3.grid_disk(h3_index, k=1)  # 7 cells (center + 6 neighbors)
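The size of a disk grows quadratically with k, which matters when neighbor lookups are used as a search radius. A quick sketch of the count for an all-hexagon neighborhood (disks that include one of the 12 pentagons come out slightly smaller); grid_disk_size is a name chosen for this example:

```python
def grid_disk_size(k: int) -> int:
    """Cells within k rings of a hexagon: the center plus 6*i cells in ring i."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1 + 3 * k * (k + 1)


for k in range(4):
    print(k, grid_disk_size(k))
```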

Speeding Up Geo Joins

The traditional approach to geo joins involves expensive point-in-polygon tests:

-- Slow: up to O(n*m) point-in-polygon tests without a spatial index
SELECT p.id, z.name
FROM points p
JOIN zones z ON ST_Within(p.geom, z.geom);

With H3, you precompute indexes and use simple equality joins:

-- Fast: O(n+m) hash join
SELECT p.id, z.name
FROM points p
JOIN zones_h3 z ON p.h3_index = z.h3_index;
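What the database does with that equality join can be sketched in a few lines of Python: build a hash table on one side, then probe it once per row on the other. The function name, row shapes, and cell strings below are illustrative assumptions, not output from a real dataset.

```python
from collections import defaultdict


def h3_hash_join(points, zone_cells):
    """Join (point_id, cell) rows to (zone_name, cell) rows on equal cells."""
    # Build phase: one pass over the zone side, O(m).
    zones_by_cell = defaultdict(list)
    for zone_name, cell in zone_cells:
        zones_by_cell[cell].append(zone_name)

    # Probe phase: one dictionary lookup per point, O(n).
    return [(point_id, zone_name)
            for point_id, cell in points
            for zone_name in zones_by_cell.get(cell, ())]


points = [(1, "8928308280fffff"), (2, "8928308280bffff"), (3, "89283082873ffff")]
zone_cells = [("Downtown", "8928308280fffff"), ("Downtown", "8928308280bffff")]
print(h3_hash_join(points, zone_cells))
```

No geometry is evaluated at join time; all of the spatial work happened when the cells were computed.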

Implementation in PostgreSQL

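First, add H3 indexes to your tables. The geometry-based calls below assume PostGIS geometries in EPSG:4326 and the companion h3_postgis extension (CREATE EXTENSION h3_postgis CASCADE;):

-- Add H3 index column to points
ALTER TABLE points ADD COLUMN h3_index h3index;
-- h3-pg indexes a point or geometry value rather than separate lat/lng
-- arguments; some h3-pg releases name this function h3_lat_lng_to_cell.
UPDATE points SET h3_index = h3_latlng_to_cell(geom, 9);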
CREATE INDEX idx_points_h3 ON points(h3_index);

-- Pre-compute H3 cells for polygons
CREATE TABLE zones_h3 AS
SELECT z.id, z.name, h3_polygon_to_cells(z.geom, 9) AS h3_index
FROM zones z;
CREATE INDEX idx_zones_h3 ON zones_h3(h3_index);

Python Data Pipeline Example

import h3
import pandas as pd

def add_h3_index(df, lat_col, lng_col, resolution=9):
    """Add H3 index column to DataFrame."""
    df['h3_index'] = df.apply(
        lambda row: h3.latlng_to_cell(row[lat_col], row[lng_col], resolution),
        axis=1
    )
    return df

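# Process your data
points_df = add_h3_index(points_df, 'latitude', 'longitude', 9)
# Caveat: indexing a zone by its center maps it to a single cell, so only
# points in that one cell will match. For real polygons, cover each zone
# with all of its cells (one row per cell) before joining.
zones_df = add_h3_index(zones_df, 'center_lat', 'center_lng', 9)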

# Fast join on H3 index
result = points_df.merge(zones_df, on='h3_index', how='inner')
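A zone indexed by a single center cell covers only that cell, so zones larger than one cell need one row per covering cell. The sketch below shows that expansion with the polygon-fill step passed in as a function. With h3-py 4.x that function would likely wrap h3.h3shape_to_cells and h3.LatLngPoly, but treat that as an assumption and check your installed version. The stand-in fill function and its cell names are made up so the example runs without h3.

```python
def zones_to_cell_rows(zones, fill, resolution=9):
    """Expand {zone_name: polygon} into one (zone_name, cell) row per cell.

    `fill(polygon, resolution)` must return an iterable of H3 cells.
    """
    rows = []
    for name, polygon in zones.items():
        for cell in sorted(fill(polygon, resolution)):
            rows.append((name, cell))
    return rows


# Stand-in fill function with made-up cells (hypothetical, not real H3 output).
def fake_fill(polygon, resolution):
    return {f"cell-{resolution}-{i}" for i in range(len(polygon))}


zones = {"Downtown": [(37.77, -122.42), (37.78, -122.42), (37.78, -122.41)]}
rows = zones_to_cell_rows(zones, fake_fill)
print(rows)
```

The resulting rows can be loaded with pd.DataFrame(rows, columns=['zone', 'h3_index']) and merged on h3_index exactly like the single-cell version.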

Real-World Performance Gains

Benchmarks from production systems show large improvements, though exact numbers depend on hardware, indexing, and polygon complexity:

Dataset Size   Traditional Join   H3 Join       Speedup
1M points      45 seconds         0.8 seconds   56x
10M points     12 minutes         3.2 seconds   225x
100M points    2+ hours           28 seconds    250x+

Handling Edge Cases

Points Near Cell Boundaries

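Polygon coverage is approximate: a cell is assigned to a zone based on its center, so a point just inside a zone boundary can land in a cell that belongs to no zone. As a fallback, check the point's own cell first and then its neighbors:

def find_zone_with_buffer(lat, lng, zones_h3, resolution=9):
    """Find a zone for a point, falling back to neighboring cells.

    zones_h3 maps H3 cell -> zone.
    """
    center = h3.latlng_to_cell(lat, lng, resolution)
    if center in zones_h3:  # An exact cell match takes priority
        return zones_h3[center]

    # Fallback: the first neighbor with a zone (approximate near borders)
    for cell in h3.grid_disk(center, 1):
        if cell in zones_h3:
            return zones_h3[cell]
    return None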
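When boundary accuracy matters, a common pattern is to use H3 only to shortlist candidate zones and then confirm with an exact test on the few that remain. Below is a minimal sketch of that second stage using a planar ray-casting test. It treats lat/lng as flat coordinates, which is a simplifying assumption that suits small zones away from the poles and the antimeridian. The function names and the sample square are hypothetical.

```python
def point_in_polygon(lat, lng, ring):
    """Ray-casting test; ring is a list of (lat, lng) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        lat1, lng1 = ring[i]
        lat2, lng2 = ring[(i + 1) % n]
        # Does a ray heading east from the point cross this edge?
        if (lat1 > lat) != (lat2 > lat):
            crossing_lng = lng1 + (lat - lat1) * (lng2 - lng1) / (lat2 - lat1)
            if lng < crossing_lng:
                inside = not inside
    return inside


def refine_candidates(lat, lng, candidate_zones):
    """Return the first candidate whose polygon really contains the point."""
    for name, ring in candidate_zones:
        if point_in_polygon(lat, lng, ring):
            return name
    return None


square = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
print(refine_candidates(0.5, 0.5, [("A", square)]))
print(refine_candidates(1.5, 0.5, [("A", square)]))
```

Because the H3 lookup has already cut the candidates down to a handful, the exact test runs on very few polygons per point.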

Multi-Resolution Indexing

For varying precision requirements:

def create_multi_res_index(lat, lng, resolutions=(7, 9, 12)):
    """Create indexes at multiple resolutions."""
    return {
        f'h3_res_{r}': h3.latlng_to_cell(lat, lng, r)
        for r in resolutions
    }
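Coarser indexes do not have to be recomputed from coordinates: a parent cell can be derived from a finer index alone. The sketch below does this with bit operations, assuming the documented H3 index layout (resolution in bits 52 to 55, fifteen 3-bit digits in the low 45 bits). It is an illustration of the hierarchy, with parent_index as a made-up name; in practice h3.cell_to_parent is the function to use.

```python
def parent_index(index_hex: str, parent_res: int) -> str:
    """Derive the ancestor of an H3 cell at a coarser resolution."""
    h = int(index_hex, 16)
    res = (h >> 52) & 0xF
    if not 0 <= parent_res <= res:
        raise ValueError("parent_res must be between 0 and the cell's resolution")

    # Overwrite the 4 resolution bits with the parent's resolution.
    h = (h & ~(0xF << 52)) | (parent_res << 52)
    # Mark every digit finer than parent_res as unused (binary 111).
    for digit in range(parent_res + 1, 16):
        h |= 0b111 << (3 * (15 - digit))
    return format(h, "x")


print(parent_index("8928308280fffff", 8))
print(parent_index("8928308280fffff", 7))
```

Storing several resolutions as separate columns is therefore a convenience for indexing and grouping, not a necessity.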

Integration with Common Tools

Apache Spark

from pyspark.sql.functions import lit, udf
from pyspark.sql.types import StringType
import h3

@udf(returnType=StringType())
def to_h3(lat, lng, res):
    return h3.latlng_to_cell(lat, lng, res)

df = df.withColumn('h3_index', to_h3('latitude', 'longitude', lit(9)))

ClickHouse (Native Support)

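-- geoToH3 has historically taken (lon, lat, resolution); confirm the argument order for your ClickHouse version
SELECT geoToH3(lon, lat, 9) AS h3_index, count()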
FROM events
GROUP BY h3_index
ORDER BY count() DESC
LIMIT 10;

DuckDB

INSTALL h3 FROM community;
LOAD h3;

SELECT h3_latlng_to_cell(lat, lon, 9) as h3_index
FROM points;

Best Practices

  1. Choose resolution wisely: Higher resolution = more cells = more storage
  2. Pre-compute polygon coverage: Generate H3 cells for static boundaries offline
  3. Index your H3 columns: Standard B-tree indexes work great on 64-bit integers
  4. Consider multi-resolution: Store coarse resolution for filtering, fine for precision
  5. Monitor cell counts: Large polygons at high resolution can explode storage
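For points 1 and 5, a rough estimate before running a polygon fill can save a surprise. The sketch below divides a polygon's area by the average cell area at a resolution, derived from the cell-count formula 2 + 120 × 7^r and an approximate figure for Earth's surface area. It is a back-of-the-envelope estimate under those assumptions, since real cell areas vary by location. The function name is made up for this example.

```python
EARTH_SURFACE_KM2 = 510_065_621.7  # approximate total surface area of Earth


def estimated_cells(polygon_area_km2: float, resolution: int) -> int:
    """Rough number of H3 cells needed to cover a polygon of a given area."""
    total_cells = 2 + 120 * 7 ** resolution
    avg_cell_area_km2 = EARTH_SURFACE_KM2 / total_cells
    return round(polygon_area_km2 / avg_cell_area_km2)


for res in (7, 9, 12):
    print(f"10,000 km^2 at res {res}: ~{estimated_cells(10_000, res):,} cells")
```

Each extra resolution level multiplies the row count by about seven, so three levels finer means roughly 343 times the storage.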

Conclusion

H3 transforms geospatial operations from computational nightmares into simple integer comparisons. For SRE teams managing location-aware services, adopting H3 can mean the difference between queries that time out and queries that complete in seconds or less.

The hexagonal grid system is battle-tested at Uber scale and integrates seamlessly with PostgreSQL, Python, Spark, and modern analytics tools.


Ready to level up your infrastructure? Akmatori helps SRE teams automate incident response and maintain reliable systems. Check out Akmatori to see how AI-powered operations can transform your workflow.
