Data Engineer · Directory Pipeline
About the role
Own the daily directory pipeline — pulling certified MWBE, DBE, SDVOB, and LBE directories from dozens of state and federal sources, normalizing the data, snapshotting per project, and keeping the search index honest for every AXI customer.
What you’ll do
- Maintain and extend the daily ingest of certified firm directories (NYS MWBE, OGS SDVOB, NYC SBS, USDOT DBE, NYSDOT DBE, dozens more).
- Design idempotent pipelines: ingest → normalize → diff → snapshot. Every project must be able to serve the directory state as it stood on the day a search was run.
- Operate the Postgres analytic layer that powers Search Reports — partitioning, indexing, and tuned read replicas under heavy query load.
- Run the messaging layer for re‑ingestion and recompute (Redis, RabbitMQ); design retry and back‑pressure semantics so a flaky upstream never corrupts a customer's file.
- Drive a small set of rigorous data‑quality tests: certification expiry, NAICS/NIGP/CSI mapping, deduplication of multi‑certified firms.
What we’re looking for
- 6+ years of data‑engineering experience, including Python/Django and Postgres at production scale.
- Solid ELT/ETL design — idempotency, late‑arriving data, replayability.
- Comfort with Redis and RabbitMQ as production messaging primitives, not just notifications.
- AWS native: RDS, S3, ECS/Fargate, IAM, CloudWatch.
Nice to have
- Government / open‑data integration experience (FOIL responses, scraped portals, signed CSV downloads).
- Experience with Mapbox / Deck.GL for geographic indexing and visualization of directory data.
- Background in geocoding, entity resolution, or fuzzy matching at scale.
The tech
AXI is built on a deliberate, modern stack — chosen because it scales, because it’s maintainable, and because everyone on the team can reason about it end to end. You’ll work across all of it.
- Django · Python: Industry-leading web framework for scalability, maintainability, and security. The backbone of every AXI service.
- PostgreSQL: The system of record for every contract, payment, search, and waiver. Multi-million-row tables, careful indexing.
- Redis · RabbitMQ: Real-time data processing and messaging for outreach pipelines, directory recompute, and asynchronous report generation.
- Amazon AWS: Highest availability, scalability, and robust infrastructure support. RDS, ECS/Fargate, S3, CloudWatch.
- Mapbox · Deck.GL · Chart.JS: Cutting-edge data visualization for project mapping, district overlays, and the dashboards executives actually use.
Compensation & benefits
$170–210K + equity + full benefits. Equity for every employee. Full medical, dental, vision, and 401(k) with match. Remote-friendly with quarterly New York City gatherings. Time off that is taken, not accrued.
Ready to apply?
Send a résumé and a few sentences about a project you’re proud of. We read everything that comes in and reply within one (1) business day.