HILBERT IS BUILDING A REASONING ENGINE THAT MUST NAVIGATE NON-DETERMINISTIC USER BEHAVIOR ACROSS DATA SILOS, TURNING MONTHS-LONG DECISION CYCLES INTO MINUTES. FULLY AGENTIC BY DESIGN, OUR DEMAND INTELLIGENCE PLATFORM DOESN'T JUST CALL APIS; IT SOLVES THE HARD PROBLEM OF ORCHESTRATING MULTI-STEP INFERENCE OVER MESSY, HIGH-STAKES ENTERPRISE DATA WHERE DETERMINISTIC ANSWERS DON'T EXIST.
From Fortune 500 enterprises to beloved brands like FreshDirect, Blank Street, and Levain Bakery, operators run their growth on Hilbert. We're also co-building alongside leading AI companies.
We're looking for an AI Engineer who can build production-grade AI systems end-to-end and serve as the technical AI counterpart for our largest enterprise customers: understanding their workflows, translating their challenges into agentic solutions, and earning their trust through clarity, rigor, and results. All with the ownership and urgency of a startup culture.
This is not a "wire up a prompt chain and move on" role. You'll own core pieces of the AI stack that power Hilbert's demand intelligence platform: designing agent architectures, building evaluation systems, and making hard tradeoffs between accuracy, latency, and cost in production. You'll also be the person our biggest customers look to when they want to understand what the AI is doing, why it made a particular decision, and how it can be shaped to solve their specific problems. If you think in systems, have opinions about how agentic workflows should actually work, can hold your own in a room full of enterprise stakeholders, and want to build AI products that drive real outcomes, we want to meet you.
THE ROLE
You'll work directly with the founding team and across product, data, and GTM to design, build, and improve the AI systems at the heart of Hilbert, with a particular focus on our largest enterprise accounts. You'll be hands-on every day (building agents, designing workflows, shipping to production), but you'll also be the technical AI voice in customer conversations: understanding their business context firsthand, shaping how we apply our agentic systems to their problems, presenting capabilities and results, and building the trust that turns a vendor relationship into a strategic partnership.
The environment is high-autonomy and high-ambiguity: the nature of building AI-native products means requirements shift, approaches evolve, and the person closest to the problem often makes the call. In this role, you're often the person closest to both the technology and the customer.
WHAT YOU'LL DO:
Build
- Design, build, and maintain AI-driven features and pipelines that serve enterprise customers at scale
- Architect and implement agent-based workflows using LangChain, LangGraph, or equivalent orchestration frameworks
- Own systems end-to-end, from experimentation through production deployment and monitoring
- Build and improve evaluation pipelines to measure, validate, and iterate on AI system performance
- Make pragmatic engineering decisions under ambiguity: ship, learn, iterate
- Shape the technical direction of the AI stack as the company scales
Partner with enterprise customers
- Be the technical AI counterpart for our largest accounts: understanding their workflows, data environment, and business challenges firsthand, and translating them into agentic solutions
- Present AI capabilities, results, and roadmap to senior customer stakeholders with clarity, conviction, and appropriate nuance. You're the person they trust to explain what the system does and why
- Translate customer context into engineering decisions: what you learn in customer conversations directly informs how you design agents, workflows, and integrations. You don't build in a vacuum; you build with deep knowledge of how the output will be used
- Hold the line on what AI can and can't do: when customers want a simpler story than reality supports, or push for capabilities that aren't ready, you find a way to be honest and helpful at the same time. You build trust through intellectual integrity, not through overpromising
- Design customer-specific configurations and integrations: enterprise customers have unique platforms, data flows, and operational requirements. You own the technical work of making our agentic systems fit their world, combined with human-in-the-loop elements that keep enterprise trust intact
- Feed enterprise learnings back into the product: patterns you see across customers, gaps in our systems, new workflow opportunities. Your customer exposure makes the whole team smarter
OUR CURRENT HURDLES
These are the kinds of problems you'll walk into on day one:
- Intelligent retrieval across heterogeneous approaches: our agents need the right information at exactly the right moment. The challenge isn't picking one retrieval method; it's combining RAG, graph-based retrieval, and other approaches into a unified strategy that fetches the most relevant content precisely when the agent needs it, no more, no less. In the enterprise context, this means working with customer data environments that vary wildly in structure, quality, and accessibility.
- Agentic workflows that solve real-world problems: the challenge is building workflows robust enough to handle the unexpected. When an agent hits an edge case, missing data, or a situation it wasn't explicitly designed for, it needs to reason through it: leveraging available context, escalating to a human when it can't, and never silently failing. You'll be the person in the room when a customer asks "what happens when it encounters X?", and the answer needs to be credible.
- Evaluation beyond vibes: we need systematic, reproducible evals that actually predict real-world performance. If you've built custom evaluators for RAG or agent workflows, we want to talk. In enterprise accounts, you'll also need to communicate evaluation results to customers in a way that builds confidence and sets appropriate expectations.
- Execution and real-world integration: an agent that only surfaces insights isn't enough. We're building systems where agents take action (integrating with external platforms, executing workflows, and doing real work with the information they have), combined with human-in-the-loop checkpoints that keep enterprise trust intact. Each enterprise customer has different platforms, different operational flows, and different tolerance for automation, and you'll own making it work.
WHO THRIVES IN THIS ROLE
We care about how you think, how you ship, and how you show up with customers, not how many years are on your resume.
THE PROFILE:
- You're a strong software engineer. Your code is clean, testable, and production-ready.
- You have real experience with LangChain, LangGraph, or equivalent agent/orchestration frameworks. You've built with them, hit their limits, and worked around them, not just followed tutorials.
- You're a trusted technical partner to enterprise stakeholders. You've been in the room with senior audiences and presented technical work in a way that earned trust and drove decisions. You're comfortable with hard questions, pushback, and the ambiguity of enterprise conversations. You don't oversell, you don't hide behind jargon, and you know how to make AI capabilities accessible without dumbing them down.
- You're a product-minded engineer. You understand that a technically impressive agent is useless if it doesn't solve the customer's actual problem. You care as much about the why as the how, and your customer exposure keeps you grounded in what matters.
- You communicate with clarity and conviction. You can explain a technical decision to a non-technical founder, debate architecture tradeoffs with a senior engineer, and walk an enterprise VP through an agentic workflow, all in the same day. Communication is not a nice-to-have here; it's the job.
- You take ownership. You don't wait for tickets. You see what needs to be built, raise your hand, and ship it. If a customer isn't getting value or an integration isn't working, you treat it as your problem.
- You thrive in ambiguity. AI products evolve fast. Customer priorities shift. Requirements change. You're energized by figuring it out, and you bring the customer along on the journey.
- You move at startup speed. You understand what it means to be available, responsive, and biased toward action in a fast-moving, early-stage environment.
STRONG PLUSES:
- Experience in customer-facing technical roles: solutions engineering, applied AI, technical account management, or consulting where you owned the technical relationship
- Experience building eval pipelines: designing metrics, running systematic evaluations, and using results to drive iteration on AI systems
- Backend software engineering experience: building APIs, services, data infrastructure, or production systems
- Exposure to retrieval-augmented generation (RAG), vector databases, or LLM-powered search and recommendation systems
- Deep exposure to retail, e-commerce, or enterprise B2C environments and the business teams that operate in them
- Experience at early-stage startups or high-growth environments where you wore multiple hats
YOU MIGHT BE:
- A backend engineer who went deep on LLMs and has always been the person pulled into customer conversations because you can explain what the system actually does.
- An AI engineer at a platform company who's tired of building for an internal team and wants to see impact face-to-face.
- A solutions engineer or applied AI engineer who's ready to go deeper on the building side without losing the customer connection.
- Someone at a larger company who's frustrated by the wall between "the people who build" and "the people who talk to customers" and wants to be both.
- A startup CTO who wants to go deep on AI at a company where the stack is the product and the customer relationship is the feedback loop.
What matters: you ship, you own it, you can hold your own in a room full of enterprise stakeholders, and you communicate like a partner, not a silo.
LOCATION
San Francisco, with occasional travel for team meets, offsites, or customer engagements.
COMPENSATION
Competitive salary + equity package, commensurate with experience. Performance-based bonuses tied to project milestones and customer impact.
THE HIRING JOURNEY
Short form → Intro call → Technical working session → Team conversations → Offer
Fast, human, no bureaucracy.