ABOUT AXIOM
Axiom is building the translational intelligence layer for drug discovery: AI systems that help scientists predict human toxicity earlier, more accurately, and more mechanistically than animal studies or legacy in vitro assays.
Unexpected toxicity is one of the largest reasons drug programs fail. Today, drug discovery teams still rely on fragmented assays, animal studies, and expert judgment to decide which molecules are safe enough to advance. We believe this can be dramatically improved.
At Axiom, we generate and curate massive multimodal datasets spanning chemical structures, primary human cell imaging, multicellular tissue systems, transcriptomics, proteomics, mass spectrometry, ADME, dose-response curves, clinical outcomes, and human exposure. To date, we have built the largest experimental-to-clinical dataset in the world, and we are just getting started. We use these datasets to train models and agents that connect chemistry, biology, mechanism, and clinical risk.
We are looking for an infrastructure / platform engineer to build the systems that make this work at scale. You will own the backend, distributed systems, model-serving infrastructure, deployment pipelines, customer data systems, and enterprise platform architecture behind Axiom's AI products.
This is a role for a deeply technical generalist who wants to help Axiom evolve into a world-class engineering company.
CHARTER
Build the infrastructure that powers the first scientific AI systems capable of replacing animal and legacy toxicity experiments.
You will create the platform that turns Axiom's research into reliable, secure, enterprise-ready software used by the world's leading drug discovery teams.
WHAT YOU WILL DO
You will own critical systems across Axiom's backend, ML platform, customer deployment, and enterprise infrastructure.
You will:
- Lead Axiom's evolution into a world-class engineering organization focused on enterprise ML and data software.
- Design and build the core infrastructure powering Axiom's ML systems, including model evaluation, model deployment, inference, serving, monitoring, and versioning.
- Architect scalable systems for storing, retrieving, processing, and serving chemical, biological, clinical, customer, and model-generated data.
- Deploy large-scale reasoning agents from research environments into production systems used by customers.
- Build infrastructure for running image models, LLM agents, mechanistic reasoning systems, and multimodal toxicity models at scale.
- Create robust systems for customer data management, including secure ingestion, access control, audit trails, versioned deliveries, and customer-specific workspaces.
- Build the backend systems behind Axiom's product, including APIs, data services, inference services, workflow systems, and internal tooling.
- Support enterprise customer deployments, including cloud, secure VPC, and potentially on-prem or customer-controlled environments.
- Build evaluation and observability systems for ML models and agents, including regression testing, model comparison, trace inspection, rollout monitoring, and failure analysis.
- Work with ML researchers to turn prototypes into reliable production systems.
- Work with scientists to turn research workflows into durable software.
- Work with product and customer teams to ensure enterprise users can trust, understand, and depend on Axiom's systems.
- Teach and empower scientists, ML researchers, and engineers to write better software and build better systems.
- Help define Axiom's engineering culture from the ground up.
WHAT WE ARE LOOKING FOR
We are looking for a strong generalist software engineer with excellent taste in systems, infrastructure, and product.
You might be a great fit if:
- You have built production systems used by large enterprise customers.
- You have designed backend or distributed systems that process large amounts of data reliably.
- You have built SaaS products that store, process, and serve sensitive customer data.
- You have worked on ML infrastructure across data access, training, evaluation, deployment, inference, monitoring, or observability.
- You understand the messy parts of getting ML into production: versioning, reproducibility, evaluation, rollout safety, monitoring, debugging, latency, cost, and reliability.
- You enjoy working with enterprise customers and simplifying complex technical systems around their needs.
- You have built infrastructure for LLM-powered products, research workflows, retrieval systems, agents, or large-scale data processing.
- You want to build distributed infrastructure for long-running, compute-intensive parallel reasoning workflows.
- You are comfortable moving across cloud infrastructure, backend systems, distributed compute, ML infrastructure, security, DevOps, and product engineering.
- You want to work directly with researchers and scientists, helping them turn frontier research into usable products.
- You care deeply about reliability because you know customers will use these systems to make consequential drug discovery decisions.
- You want ownership over hard, ambiguous systems at an early-stage company.
TECHNICAL SKILLS WE VALUE
We do not expect every candidate to have all of these, but we are especially excited by experience with:
- Python, TypeScript, Go, Rust, or similar systems/backend languages.
- Cloud infrastructure on AWS, GCP, or Azure.
- Kubernetes, Docker, Terraform, Pulumi, CI/CD, and production DevOps.
- Distributed systems, job queues, orchestration, scheduling, and large-scale compute.
- Ray, Modal, Slurm, Anyscale, Spark, Dask, Daft, Airflow, Dagster, Prefect, Argo, or similar tools.
- Backend APIs, data services, databases, object storage, caching, and search/retrieval systems.
- Postgres, DuckDB, Snowflake, BigQuery, ClickHouse, Elasticsearch, OpenSearch, or vector databases.
- ML infrastructure for model serving, inference, training pipelines, evaluation, monitoring, and deployment.
- LLM systems, agents, retrieval-augmented generation, observability, and evaluation harnesses.
- Enterprise software, SaaS platforms, security, access control, audit logs, and customer data isolation.
- Large-scale scientific, healthcare, biotech, chemistry, biology, or clinical data systems.
THE KIND OF PERSON WHO THRIVES HERE
Axiom is not a normal company, and this is not a normal infrastructure role.
We are looking for someone who wants to build the systems underneath a new kind of scientific AI company. The product is complex. The data is massive and messy. The models are evolving quickly. The customers are demanding. The infrastructure has to be reliable enough for real drug discovery decisions.
The people who thrive here:
- Move with urgency.
- Have exceptional engineering taste.
- Take full ownership of the customer experience.
- Care deeply about reliability and all the ways systems can fail.
- Can build fast without creating chaos.
- Are comfortable operating across backend, infrastructure, ML, security, and product.
- Enjoy working with scientists and researchers.
- Can teach others how to become better engineers.
- Are practical, unpretentious, and collaborative.
- Want their work to multiply the output of the entire company.
- Are not satisfied with incremental improvements.
- Want to build a generational company.
We are looking for someone with a relentless observe-orient-decide-act loop: someone who constantly identifies bottlenecks, builds the right abstractions, and makes everyone around them faster.