WHO WE ARE
Vultr is on a mission to make high-performance cloud infrastructure easy to use, affordable, and locally accessible for enterprises and AI innovators around the world. With 32 global cloud data center locations, Vultr is trusted by hundreds of thousands of active customers across 185 countries for its flexible, scalable, global Cloud Compute, Cloud GPU, Bare Metal, and Cloud Storage solutions. In December 2024, Vultr announced an equity financing at a $3.5 billion valuation. Founded by David Aninowsky and self-funded for over a decade, Vultr has grown to become the world's largest privately held cloud infrastructure company.
VULTR CARES
100% company-paid insurance premiums for employee medical, dental and vision plans.
401(k) plan that matches 100% up to 4%, with immediate vesting
Professional Development Reimbursement of $2,500 each year
11 Holidays + Paid Time Off Accrual + Rollover Plan
Commitment matters to Vultr! Increased PTO at 3-year and 10-year anniversaries + 1 month paid sabbatical every 5 years + Anniversary Bonus each year
$500 stipend for remote office setup in first year + $400 each following year
Internet reimbursement up to $75 per month
Gym membership reimbursement up to $50 per month
Company paid Wellable subscription
JOIN VULTR
The GPU-focused Technical Account Manager (TAM) owns the post-sales technical success of customers deploying large-scale AI training, inference, and high-performance GPU workloads on the company's platform. This includes customers using NVIDIA GPU clusters, AMD GPU clusters, GPU VMs, and rack-scale bare-metal environments.
You will act as a trusted advisor across LLM training, fine-tuning, RAG workloads, distributed training frameworks, storage throughput requirements, multi-GPU scaling, and performance tuning. This role requires deep technical fluency and exceptional customer management skills to help AI/ML teams achieve predictable, cost-efficient, high-performance outcomes.
Key Responsibilities
AI/GPU Onboarding & Workload Architecture
- Lead onboarding for customers deploying GPU clusters (bare metal, VMs, or hybrid).
- Advise on cluster design: multi-GPU topology, NVLink/NVSwitch considerations, RDMA fabrics (InfiniBand and RoCE), network throughput, and storage IOPS requirements.
- Guide customers in selecting GPU types and configurations based on workload (training, fine-tuning, inference, embeddings, RAG pipelines).
- Support distributed frameworks: PyTorch, TensorFlow, DeepSpeed, Megatron, JAX, Ray, MosaicML, Hugging Face, etc.
- Apply advanced hands-on Kubernetes and Slurm skills to support customer orchestration environments.
Performance Optimization & Scaling
- Identify bottlenecks (network, storage, memory bandwidth).
- Provide tuning recommendations for batch size, mixed precision, parallelization strategies, and checkpointing.
- Help customers evaluate cost vs. performance tradeoffs (GPU mix, CPU pairing, instance types, cluster sizing).
Technical Relationship Ownership
- Own the long-term technical strategy across assigned GPU/AI accounts, including hyperscalers, labs, and high-growth AI startups.
- Host recurring technical review meetings, roadmap reviews, and optimization sessions.
- Define scaling plans, future GPU reservation needs, and capacity forecasting.
Incident & Escalation Management
- Partner with Support, SRE, Networking, NOC, and Product Management & Engineering to resolve high-urgency incidents.
- Manage outage communications, corrective action plans, and postmortem reviews with customers.
- Advocate for GPU reliability improvements and influence roadmap priorities.
Account Growth & Expansion
- Identify opportunities for expanded clusters, high-speed storage, or networking upgrades.
- Support Sales with technical validation and architecture diagrams needed for expansion.
Customer Advocacy & Product Feedback
- Provide structured feedback on existing and future GPU offerings, networking fabrics, storage platforms, and upcoming AI/ML platform features.
- Partner with Product on early access programs (new GPUs, pipelines, orchestration, etc.).
Qualifications
- 2–5+ years as an AI/ML Engineer, AI/ML Ops, Technical Account Manager, HPC Engineer, Sales/Solutions Engineer, or other relevant technical role.
- Strong knowledge of GPU hardware architectures (NVIDIA/AMD), CUDA/ROCm, distributed training, and ML frameworks.
- Experience with Linux performance tuning and high-performance networking (InfiniBand, RoCE fabrics).
- Experience with high-performance storage systems (DDN, NetApp, VAST, WEKA, etc.).
- Ability to communicate complex concepts clearly to both executives and engineering teams.
- Prior experience supporting hyperscale, AI labs, or large cluster deployments is a plus.
- Cloud Native Computing Foundation Certified Kubernetes Administrator (CKA) certification is a plus.
Compensation
$115,000 - $140,000
Salary may vary based on location, years of experience, background, and skill set.
INCLUSION & PRIVACY
We are an equal opportunity employer and are committed to creating an inclusive environment for all employees. We welcome applications from individuals of all backgrounds and experiences, and we prohibit discrimination based on race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other protected status under applicable laws. Vultr will consider qualified applicants with arrest or conviction records in accordance with applicable laws and will not conduct a background check until after an offer of employment has been extended and accepted.
We also take your privacy seriously. We handle personal information responsibly and follow applicable laws, including U.S. privacy rules and India's Digital Personal Data Protection Act, 2023. Your data is used only for legitimate business purposes and is protected with proper security measures.
Where allowed by law, applicants may request details about the data we collect, access or delete their information, withdraw consent for its use, and opt out of nonessential communications. For more details, please see our Privacy Policy.