Join Vonage and help us innovate cloud communications for businesses worldwide!
Why this role matters:
As a Staff Platform Engineer, you are the "engineers' engineer" and a primary architect of Vonage's engineering culture. You operate with complete autonomy, designing the ecosystem that defines how hundreds of developers interact with the cloud. You will lead the long-term technical roadmap for our Cloud Native Kubernetes platform, balancing massive-scale infrastructure management with the evolution of our Internal Developer Portal. Your mission is to eliminate systemic friction through software engineering and AI-driven operations, ensuring our global production APIs are resilient, cost-efficient, and secure by design.
Your key responsibilities:
• Autonomous Technical Leadership: Operate with full independence to scope, size, and execute large-scale initiatives. You are the ultimate technical authority and escalation point, expected to deliver results without requiring assistance from others.
• Predictive Problem Solving: Use deep systems knowledge to foresee architectural bottlenecks and operational issues before they impact production. You proactively design solutions for future scale, ensuring the platform stays ahead of business demands.
• Platform Strategy & IDP Ownership: Lead the organizational roadmap for the platform while working closely with the Architecture Team to ensure alignment with global standards. Act as the strategic owner of the Internal Developer Portal (IDP), managing stakeholders, UX, and onboarding to provide a "single pane of glass" for costs, reporting, and self-service.
• Mentorship & Multiplier: Act as the primary mentor for junior and senior engineers. You provide the support and guidance needed to raise the engineering bar, fostering a culture of self-sufficiency and technical excellence.
• GitHub Actions Expertise & CI/CD Modernization: Serve as the subject matter expert for GitHub Actions, designing enterprise-grade reusable workflows and managing high-performance runner infrastructure. Lead the roadmap to migrate away from legacy Jenkins toward modern, automated CI/CD and "no-code" automation.
• IaC Culture & Cloud-Native Infrastructure: Drive the evolution of our IaC culture using tools like Terraform and Crossplane, moving the organization toward a Kubernetes-native management style.
• Reliability & Deep Systems Debugging: Respond effectively to service failures by diving into Unix/Linux OS internals (filesystems, system calls, networking). Lead the team in blame-free Root Cause Analysis.
What you'll bring:
• Experience: 12+ years of progressive experience in software engineering, systems design, and cloud architecture.
• Self-Direction: A proven track record of delivering complex, multi-region infrastructure projects from concept to completion without oversight or assistance.
• Linux & Systems Mastery: Expert-level knowledge of Unix/Linux internals and networking. You troubleshoot "black box" failures at the system level and perform deep complexity analysis.
• Kubernetes Excellence: Expert-level experience managing massive, high-traffic, multi-tenant Kubernetes environments (EKS/GKE).
• CI/CD Transformation: Expertise in GitHub Actions (custom actions, API integration, and self-hosted runners) coupled with deep experience in Jenkins architecture and migration strategies.
• Coding & Automation: Professional-grade proficiency in Go (preferred) or Python. You approach infrastructure with a software developer's mindset, focusing on clean code and integrated bot solutions.
• Artifact & Security Governance: Ownership of Harbor/Artifactory lifecycles, ensuring cost-governance and security compliance.
• Observability & AI: Experience designing observability infrastructure (VictoriaMetrics, Thanos, Prometheus, Grafana) and applying AIOps to automate incident response, predictive capacity planning, and automated FinOps resource optimization.
What's required for application:
• Systems Thinking: The ability to troubleshoot complex, distributed system failures across networking, storage, and application layers.
• FinOps & Efficiency: A proven track record of optimizing large-scale cluster topologies to balance top-tier performance with aggressive cost-efficiency goals.
• Observability Architecture: Experience designing observability stacks using open-source tools like Prometheus, Grafana, VictoriaMetrics, or Thanos.
• Cloud Native Fluency: Deep familiarity with GCP (Cloud Run Functions, GKE) and AWS (EKS) in a multi-cloud or hybrid-cloud context.
How you'll benefit:
• Attractive Discretionary Time Off
• Private Medical Insurance with optional dependent coverage
• Educational Assistance Reimbursement Program
• Opportunities for reimbursement for conferences, trainings, and other personal development events
• Maternity and Paternity Leave
• Ask your recruiter for country-specific information
Note: The purpose of this profile is to provide a general summary of essential responsibilities for the position and is not meant as an exhaustive list. Assignments may differ for individuals within the same role based on business conditions, departmental need or geographic location.
There's no perfect candidate. You don't need all the preferred qualifications to make a valuable impact on our team. Our employees and customers come from diverse backgrounds, so if you're passionate about what you could achieve at Vonage, we'd love to hear from you.
To learn how we process your personal data during the recruitment process, please refer to our Privacy Notice.
Who we are:
Vonage is a global cloud communications leader. Your talent will help brands such as Airbnb, Viber, WhatsApp, and Snapchat accelerate their digital transformation through our fully programmable unified communications, contact center solutions, and communications APIs. Ready to innovate? Then join us today.