About Nscale
Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility.
We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you'll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you'll be contributing to building the technology that powers the future.
About the Role
We're hiring an Infrastructure Engineer to design, implement, operate, and continuously improve the infrastructure platforms that support both internal and customer-facing services at Nscale.
This role sits within the Operational Engineering team in Engineering, where you'll work across the infrastructure stack below the hypervisor with a strong focus on OpenStack, storage systems, Proxmox, DNS, DHCP, and infrastructure automation. You'll collaborate closely with internal teams to ensure infrastructure meets performance, availability, and security requirements, while also serving as a 3rd/4th line escalation point for complex issues.
Your work will directly support the reliability, scalability, automation, and security of the platforms that power Nscale's GPU cloud. This is a high-impact role for someone who wants to shape core infrastructure, improve operational excellence, and bring deep technical expertise to both delivery and ongoing evolution of critical systems.
What you'll be doing
Infrastructure Design & Operations
• Design scalable and resilient infrastructure platforms across OpenStack, Proxmox, Ceph, and core supporting services.
• Implement infrastructure components that underpin internal and customer-facing services.
• Operate critical infrastructure layers below the hypervisor with a focus on stability and performance.
• Maintain essential services such as DNS, DHCP, and configuration management tooling.
Automation & Continuous Improvement
• Improve automation for provisioning, monitoring, patching, and recovery.
• Use infrastructure-as-code and configuration management tools to standardise operations.
• Drive continuous improvement across infrastructure reliability, scalability, and operational efficiency.
• Support repeatable and maintainable platform operations through automation-first approaches.
Incident Management & Escalation
• Act as a 3rd/4th line escalation point for complex infrastructure issues.
• Partner with support teams to resolve incidents and restore services effectively.
• Investigate root causes of infrastructure problems and contribute to long-term fixes.
• Participate in on-call rotations and incident response activities for critical infrastructure.
Cross-Functional Collaboration & Technical Guidance
• Collaborate with internal teams to ensure solutions meet performance, availability, and security requirements.
• Contribute to infrastructure roadmap planning, including capacity management and performance tuning.
• Introduce new technologies that strengthen the infrastructure stack over time.
• Provide technical expertise to pre-sales and other groups on infrastructure capabilities and best practices.
Standards, Security & Compliance
• Ensure infrastructure platforms adhere to compliance, security, and operational standards.
• Apply best practices to the operation and evolution of infrastructure services.
• Support secure and well-governed platform delivery across the environments you own.
KPIs
• Infrastructure availability and resilience
• Automation coverage for provisioning, patching, monitoring, and recovery
• Complex incident resolution and root cause remediation
• Capacity management and performance tuning effectiveness
About You
• Strong experience deploying, managing, upgrading, and operating large OpenStack clusters
• Strong experience deploying, managing, and automating Proxmox
• Strong Python and Bash skills
• Strong troubleshooting experience with Linux and services running on Linux
• Experience working with Ceph and core infrastructure services
• Knowledge of DNS, DHCP, and configuration management in production environments
• Ability to operate and improve infrastructure with a focus on availability, scalability, automation, and security
• Experience handling complex infrastructure issues in an escalation capacity
• Ability to work effectively with internal teams and provide technical input across the organisation
• In-depth knowledge of Ironic and Neutron/OVN/OVS is a plus
What we can offer you
At Nscale, you'll find a collaborative, supportive, and innovative environment where your contributions spark real impact. We're building something extraordinary, and we want you at the core.
Highly competitive compensation package (base + bonus + equity), with performance reviews every 12 months.
Join one of the fastest-growing AI infrastructure companies: your chance to directly shape how global AI capacity is planned and deployed. ✨
Expect a dynamic progression plan tailored to your ambitions. Grow by leading critical cross-functional initiatives and shaping capital strategy β always with our full support.
Human-First Flexibility: We treat you as humans first. 🫶🏽 Our flexible workplace trusts Nscalers to deliver, giving you the autonomy to shape your day around life's moments.
Equal Opportunities Statement
We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds.
If there's anything we can do to accommodate your specific situation, please let us know.
The responsibilities outlined in this job description are not exhaustive and are intended to provide a general overview of the position. The employee may be required to perform additional duties, tasks, and responsibilities as assigned by management, consistent with the skills and qualifications required for the role.
For information on how Nscale handles candidate personal data, please see our Employee & Candidate Privacy Notice: Here.