About Nscale
Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility.
At Nscale, our Engineering team plays a critical role in driving the deployment and subsequent management of our infrastructure and software platforms.
We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you'll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you'll be contributing to building the technology that powers the future.
About the Role (Job Purpose)
The Infrastructure Engineer (Ironic Specialist) sits within the Infrastructure Engineering team. The Infrastructure Engineering team is responsible for the design, implementation, operation, and continuous improvement of the infrastructure stack that underpins all internal and customer-facing services.
This specialist role is focused on OpenStack bare metal provisioning and lifecycle management, with particular emphasis on Ironic and the services, workflows, and integrations required to operate large-scale physical infrastructure reliably. The role is critical to the successful delivery of automated provisioning, hardware onboarding, lifecycle operations, and hardware fault management across our cloud estate. The role also acts as a key link into the upstream OpenStack community, helping ensure that Nscale both benefits from and contributes to the continued development of Ironic and the wider bare metal ecosystem.
The Infrastructure Engineering team ensures high levels of availability, scalability, automation, and security for the infrastructure layers it owns.
This team acts as a 3rd/4th line escalation point for support organisations, as well as providing subject matter expertise to pre-sales and other groups within the organisation.
What You'll be Doing (Responsibilities)
Designing, implementing, and operating scalable and resilient bare metal provisioning platforms with a strong focus on OpenStack Ironic.
Owning the lifecycle of physical infrastructure through automated discovery, enrolment, provisioning, cleaning, deprovisioning, and hardware state management.
Managing and improving integrations between Ironic and related OpenStack services such as Nova, Neutron, Glance, Keystone, Placement, and supporting automation tooling.
Building and maintaining robust provisioning workflows for a wide range of hardware profiles, including GPU-enabled and high-performance server platforms.
Driving automation for hardware onboarding, firmware and BIOS configuration, deployment workflows, validation, and recovery using infrastructure-as-code and configuration management tools.
Troubleshooting complex issues across provisioning pipelines, PXE/iPXE, BMC interfaces, out-of-band management, image deployment, network boot, and hardware compatibility.
Acting as a 3rd/4th line escalation point for advanced bare metal and provisioning incidents, carrying out root cause analysis and implementing long-term fixes.
Supporting platform upgrades, lifecycle management, and operational improvements across Ironic and its dependencies.
Collaborating closely with network, compute, data centre, and support teams to ensure efficient and reliable delivery of physical infrastructure services.
Contributing specialist input to infrastructure roadmap planning, capacity expansion, standard builds, and hardware platform qualification.
Supporting pre-sales and solution design efforts by providing expert guidance on bare metal capabilities, operational models, and deployment constraints.
Contributing to upstream OpenStack bare metal communities, particularly Ironic and related projects, through bug reports, code contributions, testing, reviews, and design discussions where appropriate.
Tracking upstream roadmaps, release changes, and community direction to help shape Nscale's bare metal strategy, upgrade planning, and platform standards.
Representing Nscale's operational requirements, hardware use cases, and scaling challenges in upstream discussions to help drive practical improvements for both the business and the wider community.
Ensuring provisioning platforms and operational processes adhere to security, compliance, and operational standards.
Participating in on-call rotations and incident response activities for critical infrastructure services.
About You (Skills / Qualifications / Experience)
Strong Linux systems administration and troubleshooting experience.
Deep hands-on experience deploying, operating, upgrading, and troubleshooting large-scale OpenStack environments.
Strong specialist knowledge of OpenStack Ironic and the surrounding provisioning ecosystem.
Strong understanding of bare metal provisioning concepts including PXE/iPXE, DHCP, TFTP/HTTP boot, BMC technologies, RAID configuration, firmware management, disk imaging, and node lifecycle states.
Strong experience with out-of-band management technologies such as Redfish, IPMI, or vendor management interfaces.
Strong experience designing and building automation for physical and virtual infrastructure using tools such as Ansible.
Strong Python and Bash skills.
Experience troubleshooting complex provisioning and hardware integration issues across server, network, and management layers.
Experience operating production infrastructure at scale with a strong focus on reliability, repeatability, and operational safety.
Ability to collaborate across infrastructure, support, and architecture teams to solve complex technical problems.
Experience contributing to or working closely with upstream open-source communities is highly desirable, particularly within OpenStack, Ironic, Metal3, or related infrastructure projects.
Ability to evaluate upstream changes, influence technical direction, and translate community developments into practical outcomes for production bare metal platforms.
Experience with GPU server platforms, hardware qualification, or large-scale bare metal cloud environments would be highly desirable.
Knowledge of Neutron, networking for provisioning, and the integration points between networking and bare metal deployment would be beneficial.
Equal Opportunities Statement
At Nscale, we are committed to fostering an inclusive, diverse, and equitable workplace. We believe that a variety of perspectives enriches our work environment, and we encourage applications from candidates of all backgrounds, experiences, and abilities. We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds.
If there's anything we can do to accommodate your specific situation, please let us know.
The responsibilities outlined in this job description are not exhaustive and are intended to provide a general overview of the position. The employee may be required to perform additional duties, tasks, and responsibilities as assigned by management, consistent with the skills and qualifications required for the role.
The range below reflects the base salary for the position. Actual compensation may vary based on job-related factors such as skill set, experience, education, and location. In addition to base salary, this role may be eligible for bonus, equity, and/or commission programs. Nscale may offer a competitive benefits package including medical, dental, vision, flexible paid time off, parental leave, and retirement plan participation.
Salary Range: $150,000–$220,000 USD
For information on how Nscale handles candidate personal data, please see our Employee & Candidate Privacy Notice.