Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack, from electrons to tokens, to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster.
We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.
We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved: people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services.
If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.
About the Role:
At Crusoe, the Infrastructure Engineers on our Fleet Operations team play a crucial role in ensuring the reliability and stability of our hardware platform. This role involves both hands-on diagnosis and repair of rack-level GPU hardware and the development of automation to streamline fleet management, capacity delivery, and maintenance operations.
The ideal candidate will work closely with Data Center Operations and Engineering teams and play a key part in the continuous improvement of our hardware platform's reliability and scalability, ensuring that our cutting-edge infrastructure, featuring the latest NVIDIA and AMD GPUs, continues to operate at peak efficiency for our customers.
What You'll Be Working On
- Problem Solving and Deep-Level Troubleshooting: Investigating and troubleshooting problems and hardware faults within our GPU platforms that our automation can't diagnose. This will involve taking data from system logs, kernel logs, and BMC Redfish APIs and, if the data is not there, working with hardware and kernel engineers to add the information you need to make accurate determinations.
- Coordination and Collaboration: Working closely with our Data Center Operations, Hardware Engineering and Capacity Planning teams to repair and remediate failed hardware, ensure consistent delivery of new hardware to customers, and roll out new upgrades across the fleet
- Automation and Tool Development: Automating routine processes and building Crusoe's hardware diagnostics, provisioning and repair tooling
- Build Processes and Documentation: When you figure out the best way to do something, you'll build processes, documentation and tooling to help the next person who runs into the same problem
- Validate and Test New Hardware: Crusoe is often the first company in the world to receive the latest generation of AI hardware, before it's fully tested. You'll conduct rigorous testing and validation on this cutting-edge hardware and on servers that come back from repair.
- On-Call: Participate in our on-call rotation, partnering with our US teams to provide follow-the-sun coverage
What You'll Bring to the Team
- Strong analytical, troubleshooting and problem-solving skills: Our automation takes care of the easy problems; you'll be digging deep to figure out the hard ones
- Linux experience: You'll have a solid understanding of Linux internals and feel at home working in a terminal
- Server Hardware and Provisioning: Exposure to server-class hardware & provisioning
- Fundamentals of Hardware and Networking: You don't need to be an expert, but you should be able to tell, without escalating, whether an error message is due to a failed hardware component, a firmware bug, or a networking misconfiguration
- Excellent communication and collaboration skills: You'll be working with many different people across a lot of different teams, so communication is critical
- Education: Bachelor's Degree in Computer Science, related field, or self-educated in computer science fundamentals.
Bonus Points
- Large-scale GPU operations: We work with cutting-edge hardware and software, so we understand most people won't have worked with it, but it would be nice if you have!
- Programming Proficiency: Proficiency with at least one programming language (Python, Go, or similar).
Benefits:
Crusoe also offers a competitive benefits package designed to support financial security, health, and overall well-being, including pension contributions, private health and dental insurance, income protection, life assurance, and more.
Compensation:
Compensation will be paid as a salary or an hourly wage. Compensation will be determined by the applicant's education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.
Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.