OVERVIEW
As technology organizations scale, so does operational friction. IT support teams become overloaded with repetitive tickets (account lockouts, access requests, provisioning tasks, and standard "ask IT" issues) that drain time and attention from higher-value work.
EverOps partners directly with enterprise engineering and IT organizations to solve complex operational challenges from within their environments. We don't patch symptoms; we eliminate root causes.
We are seeking a Lead Site Reliability Engineer to own and execute a comprehensive IT support automation strategy designed to significantly reduce ticket volume and human intervention.
THE CHALLENGE
This is not a reactive support role.
This is a systems-level engineering role focused on:
- Eliminating tickets before they are created
- Automating resolution paths when tickets do occur
- Building durable automation frameworks across SaaS and internal platforms
- Removing systemic friction across the IT lifecycle
You will operate heavily within the IT support domain, addressing areas such as:
- Account lockouts and access management
- Provisioning and deprovisioning workflows
- Device and asset lifecycle management
- Standard internal IT requests
- SaaS integrations and workflow orchestration
The expectation is leadership-level ownership. You will define the automation roadmap, architect solutions, and drive initiatives from intake through deployment with measurable outcomes.
THE MISSION
As a Lead SRE, your mission is to:
- Reduce human intervention across IT support workflows
- Build automation systems that scale without increasing headcount
- Architect reliable, observable, production-grade automation services
- Establish engineering standards for automation development
- Mentor junior engineers while maintaining direct ownership of delivery
Success is measured in outcomes:
- Reduced ticket creation rates
- Increased fully automated resolution percentages
- Improved user satisfaction while lowering operational burden
This role requires deep technical capability combined with strong execution discipline and cross-functional influence.
WHAT YOU'LL DO
1. ROOT-CAUSE TICKET ELIMINATION
- Analyze ticket trends and identify systemic failure patterns
- Redesign workflows to remove recurring pain points
- Replace reactive fixes with preventative engineering solutions
- Partner with IT and engineering stakeholders to prioritize high-leverage automation opportunities
2. END-TO-END AUTOMATION ARCHITECTURE
- Design and implement automation workflows across multiple SaaS platforms
- Integrate with third-party and internal APIs (e.g., identity providers, collaboration tools, asset systems, ticketing platforms)
- Architect resilient API integrations including:
  - Authentication & authorization flows (OAuth2, SAML, token management)
  - Rate limiting and retry strategies
  - Error handling and observability
- Build self-service systems that allow users to resolve common requests without human escalation
3. CUSTOM SERVICE & TOOLING DEVELOPMENT
When no off-the-shelf solution exists, you will:
- Build lightweight microservices or serverless functions (Python or Go preferred)
- Develop internal middleware, proxies, or orchestration services
- Create background automation jobs (cron-style processes)
- Containerize and deploy services using modern DevOps practices
You will make thoughtful build-vs-buy decisions, balancing speed, maintainability, and long-term scalability.
4. RELIABILITY, OBSERVABILITY & PRODUCTION STANDARDS
Automation must be as reliable as any production system.
You will:
- Implement Infrastructure as Code (Terraform, Pulumi, or similar)
- Maintain CI/CD pipelines for automation services
- Design monitoring, logging, and alerting frameworks
- Define SLIs/SLOs to measure automation reliability
- Ensure automation services are secure, observable, and resilient
This is not scripting; this is platform-grade engineering.
5. LEAD-LEVEL OWNERSHIP & EXECUTION
This role requires operating as a single-threaded owner for major initiatives.
You will:
- Define solution architecture from concept to deployment
- Set timelines and milestones autonomously
- Conduct feasibility validation in development environments
- Communicate proactively with stakeholders
- Re-scope tactically to maintain forward momentum when blocked
- Deliver measurable impact, not just activity
You are expected to think systemically, move with urgency, and drive initiatives to completion without requiring micro-management.
YOU HAVE
EXPERIENCE
- 8+ years in SRE, Platform Engineering, DevOps, or Automation Engineering
- Proven experience designing enterprise-scale automation systems
- Strong exposure to IT support domains (access, provisioning, identity, device lifecycle, SaaS operations)
TECHNICAL STRENGTH
API & Integration Expertise
- Deep experience designing and consuming REST APIs
- Strong understanding of authentication and authorization patterns
- Experience orchestrating workflows across multiple SaaS platforms
Programming & Automation
- Strong proficiency in Python or Go
- Experience building production-ready services
- Advanced scripting for orchestration and automation logic
Cloud & Infrastructure
- Strong familiarity with at least one major cloud provider (AWS, GCP, or Azure)
- Containerization and Kubernetes exposure
- Infrastructure as Code experience
Systems Thinking
- Networking fundamentals
- Identity and access concepts
- Understanding of asset lifecycle management
LEADERSHIP & COMMUNICATION
- Experience leading technical initiatives from idea through deployment
- Ability to mentor junior engineers
- Strong written and verbal communication skills
- Comfortable influencing cross-functional stakeholders
- Data-driven decision-making approach
You think in terms of leverage, scale, and long-term impact.
WHAT SUCCESS LOOKS LIKE
Within 6 to 12 months, you will have:
- Eliminated entire categories of recurring IT tickets
- Implemented durable automation frameworks across core IT workflows
- Increased automated resolution rates quarter over quarter
- Reduced manual provisioning and access overhead
- Established scalable, observable automation systems that continue to compound value
Your impact will be visible in metrics, not anecdotes.
NICE TO HAVE
- Experience integrating AI/LLM capabilities into workflow automation
- Familiarity with ITSM frameworks
- Background building internal self-service platforms
- Experience presenting technical strategy to senior leadership
- Experience operating in high-scale, compliance-sensitive environments
BENEFITS
- 100% Remote Workplace
- Unlimited Paid Time Off
- Equity: Become a true owner of the company
- 401(k) with company contribution and sponsored healthcare
- Professional Growth: Access to training and certification programs