Incident Manager / SRE Operations Engineer
Job Title
#Incident Manager / #SRE #Operations Engineer
Location - Bishop's Gate, NJ / CTC / West Chester, PA (Onsite/Hybrid)
Experience - 2–3 Years
Employment Type - Contract
Visa - #AnyVisa is acceptable
Job Summary
We are looking for a highly motivated Incident Manager / SRE Operations Engineer to lead and coordinate enterprise-wide high-severity incidents, problem investigations, and operational reliability initiatives. The ideal candidate will have strong experience in Incident Management, SRE practices, Operations Engineering, Observability, and Automation within complex distributed environments.
This role requires excellent communication skills, technical troubleshooting expertise, and the ability to drive cross-functional collaboration during critical production incidents.
Required Technical Skills
* Incident Management
* Site Reliability Engineering (SRE)
* Operations Engineering
* Reliability Architecture
* Automation & Observability
* Executive/Stakeholder Communication
* ELK Stack
* Grafana
* AppDynamics
* COP Monitoring Tools
* Monitoring & Alerting Systems
* Log Analysis & Troubleshooting
Roles & Responsibilities
* Lead and manage enterprise-wide high-severity production incidents.
* Coordinate cross-functional teams during outages, service disruptions, and critical operational events.
* Drive incident resolution, root cause analysis, and problem management activities.
* Create and maintain automation scripts in tools such as ELK, Grafana, AppDynamics, and COP.
* Configure and execute predefined monitoring queries for real-time issue detection and response.
* Attach live query outputs including logs, traces, and metrics directly to incident workflows.
* Improve operational efficiency by reducing manual navigation across monitoring platforms.
* Enhance alerting systems with contextual intelligence, anomaly detection, and metric deviation analysis.
* Identify impacted Configuration Items (CIs), downstream dependencies, and business impacts during incidents.
* Collaborate with engineering, infrastructure, and support teams to improve system reliability and operational standards.
* Prepare executive-level incident communications and status reports.
Preferred Qualifications
* Experience in large-scale distributed production environments.
* Strong understanding of observability and monitoring platforms.
* Hands-on experience with automation and scripting.
* Ability to work in a fast-paced operational support environment.
* Excellent analytical and troubleshooting skills.
* Strong verbal and written communication skills.
#opt #cpt #W2 #Contract #SRE #SREoperations #incidentmanager #incident #Anyvisa #USA #W2position