SRE Operations Engineer - Kubernetes, APIs, WAF, databases, API Proxy (Gloo, APIGEE), Kafka, and Cloud (AWS/Azure/GCP) - Onsite work - Dallas TX or Overland Park KS

Requisition Name: C&DE-CMT-SRE Operations Engineer
Start Date: 5/11/2026
Duration: 55 Weeks
Services Location: TX/Dallas

Description Of Services: SRE Operations Engineer

The L1 SRE is the first line of defense in monitoring, triaging, and executing standardized operational tasks for all enterprise applications running on standard patterns and platforms like Kubernetes, APIs, WAF, databases, API Proxy (Gloo, APIGEE), Kafka, and Cloud (AWS/Azure/GCP). They will follow runbooks, leverage automation, and escalate appropriately to minimize downtime.

Skills

Mandatory Skills (Must-Have)

System & Infrastructure Monitoring
Expectation: Ability to use monitoring dashboards (e.g., Grafana, Datadog, Splunk, Argos, AIOps) to identify anomalies, follow alert workflows, and escalate when thresholds are breached.
Example: When a Kubernetes pod crash-loop is flagged in Prometheus, L1 should validate it against runbooks, check pod logs, and escalate if restart attempts fail.

Runbook Execution
Expectation: Strictly follow documented steps to resolve standard incidents; escalate when steps do not apply or fail.
Example: Use a provided runbook to restart a failed API proxy service; if the error persists beyond the documented steps, escalate to L2.

Incident Triage & Communication
Expectation: Perform first-line triage of alerts, gather logs/metrics, categorize severity, and notify stakeholders in clear, concise language.
Example: For a database connection timeout, collect error logs, verify service reachability, and provide a detailed incident note to L2 before escalation.

Kubernetes (Cloud or on-prem) operations knowledge
Expectation: Ability to check pod status, understand logs, and verify service endpoints using kubectl and monitoring tools.
Example: Run kubectl get pods -n <namespace> to verify that deployments are healthy.
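
As a rough illustration of the kubectl get pods check in the Kubernetes example above, the sketch below reads the JSON form of the same command and lists pods that look unhealthy. The function names, the "NotReady" label, and the choice to treat Succeeded pods as healthy are assumptions made for illustration and are not part of the requisition; the sketch relies only on kubectl get pods -n ... -o json and the standard pod status fields (phase, containerStatuses).

```python
import json
import subprocess


def unhealthy_pods(pod_list: dict) -> list[tuple[str, str]]:
    """Return (pod name, reason) pairs for pods that look unhealthy.

    `pod_list` is the parsed output of `kubectl get pods -o json`.
    """
    problems = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase == "Succeeded":
            continue  # assumption: completed pods (e.g. finished jobs) are fine
        if phase != "Running":
            problems.append((name, phase))  # Pending, Failed, Unknown
            continue
        for cs in status.get("containerStatuses", []):
            waiting = cs.get("state", {}).get("waiting")
            if waiting:
                # e.g. CrashLoopBackOff or ImagePullBackOff
                problems.append((name, waiting.get("reason", "Waiting")))
                break
            if not cs.get("ready", False):
                problems.append((name, "NotReady"))  # hypothetical label
                break
    return problems


def fetch_pods(namespace: str) -> dict:
    """Run kubectl for one namespace and parse its JSON output."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

In practice an L1 engineer would call unhealthy_pods(fetch_pods(namespace)) with the namespace named in the runbook, then attach the result to the incident note before escalating.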
Scripting (Python, Bash, PowerShell)
Expectation: Able to read and make small edits to scripts to automate repetitive checks.
Example: Modify a Bash script to include an additional log path in a health check.

Networking & Security Awareness
Expectation: Understand basic network troubleshooting (ping, netstat, curl, traceroute) and know when issues may be related to a firewall, WAF, or proxy.
Example: For an unreachable service, confirm DNS resolution and connectivity before escalating to L2.

Qualifications

2–5 years in an IT operations, NOC, or SRE/DevOps engineer role.
Kubernetes 101, Linux 101, Networking 101.
Understanding of cloud-ready applications.
Understanding of observability tools (Prometheus, Grafana, ELK, Splunk, etc.).
Strong troubleshooting mindset and ability to follow structured workflows, e.g., 5 Whys and Fishbone.

If you are interested or know someone who is, please share a profile at [Upgrade to PRO to see contact]

#SRE #SREDevopsengineer #Kubernetes #API #AWS #Azure #GCP #Linux
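
For reference, the "confirm DNS resolution and connectivity" step from the Networking & Security Awareness example could be sketched as below. This is a minimal illustration using only the Python standard library; the function name check_reachability, the three result strings, and the 3-second default timeout are assumptions chosen for the sketch, not anything specified in the requisition.

```python
import socket


def check_reachability(host: str, port: int, timeout: float = 3.0) -> str:
    """Return "dns_failure", "connect_failure", or "ok" for host:port."""
    # Step 1: DNS resolution. A failure here points at DNS, not at the
    # firewall, WAF, or proxy.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    # Step 2: TCP connectivity. Try each resolved address in turn.
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            continue  # refused, timed out, or unreachable: try the next one
    return "connect_failure"
```

A "dns_failure" result suggests a name-resolution problem, while "connect_failure" after a successful lookup is the case where a firewall, WAF, or proxy is worth mentioning in the escalation note.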

Sujata Poojari.
