Incident Response Engineer – Facility Operations Center

Jobgether · Remote (India) · June 18, 2026
Remote
Full-time
Incident Response
Senior · 5+ yrs

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for an Incident Response Engineer – Facility Operations Center based in India.

This is a high-impact operations role focused on ensuring the reliability, resilience, and performance of large-scale datacenter environments. The successful candidate will play a key role in coordinating incident response activities, driving operational excellence, and supporting business continuity across critical infrastructure. Working within a fast-paced and highly collaborative environment, you will leverage data, analytics, and reliability engineering principles to minimize service disruptions and optimize operational performance. The role combines incident management, process improvement, risk assessment, and stakeholder coordination, offering significant opportunities to influence strategic initiatives. You will work closely with cross-functional teams, vendors, and technical specialists to strengthen operational readiness and continuously improve reliability standards. This position is ideal for professionals passionate about datacenter operations, infrastructure reliability, and large-scale problem-solving.

Accountabilities:

  • Coordinate incident response, maintenance activities, operational communications, and reporting across a global datacenter portfolio.
  • Develop and maintain reliability programs, operational standards, health scoring frameworks, and performance metrics to improve infrastructure availability.
  • Analyze operational and failure data to identify trends, predict risks, and support proactive reliability initiatives.
  • Collaborate with technical teams to implement predictive maintenance strategies, reliability studies, and process optimization projects.
  • Lead root cause analyses for incidents and outages, ensuring corrective and preventive actions are implemented effectively.
  • Coordinate disaster recovery exercises, compliance activities, audits, and business continuity initiatives.
  • Conduct operational risk assessments and ensure adherence to policies, procedures, and datacenter standards.
  • Own, track, and present key operational and incident-response metrics to stakeholders and leadership teams.
  • Drive automation opportunities and workflow improvements across reporting, monitoring, and operational processes.
  • Partner with cross-functional teams to deliver training, coaching, and operational best practices that improve team effectiveness.

Requirements:

  • Bachelor’s degree in Engineering, Computer Science, Telecommunications, Industrial Engineering, Business, or a related field, or equivalent practical experience.
  • 5+ years of experience in datacenter operations, facilities management, reliability engineering, or environmental health and safety functions.
  • Strong knowledge of reliability methodologies, including lifecycle testing, stress testing, predictive modeling, and failure analysis.
  • Experience with large-scale datacenter infrastructure, including mechanical, electrical, and networking systems across cloud, hybrid, or on-premises environments.
  • Strong analytical, statistical, and reporting skills with the ability to transform complex operational data into actionable insights.
  • Experience using asset management databases, DCIM platforms, or similar infrastructure management tools.
  • Strong communication and presentation skills with the ability to influence stakeholders across multiple functions.
  • Proven ability to manage priorities, drive initiatives independently, and maintain a high level of organization and attention to detail.
  • Advanced proficiency with Microsoft Office Suite, Google Workspace, and reporting tools.
  • Knowledge of business continuity, disaster recovery, risk management, and operational compliance frameworks.

Preferred Qualifications:

  • Experience in reliability engineering for electrical, mechanical, or cooling systems.
  • Certifications such as CDCMP, CMRP, CRL, CRE, or related reliability and maintenance credentials.
  • Familiarity with ISO standards and their practical implementation.
  • Expertise in forecasting, statistical analysis, and operational performance management.
  • Strong technical aptitude with the ability to quickly learn new systems and technologies.

Benefits:

  • Competitive compensation package.
  • Fully remote work opportunity within India.
  • Exposure to cutting-edge datacenter and AI infrastructure environments.
  • Opportunity to work on large-scale, business-critical operations and reliability initiatives.
  • Collaborative and innovative culture with strong cross-functional engagement.
  • Professional development opportunities, including exposure to advanced reliability and operational excellence practices.
  • Involvement in strategic projects focused on automation, process improvement, and business continuity.
  • Opportunity to contribute directly to the performance and resilience of global infrastructure operations.

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
