
Monitoring Desk Associate (Linux)


Type: Full-time
Minimum Experience: 2-5 years
Department: Operations
Desired Start Date: Immediate
Location: Mumbai, India (On-site)
Minimum Qualifications: Degree in Engineering, Computer Science, or a related field

Job Description

Day in the life:

  • Incident Monitoring & Response: You will be responsible for continuously monitoring Neysa’s AI platforms and infrastructure for any performance issues, system alerts, or service disruptions.
  • Incident Management: Follow established procedures for incident identification, classification, and escalation, ensuring minimal disruption to services in line with SLAs.
  • Troubleshooting & Resolution: When issues arise, you will troubleshoot and resolve problems related to system performance, application failures, and network issues using your Linux expertise.
  • Proactive Monitoring: Stay ahead of potential system issues by proactively tracking the operational status of servers, applications, and networks using monitoring tools like Nagios, Prometheus, or Grafana.
  • Documentation & Reporting: Document incidents, actions, and resolutions in incident management systems. Provide regular reports on recurring issues, root causes, and preventive measures.
  • Collaboration: Work with system administrators, engineers, and developers to resolve issues swiftly, ensuring minimal business impact.
  • Root Cause Analysis & Continuous Improvement: Participate in post-incident reviews to analyze root causes and suggest improvements for future incident management processes.
  • System Maintenance: Conduct regular system checks, patch management, and maintenance to ensure systems are secure and optimized.

 

Must have skills:

  • Experience: 1-5 years of experience in operations or service assurance, focusing on incident management and system monitoring in a Linux environment.
  • Linux Expertise: Solid hands-on experience with Linux operating systems (e.g., CentOS, Ubuntu, RHEL), including system administration, troubleshooting, and performance tuning.
  • Incident Management Knowledge: Understanding of ITIL incident management processes and ability to efficiently handle incidents while maintaining clear communication with stakeholders.
  • Strong Troubleshooting Skills: The ability to diagnose and resolve technical issues quickly, including server failures, network issues, and application-related problems.
  • Experience with Monitoring Tools: Proficiency in using monitoring tools like Nagios, Prometheus, Grafana, or similar to track and manage system health.
  • Effective Communication: Excellent verbal and written communication skills to update both technical and non-technical stakeholders on incident status and resolutions.
  • Team Collaboration: Ability to work effectively within a team, bringing a proactive approach to problem-solving and incident resolution.
  • Technical Aptitude: Basic knowledge of cloud platforms (AWS, Azure, Google Cloud) and networking fundamentals.

 

What separates the best from the rest:

  • Containerized Environments: Experience working with container technologies such as Docker or Kubernetes is a plus.
  • Automation & Scripting: Familiarity with scripting to automate incident resolution and process improvement, using languages like Bash or Python.
  • ITIL Certification: An ITIL certification or a similar incident management qualification would further enhance your credentials and effectiveness in the role.
  • Advanced Troubleshooting: Experience with advanced troubleshooting in cloud-based or highly complex environments would set you apart from other candidates.
  • Proactive System Improvement: A knack for identifying potential system weaknesses and driving improvements before they impact customer experience would make you stand out as an exceptional candidate.

 

What you can expect:

  • The best remuneration in the industry!
  • Flexible working hours.
  • The opportunity to design solutions that support mission-critical business operations, and then defend your choices.
  • Heady (but healthy) discussions on technology and its impact on the ever-changing business landscape.
  • Work in a multi-vendor, multi-cloud environment.
  • A great working environment.