Analyse recent incidents to identify cases where proactive, end-to-end detection of patterns could have prevented an outage or at least mitigated its impact.
Understand the purpose as well as the composition of the IT service, and identify the critical components or dependencies in detail.
Describe the relevant dashboards, monitors, events and log files already in place, and where and how they can be accessed.
Together with the IT service owner, define strategies for how to monitor the different aspects of the end-to-end health-state in near-real-time.
Specify in detail which events are to be aggregated in order to depict a relevant health-state view for a specific IT service.
Specify what additional aspects should be monitored or logged.
Implement a real-time aggregation and correlation of the relevant events.
Implement a near-real-time dashboard with a timely and accurate depiction of the end-to-end health-status of the IT service to be on-boarded.
Perform data analysis on the health-status information available from across all on-boarded IT services, and search for patterns or anomalies that indicate a future degradation of the IT service or a potential intrusion event.
Describe these specific patterns of interest, including procedures for which events to correlate in order to detect operational issues as well as anomalies caused by potentially malicious activity.
Implement real-time triggers for such patterns of interest, so that the IT Service Coordination organisation is actively alerted and can perform an immediate first analysis, enabling an early response.
Document procedures for how to react to such specific scenarios and how to distinguish relevant events from false positives.
Describe the support organisation of the IT service to be on-boarded and formalise the escalation paths, as well as the mutual expectations with the IT service owner.
Train the members of the IT Service Coordination organisation on the IT services to be on-boarded and on the alerts they need to react to.
Broad technical background with expertise in network technologies, operating systems and typical application stacks (in particular Java and .NET).
Good understanding of cloud delivery models.
Hands-on experience in aggregating data in ELK, SCOM and InfluxDB.
Hands-on experience in developing dashboards with Grafana, SCOM and Kibana.
Experience with statistical data analysis.
Understand complex technology stacks and their dependencies.
Understand business as well as operational requirements and translate them into technical solutions.
Self-motivated and highly proactive attitude.
Work in a global company with people from different cultural backgrounds.
Present yourself professionally and communicate in a manner appropriate to the target audience.
Assume responsibility and drive projects autonomously.
Excellent written and oral communication skills (in English).