2024 - Observability - Principal Engineer - Permanent

  • Dublin
  • Huawei Ireland
Are you an individual contributor or technical leader with deep expertise in creating novel observability solutions for large-scale distributed systems? We are looking for people who are motivated by the opportunity to do cutting-edge observability research and development in a hyperscaler public cloud environment.

Some of the technical challenges we expect you to tackle include:

  • Extracting meaningful customer-centric SLIs (Service Level Indicators) from very high cardinality data.
  • Defining Critical User Journeys for Cloud APIs and monitoring them across a vast graph of interconnected services.
  • Analyzing and optimizing the end-to-end technical architecture of an observability solution, from the storage layer to the data presentation layer.

The Cloud Reliability Lab at the Huawei Ireland Research Center has a mission to bring world-class reliability to Huawei Cloud by solving cross-functional problems that span hardware, software, networking, monitoring, and operations. We have teams working in all these areas, with a diverse mix of people including industry veterans, academic researchers, and Ph.D. student interns. In your role, you will collaborate with the local teams in Ireland, research centers across Europe, and engineering teams around the world.

Responsibilities

  • Independently deliver the technical projects, algorithms, or observability solutions required to understand the customer-perceived reliability of cloud products.
  • Drive innovation in the collection, pre-processing, storage, and analysis of high-cardinality and graph-based monitoring data.
  • Drive collaboration with cross-functional teams to embed customer-centric features in their systems, ensuring enhanced reliability and trust in Cloud services.
  • Establish and oversee Service Level Objectives (SLOs) to set definitive performance benchmarks and expectations for service reliability.
  • Engage with academic partners, industry leaders, and open standards bodies in the observability ecosystem to advance the state of the art.
  • Publish key findings in relevant conferences and journals, or file patents as appropriate.

Requirements

  • Master’s or Bachelor’s degree in Computer Science, Engineering, or a related field.
  • A minimum of 8 years of experience in Site Reliability Engineering (SRE) or DevOps in the cloud domain, with at least 5 years of hands-on monitoring and observability expertise.
  • Understanding of the full software development life cycle, including coding, code reviews, version control systems, testing, build pipelines, and operations.
  • Excellent problem-solving skills and the ability to think creatively when faced with ambiguous problems.
  • Strong communication and collaboration skills, with the ability to work effectively in a team environment.

Optional Skills

  • Practical experience using industry-leading monitoring, logging, and tracing solutions.

Benefits

  • Competitive salary package
  • Room for long-term personal growth
  • Opportunities to work on high-profile initiatives that impact the whole company
  • Opportunities to work with the brightest minds in software engineering (including Huawei Fellows and world-renowned professors)
  • A multicultural, international working environment
  • Work for an international world leader, an established yet still rapidly growing Fortune 500 company

Check out Life at Huawei Ireland Research Centre: https://www.youtube.com/watch?v=3gR64sYSnOA&feature=youtu.be

DUE TO THE HIGH VOLUME OF REPLIES, ONLY CANDIDATES WHO ARE SHORTLISTED FOR INTERVIEW WILL BE CONTACTED.

Privacy Statement

Please read our West European Recruitment Privacy Notice before submitting your personal data to Huawei, so that you fully understand how we process and manage the personal data we receive. http://career.huawei.com/reccampportal/portal/hrd/weu_rec_all.html