Principal/Senior Software Engineer - Performance and Scale Engineering (Waterford Office or Ireland)

  • Dublin
  • Red Hat, Inc.
About the job

The Red Hat Performance and Scale Engineering team is looking for an AI Performance Engineer to join us in the PSAP (Performance and Scale for AI Platforms) team. As recent advances in AI technologies have taken the world by storm, Red Hat has engineered an enterprise-grade platform, OpenShift AI, based on open source AI technologies, to help enterprises leverage the full potential of these transformative technologies. As part of this team, you will be responsible for the performance and scalability assessments of the OpenShift AI platform, including but not limited to notebooks as a service, data science pipelines, the model serving stack, the feature store, edge AI, and a distributed model training stack. Our goal is to make OpenShift AI the platform of choice for Red Hat's enterprise customers for leveraging AI technologies. You will help us achieve that goal through targeted improvements in the performance and scalability of the OpenShift AI platform.

You will be required to formulate and execute performance test plans. You will investigate cloud infrastructure, on-prem hardware, RHEL, OpenShift, and OpenShift AI performance tuning knobs. In addition, you will triage and potentially fix performance issues, create new benchmarking tests and automation tools as needed, and share performance results on a regular basis. This role needs an engineer who thinks creatively, adapts to rapid change, and is willing to learn and apply new technologies. You will be joining a vibrant open source culture and helping promote performance and innovation in this Red Hat engineering team.

The broader mission of the Performance and Scale team is to establish performance and scale leadership of the Red Hat product and cloud services portfolio. The scope includes component-level, system, and solution analysis and targeted enhancements.
The team collaborates with engineering, product management, product marketing, and customer support, as well as hardware and software partners.

What you will do

  • Execute performance and scalability benchmarks against various components of the OpenShift AI platform to drive improvements and detect regressions
  • Develop tools and automation to aid the performance benchmarking work
  • Collaborate with other teams to resolve performance issues
  • Triage, debug, and solve customer cases related to AI performance
  • Submit performance benchmarking results to industry consortia
  • Publish results, conclusions, recommendations, and best practices via internal test reports, presentations, and external blogs to support our partners and customers
  • Participate in internal and external conferences about your work and results
  • Provide technical leadership and guidance to the wider team

What you will bring

  • Experience in running performance tests, data capture, data analysis, and visualization
  • Experience with systems performance engineering and metrics collection tools such as iostat, vmstat, sar, perf, and Prometheus
  • Experience with container technologies (Podman, Kubernetes, Docker)
  • Programming experience in Python
  • Experience working with the Linux operating system (RHEL, Fedora, or CentOS preferred)
  • Experience with AI technologies and frameworks (PyTorch, Transformers, etc.)
  • Excellent written and verbal language skills in English

The following are considered a plus

  • Knowledge of AI benchmarking suites such as MLPerf
  • Experience with software-defined storage and networking as they pertain to Kubernetes
  • Experience working with hardware accelerators such as NVIDIA GPUs
  • Experience working on an MLOps platform

About Red Hat

Red Hat is the world's leading provider of enterprise software solutions, using a community-powered approach to deliver high-performing Linux, cloud, container, and Kubernetes technologies.
Spread across 40+ countries, our associates have the flexibility to choose the work environment that suits their needs, from in-office to fully remote to office-flex. Red Hatters are encouraged to bring their best ideas, no matter their title or tenure. We're a leader in open source because of our open and inclusive environment. We hire creative, passionate people ready to contribute their ideas, help solve complex problems, and make an impact. Opportunities are open. Join us.

Diversity, Equity & Inclusion at Red Hat

Red Hat's culture is built on the open source principles of transparency, collaboration, and inclusion, where the best ideas can come from anywhere and anyone. When this is realized, it empowers people from diverse backgrounds, perspectives, and experiences to come together to share ideas, challenge the status quo, and drive innovation. Our aspiration is that everyone experiences this culture with equal opportunity and access, and that all voices are not only heard but also celebrated. We hope you will join our celebration, and we welcome and encourage applicants from all the beautiful dimensions of diversity that compose our global village.

Equal Opportunity Policy (EEO)

Red Hat is proud to be an equal opportunity workplace and an affirmative action employer. We review applications for employment without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, citizenship, age, veteran status, genetic information, physical or mental disability, medical condition, marital status, or any other basis prohibited by law.