ML Infrastructure Engineer

  • Limerick
  • Analog Devices
ADI’s Central AI team develops next-generation AI technology that transforms our understanding of the physical world. We develop solutions at multiple layers of the tech stack, from AI-enabled software applications to deeply embedded AI algorithms. Our mission is to build the Intelligent Edge, where AI transforms how we solve challenging problems by combining deep application knowledge, close customer relationships, extraordinary data, advanced circuits, and breakthrough algorithms.

We're looking for engineers who bring expertise across the AI space, including ML platform design, cloud-hosted AI services, foundational AI models, LLMs, Edge AI, and cutting-edge AI research; our list of breakthrough products and technologies is growing at a rapid pace. Central AI is critical to ADI’s future and presents opportunities to select from a variety of project areas as you and our AI-driven business grow. Finally, we need our team to be versatile, willing to take risks, able to lead projects quickly, and enthusiastic about new technologies and solutions.

Location: Limerick, Ireland or Cork, Ireland

Responsibilities

As an ML Infrastructure Engineer, you help build, deliver, and optimize software systems to enable AI/ML solutions.

  • Design and implement machine learning systems and workflows to support real-time training, testing, and deployment of AI models.
  • Design and implement distributed cloud GPU training approaches for deep learning model training and evaluation.
  • Build end-to-end machine learning pipelines and integrate them into product and business system workflows.
  • Architect and own the build-release continuous integration processes of our deep learning software components that are built, tested, and released on various DL frameworks (TensorFlow, PyTorch, JAX, etc.).
  • Propose, implement, and deploy efficient and scalable DevOps solutions to allow our fast-growing team to release software more frequently while maintaining high quality and top performance.
  • Automate away recurring tasks (DL algorithm accuracy and performance regression detection, designing and developing new quality control measures, e.g., code analysis) while employing and advancing best practices.

Qualifications

  • 3+ years of experience in software engineering, including experience with distributed systems and real-time streaming.
  • Degree in Computer Science or a related technical field.
  • Strong system-level programming skills (Python, shell scripting, etc.) and familiarity with Linux system administration.
  • Hands-on experience with infrastructure engineering, modern DevOps processes, CI/CD, and GitHub.
  • Experience with ML frameworks (PyTorch, TensorFlow, etc.) and model distribution frameworks (TorchServe, etc.).
  • Experience with developing, implementing, and optimizing container orchestration systems, such as Kubernetes.
  • Ability to work with and manage cloud data technologies, such as Kafka, Elasticsearch, Terraform, Airflow, or Dagster.
  • Excellent debugging and optimization skills.
  • Experience working on software teams and willingness to work in a fast-paced environment.

Job Req Type: Experienced
Required Travel: Yes, 10% of the time
Shift Type: 1st Shift/Days