Lead Data Engineer

  • Limerick
  • Cpl
About the job

This client is uniquely positioned to define the future of Health Technology, with a profound purpose: to help people live longer and better by predicting and preventing health issues earlier. It’s all about simplifying healthcare. Since its founding, the company has been committed to transforming the world of healthcare through operational excellence, innovation, and digital transformation, enabling enterprises and organisations across multiple technology domains, including data analytics, digital technologies, automation, and AI. Its Claims Processing, Provider Data Operations, and Corporate Functions are among the best in the healthcare sector.

Summary of role:

We are seeking a Lead Data Engineer with extensive big data framework expertise to join our team. The ideal candidate will have a strong background using the Amazon Elastic MapReduce (EMR) big data platform with frameworks such as Apache Hadoop and Apache Spark. In this role, you will be responsible for designing, developing, and implementing big data processing and streaming solutions on the AWS EMR cloud platform, leveraging Apache Spark/PySpark, Airflow, and MongoDB.

Primary Duties:

  • Design, develop, and process big datasets primarily using Amazon EMR, Apache Spark/PySpark, and Apache Hadoop for large data frames, with NoSQL MongoDB and AWS S3 file storage.
  • Expertise in distributed data processing techniques and analytics.
  • Expertise in GCP big data solutions such as Cloud Dataproc and BigQuery is also advantageous.
  • Knowledge of Amazon Managed Workflows for Apache Airflow (MWAA).
  • Programming skills with Python (PySpark APIs, Spark Core APIs, Spark MLlib), Spark Streaming, and EMR cluster management techniques.
  • Knowledge of cloud storage resources: database, data warehouse, data lake, and file storage solutions (AWS S3, GCP GCS, Azure Blob Storage, and IBM ICOS).
  • Collaborate with cross-functional teams to define business requirements and translate them into data design and architecture specifications.
  • Analyse source-to-target systems, data models, and data structures to ensure data quality and consistency across systems.
  • Work with AWS services, including data orchestration tools such as Spark and Airflow.
  • Knowledge of MLOps practices for deploying ML models.
  • Effectively manage version control using Git for large-scale projects, ensuring efficient handling of frequent code updates, merges, and branches to maintain code integrity and collaboration among team members.
  • Perform data profiling, cleansing, and validation to ensure accurate data transformation and integration.
  • Develop and maintain ETL process documentation, including data flow diagrams and data mapping documents.
  • Optimise ETL performance through best practices, performance monitoring, and troubleshooting.
  • Participate in code reviews, peer feedback sessions, and continuous improvement initiatives.
  • Work closely with data warehouse and database administrators to ensure optimal data storage and retrieval.
  • Provide technical support and guidance on big data ETL processes and data integration.
  • Stay current with industry trends and best practices for big data technologies and cloud infrastructure.

Requirements:

  • Bachelor's degree in Computer Science, Information Systems, or a related field.
  • Minimum of 7 years of experience in data engineering roles, with a primary focus on big data frameworks such as Apache Spark and Hadoop and the AWS EMR platform.
  • Strong experience with cloud-based data warehouse solutions such as Snowflake, AWS Redshift, Google BigQuery, and Azure Synapse.
  • Familiarity with data and development tools such as dbt and DBeaver useful.
  • Proficiency in SQL for data querying and transformation.
  • Experience with relational databases such as Postgres and SQL Server, and with NoSQL MongoDB.
  • Proficiency in Python or other relevant programming languages.
  • Understanding of data modelling concepts and experience integrating database objects with applications.
  • Experience with both AWS and GCP cloud services beneficial.
  • Familiarity with version control, CI/CD tools, and best practices.
  • Strong analytical and problem-solving skills.
  • Excellent communication and collaboration skills; experience with Confluence/JIRA.
  • Ability to manage multiple tasks in a fast-paced environment.
  • Familiarity with data privacy and security best practices in cloud environments.