- Build and maintain ETL pipelines to ingest data from a wide variety of public and proprietary sources.
- Create data pipelines to capture, process, and store experimental designs and data from the lab.
- Manage DevOps and cloud infrastructure for the engineering team.
- Design schemas that allow for efficient storage and retrieval of data.
- Create tools that enable the company to turn data into actionable knowledge.
- Collaborate with laboratory and data scientists to enable analytics and reporting of scientific data.
- Collaborate with software and machine learning engineers to enable quick and easy consumption of data.
- You write clean, modular, and maintainable code.
- You are a continual learner who drives innovation by evaluating and adopting new frameworks and technologies.
- You are a self-starter, comfortable taking initiative without direct supervision.
- You have excellent communication and stakeholder management skills, with the ability to relay technical information to non-technical audiences.
- You expect your work to be meaningful and strive to be part of a business dedicated to having a positive impact on the planet.
- BS/MS in Computer Science or equivalent experience/training
- 3+ years of experience building production data pipelines
- Extensive expertise in Python
- Experience working with distributed datasets (Spark, Dask)
- Expertise in containerization: Docker and scalable Kubernetes clusters
- Expertise in SQL
- Experience in the AWS ecosystem, with a particular focus on Batch, ECS, and EKS
- Experience with workflow managers (e.g. Prefect, Airflow, Luigi, Snakemake, or Mage)
- Experience with testing and CI/CD frameworks
- Proficiency with Unix, Git, and other command-line tools
- Familiarity with genomics and/or proteomics
- Experience processing large protein databases (e.g. UniProt)
- Experience with Terraform, Prefect, Spot.io, or Snowflake