Data Engineer (Contract, South San Francisco, CA)



Genentech is looking for a Data Engineer in South San Francisco, CA. This is a 12+ month contract on W2.

Please email your resume as an attached Word document, along with your desired hourly rate, current location, visa status, and the date you are available to start a new position. Please make certain your resume clearly details your experience with the skills required for this position.

Also, please provide a brief write-up explaining why you are a match for the position, which can be passed along to the hiring manager. If you have worked at Roche or Genentech in the past, please provide your previous managers' names so they can also be passed along.


The Early Clinical Development Informatics (ECDi) department within Genentech Research and Early Development (gRED) is seeking an experienced Data Engineer who will be responsible for designing, developing, and optimizing ETL / data pipelines that support a variety of machine learning, predictive analytics, systems, and BI solutions advancing the organization's goals to digitize and optimize clinical trials. This individual will work within ECDi's Information Management Office (IMO).

The role will require cross-functional interaction with Data Management Leads, Predictive Analytics Analysts, Artificial Intelligence Scientists, and Information Technology teams across multiple projects to implement data solutions in ECDi's data lake and data warehouse, gCORE. A great candidate can translate the unique needs of a diverse set of stakeholders into requirements across both data lake and data warehouse use cases, and is eager to solve complex data challenges by selecting the best-fit solution. Candidates must be self-motivated, passionate about data management and analytics, and able to extrapolate customer needs with minimal direction.


Responsibilities

  • Understand the current-state data landscape, use cases, and existing data lake and data warehouse setup
  • Work with Business Analysts, Data Analysts, Data Scientists and AI Engineers to identify infrastructure and data roadmap needs and propose the appropriate strategy in partnership with other IMO engineers
  • Assemble large, complex data sets in the format fit for each use case
  • Architect, develop and optimize ETL pipelines using Python, Spark, EMR, Docker and Airflow
  • Develop and optimize big data pipelines for data scientists (requires a basic understanding of data science concepts and ML)
  • Write generic Python/Pyspark modules for processing data from various data sources (XML, Parquet, CSV, Relational)
  • Perform hands-on physical and logical database design and modeling in the context of data warehousing (currently using AWS Redshift)
  • Perform hands-on infrastructure design of ECDi's AWS data lake and data warehouse environment (gCORE), including continuous exploration and recommendation of new technologies and best practices
  • Research and recommend new innovative methods and systems to manage data for business improvement
  • Participate in internal governance to drive the data quality business cycle and roadmap


Required Qualifications

  • 5+ years of programming experience (including functional programming); must be advanced in Python
  • 3+ years of experience designing, building, and maintaining production data pipelines and/or data warehouses
  • Demonstrable experience working with different database types, including columnar, relational (SQL), and graph-based stores, and the ability to select the right tool for each job
  • Experience building and optimizing big data pipelines using Spark
  • Experience with AWS cloud services: S3, EC2, EMR, RDS, Redshift, Lambda, EKS
  • Solid understanding of how to design robust data workflows including optimization and user experience
  • Strong analytical and problem-solving skills
  • Excellent oral and written communication skills
  • Able to work in teams and collaborate with others to clarify requirements
  • Strong coordination and project management skills to handle complex projects
  • Experience developing and working with XML, JSON, and external web services
  • Education: Bachelor's or Master's degree in Computer Science or Software Engineering

Preferred Qualifications

  • Clinical drug development domain knowledge
  • Experience working with clinical and biomedical data types (clinical patient data, omics, imaging, etc.)
  • Competencies in applied statistics to solve business needs
  • Knowledge of industry data standards used in drug development, particularly in Clinical development

Required Skills

  • Artificial Intelligence
  • Data Management
  • Data Quality
  • Data Science 


Please review all application instructions before applying.
