Data Engineer II

University of Texas at Austin (Austin, TX)
Hybrid

About The Position

The Data Engineer II is an experienced data professional responsible for designing, building, and maintaining robust data pipelines and infrastructure that enable the collection, storage, and processing of large datasets. The role expands on the Data Engineer I position, handling more complex data projects with greater independence. The Data Engineer II ensures data is accurate, secure, and compliant with data governance standards, and collaborates with cross-functional teams (e.g., business stakeholders, IT, and subject-matter experts) to deliver solutions that meet business and research needs.

Requirements

  • Requires a Bachelor's Degree in Computer Science, Information Systems, Data Science, or a related field.
  • An equivalent combination of relevant education and experience may be considered in lieu of a four-year degree, together with at least four years of experience in data engineering or a closely related field. This experience should include designing data architectures, developing data pipelines, and implementing data quality and performance monitoring.
  • A proven track record in database development using Python and SQL, including experience with NoSQL databases, is expected.
  • Systems Knowledge: Broad understanding of system-level concepts in computing. This includes knowledge of programming and scripting, operating systems, database query languages (SQL) and data mining techniques, as well as familiarity with IT infrastructure (servers, networking, cloud services). Such knowledge enables the Data Engineer II to troubleshoot and optimize across the technology stack.
  • Big Data Processing: Proficiency with big data frameworks such as Apache Spark for distributed data processing and large-scale computations. Experience optimizing Spark jobs for performance is often required.
  • Workflow Orchestration: Experience with workflow orchestration tools like Apache Airflow (or similar platforms) to schedule and manage complex data pipelines. Ability to design reliable job workflows and handle dependencies between tasks.
  • Programming & Databases: Strong programming skills in Python (especially using PySpark) and solid knowledge of SQL for querying and manipulating data. Familiarity with working in both relational databases (SQL) and NoSQL databases, with the ability to design and optimize database schemas and queries for each.
  • Version Control: Experience using Git or other version control systems for managing codebases and collaborating on data projects. Follows best practices in code versioning and documentation to maintain a clear history of changes.
  • Cloud Data Pipelines: Hands-on experience building data pipelines on cloud or modern data platforms. This could include using services in Microsoft Fabric (e.g., Azure Data Factory within Fabric) or similar ETL tools to move and transform data at scale. Knowledge of cloud ecosystems and services for data processing (such as AWS Glue or Azure Synapse pipelines) is beneficial.
  • Data Warehousing: Familiarity with cloud-based data warehousing and analytics services such as Google BigQuery, Microsoft Fabric (Synapse Analytics), or AWS Redshift for storing and querying large datasets. Ability to optimize data models and SQL queries on these platforms to ensure fast performance and cost-efficiency.
  • Technical Learning: Quickly grasps technical concepts and applies them effectively. Learns new tools and platforms independently. Applies new techniques to improve data pipelines. Shares technical knowledge with peers.
  • Problem Solving: Uses logic and data to solve complex problems effectively. Diagnoses root causes of data issues. Designs scalable solutions. Anticipates and mitigates risks.
  • Action Oriented: Takes initiative and acts with urgency. Proactively addresses data quality issues. Suggests improvements without being prompted. Delivers results under tight deadlines.
  • Collaboration: Works effectively with others to achieve shared goals. Communicates clearly with non-technical stakeholders. Participates in cross-functional teams. Resolves conflicts constructively.
  • Planning and Organizing: Prioritizes tasks and manages time effectively. Breaks down complex projects into manageable steps. Tracks progress and adjusts plans as needed. Meets deadlines consistently.
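Orchestration tools like Apache Airflow model a pipeline as a DAG of dependent tasks and run each task only after its upstream dependencies succeed. A minimal sketch of that idea using only the Python standard library (task names and bodies here are hypothetical, not from any real system):

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline tasks; each reads/writes a shared context dict.
def extract(ctx):
    ctx["raw"] = [1, 2, 3]          # pretend this pulls from a source system

def transform(ctx):
    ctx["clean"] = [x * 10 for x in ctx["raw"]]

def load(ctx):
    ctx["loaded"] = sum(ctx["clean"])  # pretend this writes to a warehouse

TASKS = {"extract": extract, "transform": transform, "load": load}
# Edges read "task: set of tasks it depends on".
DEPS = {"transform": {"extract"}, "load": {"transform"}}

def run_pipeline():
    ctx = {}
    # static_order() yields a dependency-safe execution order.
    for name in TopologicalSorter(DEPS).static_order():
        TASKS[name](ctx)
    return ctx
```

Real orchestrators add scheduling, retries, and parallel execution on top of this ordering idea; the topological sort is the core of "handling dependencies between tasks."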

Nice To Haves

  • Master's Degree in Computer Science, Data Engineering, Informatics, or a related field, with at least seven years of experience in healthcare data engineering or enterprise data systems.
  • Domain Expertise: Experience working with healthcare or clinical data is highly valuable, for example familiarity with electronic health record (EHR) systems and clinical registries, experience using tools like REDCap for data capture, or involvement in healthcare analytics projects. The ability to create quality/outcome reports and develop data visualizations for non-technical stakeholders is a plus.
  • Microsoft Certified: Azure Data Engineer Associate
  • Google Cloud Professional Data Engineer
  • AWS Certified Data Analytics – Specialty
  • Analytical Skills: Knowledge of statistics and experience with statistical or data analysis software or Python libraries for data science. This background helps in understanding data trends and supporting data scientists or analysts in the organization with more advanced analytics needs.
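The analytical-skills point above largely amounts to applied descriptive statistics. A quick sketch using Python's standard `statistics` module (the sample values are made up for illustration):

```python
import statistics

readings = [98.6, 99.1, 97.8, 98.4, 100.2]   # hypothetical sample values
mean = statistics.mean(readings)
stdev = statistics.stdev(readings)            # sample standard deviation
# Flag points more than 2 sample standard deviations from the mean.
outliers = [x for x in readings if abs(x - mean) > 2 * stdev]
```

In practice this kind of check is usually done with data-science libraries (pandas, SciPy), but the underlying reasoning about distributions and outliers is the same.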

Responsibilities

  • Maintains and optimizes data pipeline architecture by designing, building, and managing ETL processes that extract, transform, and load data from diverse sources.
  • Assembles large, complex data sets to meet both functional and non-functional requirements, and develops scalable architectures for structured and unstructured data.
  • Integrates and consolidates data from multiple systems—such as disparate databases and electronic health records—into unified repositories like data warehouses or data lakes.
  • Develops and enhances the underlying data infrastructure using SQL and cloud technologies to ensure scalability and reliability.
  • Creates and supports analytics tools that empower analysts and data scientists to access and analyze data efficiently.
  • Builds custom queries, scripts, and dashboards that enable insight generation and data product optimization.
  • Collaborates with analytics experts to organize, query, and visualize data for reporting and research.
  • Identifies and implements process improvements to enhance data operations.
  • Automates manual workflows, optimizes data delivery pipelines, and redesigns system architecture to support scalability and performance.
  • Continuously evaluates workflows and technologies to recommend improvements that accommodate growing data complexity.
  • Ensures data governance and security by validating data for accuracy and consistency, and maintaining secure, compliant data environments.
  • Follows best practices and regulatory standards (e.g., HIPAA) to protect sensitive information and uphold data integrity.
  • Collaborates with stakeholders across departments—including executives, product managers, researchers, and designers—to address data infrastructure needs and resolve technical issues.
  • Translates non-technical requirements into effective data solutions and advises on best practices for data architecture.
  • Manages and executes data projects from planning through deployment.
  • Applies light project management techniques to coordinate tasks, communicate with team members, and ensure timely delivery.
  • Exercises independent judgment to overcome obstacles and align project outcomes with organizational goals.
  • Adheres to internal controls and reporting structure.
  • Performs related duties as required.
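The consolidation duties above (integrating disparate sources into a unified repository) reduce to extract, join, and load steps. A minimal sketch using an in-memory SQLite database as a stand-in for a real warehouse (table names and rows are hypothetical):

```python
import sqlite3

# Two hypothetical source systems, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ehr_visits (patient_id INTEGER, visit_date TEXT);
    CREATE TABLE registry   (patient_id INTEGER, cohort TEXT);
    INSERT INTO ehr_visits VALUES (1, '2024-01-05'), (2, '2024-02-10');
    INSERT INTO registry   VALUES (1, 'A'), (2, 'B');
""")

# Transform + load: join the sources into one unified reporting table.
conn.execute("""
    CREATE TABLE unified AS
    SELECT v.patient_id, v.visit_date, r.cohort
    FROM ehr_visits v
    JOIN registry r USING (patient_id)
""")
rows = conn.execute("SELECT * FROM unified ORDER BY patient_id").fetchall()
```

At scale the same join-and-materialize pattern runs in a warehouse (BigQuery, Synapse, Redshift) or in Spark, with the added concerns of incremental loads and schema evolution.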