About The Position

Nextpower is seeking a highly motivated summer intern to join the Data Engineering team to help build and operate production data pipelines that power analytics and operational insights from power plant telemetry. This role reports to a data engineering leader and focuses on developing reliable, cost-efficient streaming and lakehouse pipelines on Azure. The internship offers hands-on experience delivering real production data systems, including streaming ingestion, Delta Lake lakehouse patterns (Bronze/Silver/Gold), and data quality engineering. Interns will work closely with experienced engineers and stakeholders, receive meaningful mentorship, and gain exposure to how data products are designed, delivered, and supported in production.

Requirements

  • Currently pursuing a Bachelor’s or Master’s degree in Computer Science, Data Engineering, Software Engineering, or a related field (or equivalent practical experience)
  • Proficiency in Python and/or SQL, with the ability to write clean, maintainable code
  • Strong understanding of data fundamentals (schemas, data modeling basics, transformations, and data reliability concepts)
  • Exposure to distributed systems and/or big data concepts (e.g., Spark, streaming, or parallel processing)
  • Strong analytical and troubleshooting skills; ability to debug issues using logs, metrics, and data inspection
  • Excellent written and verbal communication skills, including comfort documenting work clearly
  • Comfortable working in a fast-paced, high-performance environment with mentorship and feedback

Nice To Haves

  • Experience with Azure data services (Event Hubs, ADLS Gen2) or equivalent AWS/GCP cloud data platforms
  • Experience with Databricks, Delta Lake, or lakehouse architectures
  • Exposure to streaming (Kafka or similar) and event-driven data pipelines
  • Familiarity with data quality practices (tests, validation, data contracts) and CI/CD

Responsibilities

  • Build and operate telemetry data pipelines from ingestion through curated lakehouse tables
  • Implement streaming ingestion using Kafka-style event streaming (e.g., Azure Event Hubs or similar technologies)
  • Develop Databricks lakehouse pipelines producing Bronze/Silver/Gold Delta tables (or equivalent patterns)
  • Enforce schema validation, data contracts, and data quality checks (e.g., expectations/tests)
  • Handle real-world pipeline issues such as late-arriving data, duplicates, out-of-order events, and schema drift
  • Optimize pipelines for performance, reliability, and cost
  • Create clear documentation and runbooks to support the pipelines you own in production
  • Collaborate with engineers and stakeholders to translate requirements into robust, maintainable data products

Benefits

  • Hands-on experience building and supporting production-grade data pipelines for real telemetry systems
  • Practical exposure to Azure-based data platform architecture and streaming fundamentals
  • Experience implementing lakehouse design principles using Bronze/Silver/Gold patterns (or equivalent)
  • Real-world data reliability engineering skills: handling late data, duplicates, schema evolution, and correctness
  • Mentorship from experienced engineers and insight into how data engineering supports business and operational outcomes
  • Strong portfolio-worthy deliverables: production pipeline components, documentation, and measurable improvements