Data Engineer
Remote, US

Skill/Experience/Education:

Mandatory:
Bachelor's degree in a quantitative field, or equivalent experience
2+ years of direct experience developing data pipelines in a production context
Understanding of and experience with modern software practices such as version control systems, CI/CD workflows, and test-driven development
Strong SQL skills as applied to an analytics context
Proficiency with Python
Tenacious problem solver and troubleshooter
Strong communicator, with an emphasis on data storytelling
Meticulous attention to detail
Ability to earn trust, maintain positive and professional relationships, and contribute to a culture of inclusion

Desired Skills:
Experience running and scaling Apache Airflow in production, especially to support self-service analytics workflows
Experience with AWS services, especially Redshift, RDS, EMR, and Lambda
Experience ingesting data from a wide variety of sources, including NoSQL databases, REST endpoints, etc.
Experience with near-real-time data streaming applications
Experience with big data processing frameworks such as Apache Spark
Knowledge of and direct experience with Docker and container orchestration frameworks such as Kubernetes
Knowledge of and experience with Linux
Familiarity with Infrastructure as Code and relevant technologies
Familiarity with data warehousing concepts from the Kimball Group
Familiarity with manufacturing, PLM tools, and supply chain operations
Experience with hierarchical data structures and recursive processing techniques
Direct experience with database administration, with an emphasis on OLAP systems (especially Redshift): configuring, deploying, scaling, performance tuning, and troubleshooting
Direct experience delivering results on a data lake project
Experience implementing data quality monitoring and alerting systems

Description:
Performing the full development lifecycle for automated batch and stream data processing systems, ensuring high performance, resiliency, and observability
Monitoring, maintaining, and improving data pipelines and associated infrastructure to maximize utilization of, and trust in, data assets
Working with upstream system administrators and downstream business customers to onboard data, ensuring high quality, discoverability, and ease of use
Establishing ground rules, providing code reviews, and upholding standards for code quality in our stack
Creating tools and workflows that empower rapid iteration and repeatable results, and evangelizing this toolset to our Analytics Community
Participating in a shared on-call rotation monitoring the health of the team's systems 
Creating essential documentation covering usage, maintenance, and troubleshooting of our data platform components 
Practicing sustainable incident response and blameless postmortems
