AtScale Jobs

Manager, Sustaining Engineering

Department: Engineering
Location: Remote (U.S.) – Boston, MA preferred

Company Overview

AtScale is the universal semantic layer for modern data and AI. We help enterprises build governed, AI-ready analytics by connecting business users and AI systems to a single, consistent source of metrics and business logic across cloud data platforms like Snowflake, Databricks, Google BigQuery, and others.

Job Description

This role owns the engineering response to high-priority, customer-impacting issues, operational defects, and technical debt, ensuring that issues are resolved correctly and permanently, not just patched.

This is a hands-on engineering leadership role that sits at the intersection of Product Engineering, QA, SRE, and Technical Support. The Sustaining Engineering Manager is accountable for reducing customer pain, improving release confidence, and protecting roadmap velocity by preventing recurring issues.

Responsibilities

  • Own the sustaining engineering response for P0–P1 customer-impacting issues
  • Lead root cause analysis (RCA) and ensure durable fixes are delivered
  • Partner with SRE and Support during incidents to restore service quickly and safely
  • Drive post-incident reviews and ensure action items are completed
  • Identify recurring issues and systemic weaknesses in the platform
  • Work with Product and Engineering to prioritize fixes that reduce customer pain and support load
  • Champion improvements in test coverage, observability, performance, and operational readiness
  • Ensure fixes meet engineering standards for quality, performance, and security
  • Build, manage, and mentor a distributed Sustaining Engineering team
  • Establish clear ownership, on-call practices, and escalation paths
  • Balance reactive work with proactive investments in stability and platform health
  • Foster a culture of accountability, ownership, and continuous improvement
  • Define and maintain clear operating boundaries between Technical Support, Sustaining Engineering, and Product Engineering
  • Partner with Product to influence roadmap decisions based on production learnings
  • Collaborate with QA on regression prevention and release readiness
  • Communicate clearly with stakeholders during incidents and escalations
  • Define and track KPIs such as P0/P1 incident frequency and TTR, defect recurrence rate, support ticket volume tied to product defects, and post-release defect rates
  • Use data to drive prioritization and justify proactive investment

Requirements

  • 10+ years of professional software engineering experience
  • 5+ years of engineering management experience
  • Strong background in production systems, incident management, and debugging complex distributed systems
  • Experience running on-call rotations and incident response processes
  • Proven ability to lead teams through high-pressure situations
  • Excellent written and verbal communication skills

Preference will be given to candidates with

  • Familiarity with OLAP concepts and technologies such as SSIS and SSAS
  • Familiarity with business intelligence tools (e.g., Tableau, Power BI, Excel)
  • Experience with query languages such as Data Analysis Expressions (DAX) and Multidimensional Expressions (MDX)

What We Offer

  • Competitive compensation, including equity.
  • Flexible, remote-friendly work environment with a strong culture of ownership and trust.
  • Unlimited PTO and competitive benefits.
  • The opportunity to directly shape AtScale’s growth by building the team that powers our next phase.

Join a team of passionate people committed to redefining the way business intelligence and AI are done. For additional information, visit www.atscale.com.

Think you’d be a great fit?
We’d love to hear from you.