DevOps Engineer
Requirements
• 5+ years SRE, DevOps, or production operations experience.
• Proven experience operating and scaling production systems with uptime and latency goals.
• Strong hands-on experience with observability stacks (Datadog, Sentry, or similar).
• Experience defining SLOs/SLIs and building effective alerting strategies.
• Proficiency with CI/CD systems and infrastructure-as-code.
• Experience with cloud-native and serverless platforms (GCP, AWS).
• Strong cross-system debugging and incident response skills.
• Experience with multi-product distributed cloud tracing (stitching together Datadog, Sentry, GCP tracing).
• Familiarity with containerized and serverless workloads (Cloud Run, Firebase, Lambda).
• Experience supporting large TypeScript monorepos and build tooling (PNPM).
• Background supporting AI-powered systems or high-variance workloads.
• Startup experience or ownership of systems through rapid growth.
• Impact-oriented: You don't feel done until real people are getting real value from what you built.
• Ambiguity-native: You thrive in the undefined. Our work is full of half-mapped terrain, soft constraints, and ideas that shift under your feet. That energizes you. You constantly update your intuitions as you go, and are excited to discover new and better ways to attack challenges we're still finding words to describe.
• Collaborative: You share your thoughts early and often, and welcome debate and creative collaboration. Flux is a deeply collaborative company, and we believe that the best ideas can only win if they're said out loud.
• Convention-averse: Flux is an AI-first company, building AI tooling, using AI tooling. We constantly experiment with new tools, techniques, and processes. You feel an urgency to reimagine what your work looks like in this rapidly changing world, and you value critical thinking far above established conventions.
• Ownership mentality: You are a self-starter with a bias toward action, and you care deeply about the team and community who lean on you and your work.
Responsibilities
• Improve the reliability, availability, and operational health of production systems.
• Set observability standards across services (metrics, logs, errors).
• Set SLOs/SLIs, alerting, and on-call readiness with a focus on signal quality.
• Partner with engineers to design resilient systems and reduce operational risk early.
• Build internal tooling that improves system safety, debugging, and developer velocity.
• Manage infrastructure via Pulumi across GCP, AWS, and Firebase.
Benefits
• Equity options as part of the compensation package.
• Paid time off (PTO).
• Insurance coverage.
• Remote work options for suitable positions.