
Machine Learning Engineer — Training Optimization

Featherless AI · Remote (world) - Hybrid · + Equity · 1mo ago
In Office · WW · Artificial Intelligence · Machine Learning Engineer · Customer Training


Requirements

• Strong experience training large neural networks (LLMs or similarly large models)
• Hands-on experience with training optimization (not just model usage)
• Solid understanding of:
  • Backpropagation, optimization algorithms, and training dynamics
  • Distributed systems for ML training
• Comfort working close to hardware (GPUs, memory, networking constraints)
• Ability to move fluidly between research ideas and production-ready code
• Experience with large-scale distributed training (multi-node, multi-GPU)
• Familiarity with DeepSpeed, FSDP, Megatron, or custom training stacks
• Experience optimizing training on AMD or NVIDIA GPUs
• Contributions to open-source ML infrastructure or research codebases
• Exposure to non-Transformer architectures (RNNs, hybrid models, etc.)

Responsibilities

• Optimize large-scale model training pipelines (throughput, convergence, stability, and cost)
• Improve distributed training strategies (data, model, and pipeline parallelism)
• Tune optimizers, schedulers, batch sizing, and precision (bf16 / fp16 / fp8)
• Reduce training time and compute cost via profiling, bottleneck analysis, and systems-level improvements
• Collaborate with researchers on architecture-aware training strategies
• Build and maintain robust training infrastructure (checkpointing, fault tolerance, reproducibility)
• Evaluate and integrate new training techniques (e.g. gradient checkpointing, ZeRO, FSDP, custom kernels)
• Own training performance metrics and continuously push them forward

Benefits

• Real ownership at Series-A stage: your work shapes the company’s trajectory
• Work on cutting-edge models and training systems at scale
• Small, highly technical team with fast feedback loops
• Strong emphasis on engineering quality and research rigor
• Competitive compensation + meaningful equity


© 2026 Dominic Morris. All rights reserved.