wagey.gg

Machine Learning Engineer — Inference Optimization

Featherless AI · Remote - (world) · + Equity · 1mo ago
Remote · WW · Artificial Intelligence · Machine Learning Engineer · CUDA · Triton · ONNX

Requirements

• Strong experience in ML inference optimization or high-performance ML systems
• Solid understanding of deep learning internals (attention, memory layout, compute graphs)
• Hands-on experience with PyTorch (or similar) and model deployment
• Familiarity with GPU performance tuning (CUDA, ROCm, Triton, or kernel-level optimizations)
• Experience scaling inference for real users (not just research benchmarks)
• Comfortable working in fast-moving startup environments with ownership and ambiguity
• Experience with LLM or long-context model inference
• Knowledge of inference frameworks (TensorRT, ONNX Runtime, vLLM, Triton)
• Experience optimizing across different hardware vendors
• Open-source contributions in ML systems or inference tooling
• Background in distributed systems or low-latency services

Responsibilities

• Optimize inference latency, throughput, and cost for large-scale ML models in production.
• Profile GPU/CPU inference pipelines and identify bottlenecks in memory usage, kernel execution, batching strategies, and input/output operations.
• Implement and tune reduced-precision and quantization techniques such as fp16, bf16, int8, and fp8 to reduce model size and improve performance.
• Optimize KV-cache management and reuse in inference systems.
• Apply speculative decoding strategies along with batching and streaming optimizations.
• Perform model pruning or architectural simplifications aimed at inference efficiency.
• Collaborate closely with research engineers to bring new model architectures into production, ensuring they are fast and reliable enough for real user interaction.
• Build and maintain robust model-serving systems (e.g., Triton server or custom runtimes) that run on varied hardware, such as NVIDIA and AMD GPUs, and on cloud infrastructure.
• Benchmark performance across hardware setups, including GPUs and CPUs from vendors such as NVIDIA and AMD, and across cloud environments.
• Improve system reliability and observability under real workload conditions.
• Reduce the cost of inference in realistic user scenarios without compromising performance or accuracy.

Benefits

• Real ownership over performance-critical systems
• Direct impact on product reliability and unit economics
• Close collaboration with research, infra, and product
• Competitive compensation + meaningful equity at Series A
• A team that cares about engineering quality, not hype

© 2026 Dominic Morris. All rights reserved.