Posted on 15 May 2026

Senior AI Platform Engineer

eMFusion Global

Berlin, Berlin 10115, Germany (Full-time)

Reference: 810973689

About the Role

We are working with a leading international consultancy that is building scalable, production-grade AI SaaS products within their dedicated AI Lab. This is a greenfield opportunity - you will combine deep technical expertise with strategic vision to design and build AI-powered platforms that transform enterprise clients' business models.

The AI Lab is developing cutting-edge, large-scale AI products delivering sustained commercial impact. The team operates with a startup mindset: agile, flat hierarchies, and a genuine bias for experimentation and ownership.

The Opportunity

This is a rare full-stack platform engineering role that spans everything from infrastructure architecture to LLM operationalisation. You will own the platform layer end-to-end, from Kubernetes cluster operations and IaC through to model serving, RAG pipelines, and LLMOps.

Key themes of the role:

Design and evolve a multi-tenant SaaS architecture with tenant isolation, per-tenant controls, and enterprise security
Build automated tenant provisioning, safe rollouts (canary/feature flags), and noisy-neighbor protection
Operationalise LLMs end-to-end - fine-tuning, evaluation, high-performance serving, monitoring, and embeddings workflows
Drive MLOps foundations: automated training pipelines, experiment tracking, and scalable model deployment
Manage Kubernetes clusters, GPU-heavy workloads, and autoscaling on AWS
Build unified CI/CD pipelines that ship ML and application code together
Implement comprehensive observability: logs, metrics, traces, model/data drift detection
Embed enterprise security and compliance - IAM, RBAC, VPC design, secrets management, encryption - at every layer
Design well-architected ETL/ELT pipelines, streaming systems, feature store integration, and workflow orchestration

Technical Requirements

Platform & Multi-Tenancy

Proven patterns for tenant isolation (DB-per-tenant, schema-per-tenant, row-level security), tenant-aware caching, and noisy-neighbor protection
OIDC/OAuth2, tenant-aware RBAC/ABAC, SCIM provisioning, and audit logging for B2B SaaS

Kubernetes & Infrastructure

Deep Kubernetes: cluster ops, HPA/VPA, node pools, GPU scheduling, Karpenter, PDBs, network policies, multi-AZ design
Service mesh (Istio/Linkerd), ingress patterns (ALB/Nginx), secure egress, mTLS
Infrastructure as Code beyond basics: Terraform modules, Terragrunt, policy-as-code (OPA/Conftest), secrets automation
GitOps (ArgoCD/Flux), progressive delivery (Argo Rollouts/Flagger), feature flags, canary and blue/green deployments

MLOps & Model Lifecycle

Model lifecycle tooling: MLflow/W&B, model registry, experiment tracking, reproducible training, dataset versioning (DVC/lakeFS)
Pipeline orchestration: Airflow, Prefect, or Dagster + artifact stores
Model serving: KServe, Seldon, BentoML, or Ray Serve - online, async/batch inference, autoscaling, rollback

LLMOps

Prompt and version management, offline + online evaluation harnesses, RAG evaluation (retrieval metrics, groundedness), guardrails, red-teaming basics
Streaming inference (SSE/WebSockets), caching, routing, fallback models
Vector DB experience: pgvector, Pinecone, Weaviate, or Milvus - embedding lifecycle, backfills, re-embedding, indexing strategies

Observability & Security

OpenTelemetry, tracing, and SLOs, with tooling such as Prometheus/Grafana, Loki/ELK, Datadog/New Relic
Incident management: postmortems, runbooks, error budgets
GDPR, encryption at rest/in transit, secrets management (AWS Secrets Manager/Vault), KMS, key rotation
SOC 2 / ISO 27001 familiarity, vulnerability scanning (Trivy/Grype), SBOMs, SAST/DAST

About You

You have shipped and operated customer-facing SaaS products at scale with real users
You have owned end-to-end ML/AI infrastructure - from data ingestion through to production monitoring
You enable engineers and data scientists to move faster through self-service platforms and automated workflows
You have a track record of designing systems that scale globally across regions and traffic patterns
You are comfortable with incident response, on-call rotations, and stabilising critical production systems
You think with a product mindset - customer value, reliability, and speed-to-market over technology for its own sake
You have a strong bias for automation and eliminating manual operational toil
You have excellent communication skills: async collaboration, documentation, and explaining technical decisions to non-technical audiences

What's on Offer

Genuine greenfield platform engineering ownership - build it from scratch
Startup atmosphere with flat hierarchies within a globally established firm
Hybrid working and international mobility across a wide office network
Extensive learning and development programmes
Competitive package including bonus
