Enterprise Platform on Bare Metal: 84 Apps, 3 Nodes, Zero Cloud Bills

How we built an enterprise-grade Kubernetes platform on three Intel NUCs with a full CNCF stack

The Challenge

Build a production-grade platform that rivals enterprise cloud deployments — on three Intel NUC mini-PCs. No cloud provider, no managed services, no monthly bills. Full GitOps, full observability, full security.

The Solution

We deployed a bare-metal Kubernetes cluster using Cluster API with Tinkerbell for machine provisioning. The entire platform is managed through a single GitOps repository with ArgoCD, spanning 84 applications across 15 categories.

Infrastructure Layer

  • Cluster Topology: 3 control plane nodes with the control-plane scheduling taint removed, so each node serves as both control plane and worker for maximum resource utilization
  • Bare Metal Provisioning: Cluster API + Tinkerbell for automated node provisioning
  • Networking: Cilium CNI, MetalLB for load balancing, Istio service mesh
  • Storage: Rook-Ceph distributed storage across all three nodes (1.5TB NVMe, 42% utilized)
  • Virtualization: KubeVirt for running VM workloads alongside containers, with a dedicated management dashboard
  • DNS & TLS: External-DNS, Cert-Manager with Let’s Encrypt, Traefik ingress
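
The topology and storage figures above imply some simple sizing arithmetic. The sketch below is illustrative only: it assumes the 1.5 TB figure is raw capacity, that etcd runs stacked on the three control plane nodes, and that Ceph pools use 3-way replication. None of these are stated in this write-up, and the function names are hypothetical.

```python
def etcd_fault_tolerance(members: int) -> int:
    """Members that can fail while a majority (quorum) is still available."""
    quorum = members // 2 + 1
    return members - quorum


def ceph_capacity(raw_tb: float, used_fraction: float, replicas: int) -> dict:
    """Split raw capacity into used/free and estimate usable space under replication."""
    used = raw_tb * used_fraction
    return {
        "used_raw_tb": round(used, 3),
        "free_raw_tb": round(raw_tb - used, 3),
        # Assumption: every pool is replicated `replicas` times.
        "usable_total_tb": round(raw_tb / replicas, 3),
    }


if __name__ == "__main__":
    print("node failures tolerated:", etcd_fault_tolerance(3))
    print(ceph_capacity(raw_tb=1.5, used_fraction=0.42, replicas=3))
```

Under these assumptions the cluster tolerates the loss of one node, and about 0.63 TB of the raw 1.5 TB is in use.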

Platform Layer

  • GitOps: ArgoCD with app-of-apps pattern, self-healing enabled
  • CI/CD: Argo Workflows + Argo Events for event-driven pipelines
  • Progressive Delivery: Argo Rollouts, Kargo for environment promotion
  • Developer Portal: Backstage for service catalog and golden paths
  • Artifact Registry: Gitea (Git) + Harbor (container images)
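
As a rough illustration of the app-of-apps pattern with self-healing, the sketch below builds a root ArgoCD Application manifest as a Python dict. The repository URL, path, and application name are placeholders, not the values used in this platform.

```python
import json


def root_application(repo_url: str, path: str, revision: str = "HEAD") -> dict:
    """Build a root Application that points at a directory of child Applications."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": "root", "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": repo_url,
                "path": path,  # directory containing the child Application manifests
                "targetRevision": revision,
            },
            "destination": {
                "server": "https://kubernetes.default.svc",
                "namespace": "argocd",
            },
            # automated sync with selfHeal reverts manual drift back to the Git state
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


if __name__ == "__main__":
    # Hypothetical repository location, for illustration only.
    app = root_application("https://gitea.example.internal/platform/gitops.git", "apps")
    print(json.dumps(app, indent=2))
```

The JSON output is also valid YAML, so it can be committed to the GitOps repository or applied directly. Each child manifest under the chosen path would follow the same shape with its own source and destination.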

Security Layer

  • Identity: Keycloak SSO integrated with 15+ services via OIDC
  • Secrets: HashiCorp Vault + External Secrets Operator
  • Runtime: Falco threat detection, Kyverno policy enforcement
  • Supply Chain: Trivy scanning, Harbor vulnerability checks

Observability Layer

  • Metrics: Mimir for scalable, multi-tenant metrics storage (Prometheus-compatible) + Grafana
  • Logs: Loki with S3-compatible object storage
  • Traces: Tempo for distributed tracing
  • Telemetry Collection: Grafana Alloy as the unified collector for metrics, logs, and traces
  • Profiling: Pyroscope for continuous profiling
  • Error Tracking: Sentry with ClickHouse backend
  • Service Mesh: Kiali for Istio observability

AI/ML Platform

  • LLM Gateway: LiteLLM routing to multiple AI providers
  • Observability: Langfuse for LLM tracing and evaluation
  • Chat Interface: Open WebUI for model interaction
  • Local Inference: NVIDIA DGX Spark running Qwen models, serving LLMs locally with no round trip to an external provider and no per-token API cost
  • Agent Platform: 18 AI agents with an orchestration layer
  • Automation: n8n workflows, NATS messaging
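
The idea of a gateway that routes across a local model and hosted providers can be sketched in plain Python. This is not LiteLLM's API or configuration; it is a generic ordered-fallback router with hypothetical names, and the "local first, hosted second" ordering is an assumption rather than something this write-up specifies.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Backend:
    name: str
    call: Callable[[str], str]  # takes a prompt, returns a completion


def route(prompt: str, backends: Sequence[Backend]) -> Tuple[str, str]:
    """Try each backend in order and return (backend name, completion) from the first success."""
    errors = []
    for backend in backends:
        try:
            return backend.name, backend.call(prompt)
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(f"{backend.name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


if __name__ == "__main__":
    def local_model(prompt: str) -> str:
        # Stub standing in for a local inference server that is currently down.
        raise ConnectionError("local inference server unreachable")

    def hosted_model(prompt: str) -> str:
        # Stub standing in for a hosted provider.
        return f"echo: {prompt}"

    print(route("hello", [Backend("local", local_model), Backend("hosted", hosted_model)]))
```

In a real deployment the gateway would handle this ordering, along with authentication and usage tracking, so that client applications only talk to one endpoint.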

Collaboration

  • Chat: Synapse (Matrix) for team communication
  • Files: Nextcloud for document management
  • Automation: n8n for workflow automation

The Results

  • 84 applications deployed and managed via GitOps
  • Zero cloud bills — entire platform on 3x Intel NUC hardware
  • 99.9% uptime with automated remediation: ArgoCD self-healing for configuration drift, VPA for resource right-sizing, kured for coordinated node reboots
  • 30+ Ceph-backed persistent volumes across workloads
  • 15+ services with SSO via Keycloak
  • CNCF-centered open-source stack with no cloud-vendor lock-in
  • Local AI inference via NVIDIA DGX Spark — zero API costs for development and testing
  • Single git repo manages everything — infrastructure to applications
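
For context on the 99.9% uptime figure, the following sketch converts an availability target into a downtime budget. The measurement windows are assumptions for illustration; this write-up does not state the period over which uptime was measured.

```python
def downtime_budget_minutes(availability: float, period_hours: float) -> float:
    """Maximum downtime, in minutes, that still meets the availability target."""
    return (1.0 - availability) * period_hours * 60.0


if __name__ == "__main__":
    month_hours = 30 * 24   # assumed 30-day month
    year_hours = 365 * 24   # assumed 365-day year
    print(f"per month: {downtime_budget_minutes(0.999, month_hours):.1f} min")
    print(f"per year:  {downtime_budget_minutes(0.999, year_hours) / 60:.2f} h")
```

At 99.9%, the budget works out to roughly 43 minutes per 30-day month, or just under 9 hours per year.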

Want similar results?

Let's talk about how we can help modernize your infrastructure.