# Storage Architecture

## Overview
The cluster uses Ceph CSI to connect Kubernetes directly to the Proxmox Ceph cluster:
- RBD (`ceph-block`) for block volumes (RWO)
- CephFS (`ceph-filesystem`) for shared filesystem volumes (RWX)
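For reference, the `ceph-block` class is backed by the Ceph CSI RBD provisioner. A minimal sketch of what such a StorageClass looks like is below; the secret names and namespace are illustrative assumptions, not values pulled from this cluster:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-block
  annotations:
    # Marks this class as the cluster default
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: rbd.csi.ceph.com
parameters:
  # clusterID is typically the Ceph FSID (see Ceph Cluster Details below)
  clusterID: eb53e78d-4b17-4e8c-8186-cd82025a8917
  pool: kubernetes
  imageFeatures: layering
  # Secret names/namespace below are assumptions for illustration
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/provisioner-secret-namespace: ceph-csi
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: ceph-csi
reclaimPolicy: Delete
```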
## Architecture

```mermaid
flowchart TB
    subgraph "Kubernetes Cluster"
        direction LR
        prom["Prometheus<br>(50Gi)"] --> csi["Ceph CSI<br>(RBD + CephFS)"]
        graf["Grafana<br>(10Gi)"] --> csi
        am["Alertmanager<br>(5Gi)"] --> csi
    end

    subgraph "Proxmox Ceph Cluster"
        direction TB
        pves["pve1 / pve2 / pve3 / pve4<br>MON + OSD"]
        pool["Pool: kubernetes<br>Replication: 2x<br>PGs: 32"]
        fsid["FSID: eb53e78d-4b17-4e8c-8186-cd82025a8917"]
        pves --> pool
        pves --> fsid
    end

    csi --> pves
```
## Storage Classes

| Storage Class | Backend | Access Mode | Default |
|---------------|---------|-------------|---------|
| `ceph-block` | Ceph RBD | RWO | Yes |
| `ceph-filesystem` | CephFS | RWX | No |
## Ceph Cluster Details

### Proxmox Ceph

| Component | Details |
|-----------|---------|
| FSID | `eb53e78d-4b17-4e8c-8186-cd82025a8917` |
| Monitors | 172.16.1.2, .3, .4, .5 (port 6789) |
| OSDs | 5 OSDs across 4 nodes |
| Total Capacity | ~3.6 TiB |
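These values can be re-checked from any Proxmox node, using the same SSH pattern as the troubleshooting section:

```bash
# FSID and monitor addresses
ssh root@172.16.1.2 "ceph mon dump"

# OSD distribution across the four nodes
ssh root@172.16.1.2 "ceph osd tree"

# Raw vs. usable capacity per pool
ssh root@172.16.1.2 "ceph df"
```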
### Kubernetes Pool

| Setting | Value |
|---------|-------|
| Pool Name | `kubernetes` |
| PG Count | 32 |
| Replication | 2x |
| User | `client.kubernetes` |
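For reference, a pool and client like this are typically created along the following lines (a sketch following the upstream Ceph CSI documentation; the exact capabilities granted to `client.kubernetes` in this cluster may differ):

```bash
# Create the RBD pool with 32 placement groups and initialise it
ceph osd pool create kubernetes 32
rbd pool init kubernetes

# 2x replication
ceph osd pool set kubernetes size 2

# CSI client with RBD-profile capabilities on the pool
ceph auth get-or-create client.kubernetes \
  mon 'profile rbd' \
  osd 'profile rbd pool=kubernetes' \
  mgr 'profile rbd pool=kubernetes'
```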
## Current PVCs

The largest stateful PVCs live in the monitoring namespace (Prometheus/Grafana/Alertmanager) on `ceph-block`. Many application config PVCs use `ceph-filesystem`.

| Namespace | PVC | Size | Workload |
|-----------|-----|------|----------|
| monitoring | prometheus-db | 50Gi | Metrics storage (7d retention) |
| monitoring | grafana | 10Gi | Dashboards & settings |
| monitoring | alertmanager-db | 5Gi | Alert state |
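The live list, including which class each claim bound to, can be pulled with:

```bash
# All PVCs with size, status and storage class
kubectl get pvc -A

# Just the monitoring namespace, with the bound PV
kubectl get pvc -n monitoring -o wide
```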
## Using Persistent Volumes

### Basic PVC

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  # storageClassName: ceph-block  # Default, so this can be omitted
```
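A claim is consumed by referencing it from a pod spec. The pod and image below are placeholders for illustration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app            # hypothetical pod name
spec:
  containers:
    - name: app
      image: nginx:1.27   # placeholder image
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-data   # PVC from the manifest above
```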
### Shared Storage (RWX)

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ceph-filesystem
  resources:
    requests:
      storage: 100Gi
```
### NFS Mounts

For media storage, containers mount NFS from the NAS:

```yaml
volumes:
  - name: media
    nfs:
      server: 172.16.1.250
      path: /volume2/hulk/media
```
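As with any volume, the NFS entry pairs with a `volumeMounts` entry in the container spec. A minimal pod showing the pairing (pod and image names are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: media-example     # hypothetical pod name
spec:
  containers:
    - name: app
      image: alpine:3.20  # placeholder image
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: media
          mountPath: /media   # where the NFS share appears in the container
  volumes:
    - name: media
      nfs:
        server: 172.16.1.250
        path: /volume2/hulk/media
```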
## Backup Strategy

### Velero (Deployed)

Velero backs up:

- Kubernetes resources (YAML)
- Persistent Volumes (snapshots)
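Day-to-day use goes through the Velero CLI. The backup name below is illustrative; schedules and snapshot locations depend on how Velero was configured here:

```bash
# One-off backup of the monitoring namespace
velero backup create monitoring-backup --include-namespaces monitoring

# List backups and inspect one
velero backup get
velero backup describe monitoring-backup

# Restore from a backup
velero restore create --from-backup monitoring-backup
```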
### Manual Backup

```bash
# Export all resources
kubectl get all -A -o yaml > cluster-backup.yaml

# Back up etcd (Talos)
talosctl -n 172.16.1.50 etcd snapshot db.snapshot
```
## Storage Recommendations

| Workload | Storage Class | Access Mode |
|----------|---------------|-------------|
| Databases | `ceph-block` | RWO |
| Prometheus | `ceph-block` | RWO |
| Shared configs | `ceph-filesystem` | RWX |
| Media files | NFS | RWX |
| Temporary | `emptyDir` (see below) | - |
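The `emptyDir` row refers to pod-local scratch space that is deleted along with the pod; a minimal example:

```yaml
volumes:
  - name: scratch
    emptyDir:
      sizeLimit: 1Gi   # optional cap on scratch usage
```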
## Troubleshooting

### PVC Stuck Pending

```bash
# Check that the storage class exists
kubectl get sc

# Check PVC events
kubectl describe pvc <name>

# Check Ceph health
ssh root@172.16.1.2 "ceph status"

# Check pool I/O stats
ssh root@172.16.1.2 "ceph osd pool stats"

# Check for slow requests
ssh root@172.16.1.2 "ceph health detail"
```
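If Ceph itself is healthy, the next place to look is the CSI driver. Namespace and pod names depend on how Ceph CSI was deployed, so the commands below are a sketch:

```bash
# Locate the Ceph CSI driver pods
kubectl get pods -A | grep -i csi

# Recent events in the PVC's namespace usually surface the provisioner error
kubectl get events -n <namespace> --sort-by=.lastTimestamp
```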