bobtiji 2022-01-22 08:17:32 -05:00
parent cf288ca19d
commit 8f75f76490

@@ -9,7 +9,7 @@
&& sudo apt -y install nvidia-container-toolkit nvidia-container-runtime nvidia-docker2
DCGM on host machine running nvidia GPU
DCGM on host machine running Nvidia GPU
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin \
&& sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600 \
&& sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub \
@@ -21,15 +21,11 @@
## Deployment
1. Edit the Prometheus configuration template at `/etc/prometheus/prometheus.yml` and add the scrape job below.
2. # job for nvidia DCGM exporter
# job for nvidia DCGM exporter
- job_name: 'nvidia_exporter'
static_configs:
- targets: ['nvidia_exporter:9400'] # if the nvidia_exporter container is not on the same Docker network, change this line to "- targets: ['<your host IP>:9400']"
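The diff view drops the YAML indentation, so here is a minimal sketch of how the job could sit inside `prometheus.yml`. It assumes the file already has a top-level `scrape_configs` key and that the exporter is reachable as `nvidia_exporter` on port 9400. Adjust the target if your setup differs.

```yaml
# Sketch only: assumes an existing top-level scrape_configs section.
scrape_configs:
  # job for nvidia DCGM exporter
  - job_name: 'nvidia_exporter'
    static_configs:
      # Replace with '<your host IP>:9400' if the exporter container
      # is not on the same Docker network as Prometheus.
      - targets: ['nvidia_exporter:9400']
```

If `scrape_configs` already lists other jobs, append only the `- job_name` entry rather than repeating the key.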
## Configuration
None
# Additional References
[Official DCGM Documentation](https://github.com/NVIDIA/DCGM)
[Nvidia container toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#install-guide)