You can update the firmware on an NVIDIA DGX A100 in three ways: with the firmware update container, by PXE booting a firmware update image, or by booting from a USB flash drive written with the firmware update ISO. The steps for each method follow.
Method 1: Using the Firmware Update Container
1. Download the Firmware Update Container:
- Access the NVIDIA Enterprise Support Portal and download the DGX A100 firmware update container. The container is usually provided as a tarball or a `.run` file.
2. Copy the Container to the DGX System:
- Transfer the downloaded container to your DGX A100 system. You can use `scp` for this purpose:
```bash
scp nvfw-dgxa100_24.11.1_241107.tar.gz user@dgx-system:/path/to/container
```
3. Load the Container Image:
- Use Docker to load the container image:
```bash
sudo docker load -i nvfw-dgxa100_24.11.1_241107.tar.gz
```
4. Verify the Container:
- Check if the container image is loaded correctly:
```bash
sudo docker images
```
5. Update Firmware Using NVSM:
- Use the NVIDIA System Management (NVSM) tool to update the firmware interactively:
```bash
sudo nvsm
nvsm-> cd systems/localhost/firmware/install
nvsm(/systems/localhost/firmware/install)-> set DockerImageRef=nvfw-dgxa100:24.11.1
nvsm(/systems/localhost/firmware/install)-> set Flags=update_fw\ all
nvsm(/systems/localhost/firmware/install)-> start
```
Alternatively, you can use Docker directly:
```bash
sudo docker run --rm --privileged -ti -v /:/hostfs nvfw-dgxa100:24.11.1 update_fw all
```
Or use the `.run` file:
```bash
sudo ./nvfw-dgxa100_24.11.1_241107.run update_fw all
```
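In the examples above, the tarball `nvfw-dgxa100_24.11.1_241107.tar.gz` loads as the image `nvfw-dgxa100:24.11.1`. As a minimal sketch, assuming that naming pattern holds for your release, the helpers below derive the image reference from the file name and check that it appears in Docker's image list before you start the update. The function names `image_ref_from_tarball` and `has_image` are hypothetical and not part of any NVIDIA tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; not part of the NVIDIA firmware container tooling.

# Derive "name:version" from a tarball named <name>_<version>_<build>.tar.gz.
# Assumption: the file name follows the pattern used in this guide.
image_ref_from_tarball() {
  local base name rest version
  base=$(basename "$1" .tar.gz)   # nvfw-dgxa100_24.11.1_241107
  name=${base%%_*}                # nvfw-dgxa100
  rest=${base#*_}                 # 24.11.1_241107
  version=${rest%%_*}             # 24.11.1
  printf '%s:%s\n' "$name" "$version"
}

# Succeed if the exact image reference ($1) appears as a full line on stdin.
has_image() {
  grep -Fxq -- "$1"
}

# Example usage on the DGX system:
#   ref=$(image_ref_from_tarball nvfw-dgxa100_24.11.1_241107.tar.gz)
#   sudo docker images --format '{{.Repository}}:{{.Tag}}' | has_image "$ref" \
#     && echo "$ref is loaded"
```

Matching the whole line with `grep -Fx` avoids a false positive from an older image whose tag merely starts with the same version string.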
Method 2: Using PXE Boot
1. Download the Firmware Update PXE Netboot File:
- From the NVIDIA Enterprise Support Portal, download the DGX A100 firmware update PXE netboot file.
2. Copy the File to the Headnode:
- Transfer the downloaded file to your headnode using `scp`:
```bash
scp pxeboot-DGXA100_FWUI-24.6.1.tgz user@headnode:/tmp
```
3. Extract the Firmware Files:
- On the headnode, extract the contents of the tar file:
```bash
cd /tmp
tar xzvf pxeboot-DGXA100_FWUI-24.6.1.tgz
```
4. Create a TFTP Directory:
- Create a directory for the firmware update files:
```bash
mkdir /tftpboot/a100fw_24.6.1
```
5. Move Files to TFTP Directory:
- Move the necessary files to the TFTP directory and create symlinks as needed.
6. Configure PXE Boot:
- Configure each DGX A100 node to use the new PXE entry. For example:
```bash
cmsh
% device use dgx-01
% set pxelabel a100fw_24.6.1
% commit
```
7. Reboot and Update Firmware:
- Reboot the node. It should automatically boot into the firmware update environment.
- Use the BMC web interface or SSH to complete the update process:
```bash
update_fw all
```
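When several nodes need the same PXE label, the interactive cmsh session in step 6 can be scripted. The sketch below only prints one `cmsh -c` invocation per node, so the commands can be reviewed before they are run on the headnode. The function name `pxelabel_cmds` and the second node name `dgx-02` are hypothetical, and the sketch assumes your cmsh version accepts semicolon-separated commands with `-c`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) one cmsh command per node.
# $1 is the PXE label; the remaining arguments are node names.
pxelabel_cmds() {
  local label="$1" node
  shift
  for node in "$@"; do
    printf 'cmsh -c "device use %s; set pxelabel %s; commit"\n' "$node" "$label"
  done
}

# Example: review the generated commands first.
#   pxelabel_cmds a100fw_24.6.1 dgx-01 dgx-02
```

Printing rather than executing keeps the change reviewable; once the output looks right it can be pasted into a shell on the headnode.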
Method 3: Using a USB Flash Drive with an ISO Image
1. Download the Firmware Update ISO:
- Access the NVIDIA Enterprise Support Portal and download the DGX A100 firmware update ISO.
2. Create a Bootable USB Drive:
- Create a bootable USB drive with the ISO image.
3. Boot from the USB Drive:
- Boot the DGX A100 from the USB drive. Ensure that virtual media from the BMC is not used, as the BMC will be reset during the update.
4. Update Firmware:
- Follow the on-screen instructions to update the firmware. If the image is not configured to update automatically, run the update commands manually.
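Step 2 does not say how to write the ISO to the drive. On a Linux workstation one common approach is `dd`. The sketch below is an assumption rather than NVIDIA's documented procedure, and it only prints the command, because writing to the wrong device destroys its contents. The function name `usb_write_cmd`, the ISO file name, and `/dev/sdX` are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) a dd command that writes an ISO
# image ($1) to a USB block device ($2).
usb_write_cmd() {
  local iso="$1" dev="$2"
  # Refuse anything that is not a /dev path, to catch swapped arguments.
  case "$dev" in
    /dev/*) ;;
    *) echo "error: '$dev' is not a /dev path" >&2; return 1 ;;
  esac
  printf 'sudo dd if=%s of=%s bs=4M status=progress conv=fsync\n' "$iso" "$dev"
}

# Example (confirm the device name with lsblk first):
#   usb_write_cmd path/to/firmware-update.iso /dev/sdX
```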
After the update completes, two follow-up steps may apply. If you used PXE boot, clear the PXE label on the node; otherwise it boots back into the firmware update environment on the next reboot. If components such as NVMe drive firmware or the FPGA were updated, you may also need to perform a DC power cycle using IPMI commands.
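These follow-up actions can be sketched as commands. This is an illustration under stated assumptions: `clear pxelabel` is assumed to reset the label in cmsh, `ipmitool chassis power cycle` is the generic IPMI power-cycle command rather than a DGX-specific one, and the function name `post_update_cmds` is hypothetical. The function prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) the post-update commands for a node ($1).
post_update_cmds() {
  local node="$1"
  # On the headnode: drop the PXE label so the node boots its normal image again.
  # Assumption: cmsh's "clear" command resets the pxelabel property.
  printf 'cmsh -c "device use %s; clear pxelabel; commit"\n' "$node"
  # On the DGX itself: request a power cycle through the BMC.
  printf 'sudo ipmitool chassis power cycle\n'
}

# Example:
#   post_update_cmds dgx-01
```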