Yes, you can manually update the firmware on the NVIDIA DGX A100 without using the ISO file. Here are the detailed steps to achieve this:
Using the DGX A100 Firmware Update Utility
The DGX A100 System Firmware Update utility is available as a tarball and a `.run` file. You can use either method to update the firmware.
1. Download the Firmware Update Files:
- Access the NVIDIA Enterprise Support Portal to download the DGX A100 firmware update files. You will need the `.tar.gz` file (e.g., `nvfw-dgxa100_24.11.1_241107.tar.gz`) and/or the `.run` file (e.g., `nvfw-dgxa100_24.11.1_241107.run`).
2. Copy Files to the DGX A100 System:
- Transfer the downloaded files to the DGX A100 system. The update commands below run under `sudo`, so use an account with sudo privileges.
3. Update Using Docker:
- The DGX A100 system includes Docker, which is required to run the firmware update container.
- Load the Docker image from the tarball:
```bash
sudo docker load -i nvfw-dgxa100_24.11.1_241107.tar.gz
```
- Verify the image is loaded:
```bash
sudo docker images
```
- Run the Docker container to update the firmware:
```bash
sudo docker run --rm --privileged -ti -v /:/hostfs nvfw-dgxa100:24.11.1 update_fw all
```
- This command updates all firmware components. To update only some of them, replace `all` with the component names (e.g., `BMC SBIOS`).
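The Docker steps above can be sketched as a small wrapper. This is a sketch under stated assumptions, not NVIDIA's tooling: the helper names (`image_ref_from_tarball`, `run`, `update_with_docker`) are hypothetical, and the mapping from tarball name to image tag is inferred only from the example names in this answer, so confirm the real tag with `sudo docker images` after loading. By default it only prints the commands (`DRY_RUN=1`).

```shell
# Sketch: derive the image reference from the tarball name, then print
# (or run) the load and update commands. All helper names are hypothetical.
set -eu

# nvfw-dgxa100_24.11.1_241107.tar.gz -> nvfw-dgxa100:24.11.1
# Assumption: the loaded image is tagged <name>:<version>.
image_ref_from_tarball() {
  local base name rest version
  base="$(basename "$1" .tar.gz)"   # nvfw-dgxa100_24.11.1_241107
  name="${base%%_*}"                # nvfw-dgxa100
  rest="${base#*_}"                 # 24.11.1_241107
  version="${rest%%_*}"             # 24.11.1
  printf '%s:%s\n' "$name" "$version"
}

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = tarball path; remaining arguments = components (e.g. all, or BMC SBIOS)
update_with_docker() {
  local tarball image
  tarball="$1"; shift
  image="$(image_ref_from_tarball "$tarball")"
  run sudo docker load -i "$tarball"
  run sudo docker run --rm --privileged -ti -v /:/hostfs "$image" update_fw "$@"
}
```

Calling `update_with_docker nvfw-dgxa100_24.11.1_241107.tar.gz BMC SBIOS` prints both commands for review; setting `DRY_RUN=0` executes them instead.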
4. Update Using the `.run` File:
- If you prefer not to use Docker, make the `.run` file executable and run it directly:
```bash
chmod +x nvfw-dgxa100_24.11.1_241107.run
sudo ./nvfw-dgxa100_24.11.1_241107.run update_fw all
```
- This method also updates all firmware components. You can specify components similarly to the Docker method.
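As a hedged illustration of the `.run` method, the helpers below add the execute bit if it is missing and print the update command for review rather than running it. The function names are hypothetical; only `chmod`, `sudo`, and the `update_fw` arguments shown in this answer are relied on.

```shell
# Sketch: prepare the .run package and print the update command for review.
# Function names are hypothetical.
set -eu

# Add the execute bit if it is missing.
ensure_executable() {
  [ -x "$1" ] || chmod +x "$1"
}

# $1 = .run file; remaining arguments = components (defaults to "all").
run_file_update_cmd() {
  local file
  file="$1"; shift
  [ "$#" -gt 0 ] || set -- all
  case "$file" in
    */*) ;;                # already contains a path component
    *) file="./$file" ;;   # bare name: prefix ./ so PATH is not searched
  esac
  printf 'sudo %s update_fw %s\n' "$file" "$*"
}
```

For example, `run_file_update_cmd nvfw-dgxa100_24.11.1_241107.run BMC SBIOS` prints the command that limits the update to those two components.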
5. Using NVSM (NVIDIA System Management):
- NVSM provides an interactive mode for updating firmware.
- Enter the NVSM firmware update module:
```bash
sudo nvsm
nvsm-> cd systems/localhost/firmware/install
```
- Set the flags for the update action:
```bash
nvsm(/system/localhost/firmware/install)-> set Flags=update_fw\ all
```
- Set the Docker image reference:
```bash
nvsm(/system/localhost/firmware/install)-> set DockerImageRef=nvfw-dgxa100:24.11.1
```
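- NVSM requires every space inside a value to be escaped with a backslash (`\`), as in `update_fw\ all` above.
- After setting both properties, enter `start` at the same prompt to launch the update.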
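The escaping rule is easy to get wrong when several components are listed, so here is a minimal sketch that generates the two `set` lines to type at the NVSM prompt. The helper names `nvsm_escape` and `nvsm_set_lines` are hypothetical; the script only formats text and does not talk to NVSM.

```shell
# Sketch: build the `set` lines for the NVSM firmware install prompt.
# Helper names are hypothetical; nothing here invokes nvsm itself.
set -eu

# Escape each space with a backslash, as the NVSM prompt expects.
nvsm_escape() {
  printf '%s\n' "$1" | sed 's/ /\\ /g'
}

# $1 = image reference, $2 = firmware utility arguments (unescaped)
nvsm_set_lines() {
  printf 'set Flags=%s\n' "$(nvsm_escape "$2")"
  printf 'set DockerImageRef=%s\n' "$1"
}
```

For instance, `nvsm_set_lines nvfw-dgxa100:24.11.1 "update_fw BMC SBIOS"` produces a `Flags` value with both spaces escaped, ready to paste at the prompt.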
Additional Considerations
- Fan Speed Increase: During the BMC firmware update, fan speeds may increase temporarily, which is normal[1].
- Reboot Required: After updating the firmware, you may need to reboot the system for the changes to take effect.
- PXE Boot Alternative: If you manage a cluster of DGX A100 systems, you can also use PXE boot to update firmware from a headnode, but this requires additional setup and infrastructure[4].
By following these steps, you can manually update the firmware on your DGX A100 system without using the ISO file.
Citations:
[1] https://docs.nvidia.com/dgx/dgxa100-fw-container-release-notes/using-utility.html
[2] https://docs.nvidia.com/dgx/dgxa100-fw-container-release-notes/dgxa100-fw-update-iso.html
[3] https://www.manualslib.com/manual/1925509/Nvidia-Dgx-A100.html
[4] https://kb.brightcomputing.com/knowledge-base/how-to-upgrade-dgx-a100-firmware-from-headnode/
[5] https://www.broadberry.com/dual-amd-epyc-rackmount-servers/nvidia-dgx-a100
[6] https://support.brightcomputing.com/manuals/9.2/admin-manual.pdf
[7] https://docs.nvidia.com/dgx/dgxa100-user-guide/updating-restoring-sw.html
[8] https://kb.brightcomputing.com/knowledge-base/installing-nvidia-dgx-software-stack-in-bright-ubuntu-20-04-software-images/
[9] https://github.com/NVIDIA/deepops/blob/master/docs/deepops/dgx-diagnostic-firmware.md
[10] http://cdn.cnetcontent.com/2f/68/2f6888a0-063f-4d76-94e4-8666b7619dfd.pdf