What are the steps to downgrade firmware on a DGX A100


Downgrading firmware on a DGX A100 involves several steps, including preparing the system, enabling the downgrade option, and performing the downgrade. Here's a detailed guide:

1. Prepare the System

- Access the BMC: Ensure you have access to the Baseboard Management Controller (BMC) of the DGX A100. This is crucial for managing firmware updates and downgrades.
- Backup Data: Before making any changes, ensure that all critical data is backed up. Firmware downgrades can potentially cause system instability or data loss.
- Download Firmware: Obtain the desired older version of the firmware from the NVIDIA Enterprise Support Portal. Make sure it is compatible with your DGX A100 system.
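
Part of the preparation can be scripted. The following is a minimal sketch, not an NVIDIA-provided tool: the function name `check_firmware_package` is hypothetical. It only confirms that the downloaded package exists and is non-empty, then prints a SHA-256 checksum that you can compare against a published value if one is available for your download.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a downloaded firmware package before use.
# It does not contact the BMC or modify the system.
check_firmware_package() {
  local pkg="$1"
  if [ -z "$pkg" ]; then
    echo "usage: check_firmware_package <path-to-package>" >&2
    return 2
  fi
  # -s is true only if the file exists and has a size greater than zero.
  if [ ! -s "$pkg" ]; then
    echo "error: '$pkg' is missing or empty" >&2
    return 1
  fi
  # Print the checksum so it can be compared with a known-good value.
  sha256sum "$pkg"
}
```

Call it with the path to the file you downloaded; a non-zero exit status means the package should not be used.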

2. Enable the Downgrade Option

- Set the `ForceUpdate` Flag: Enable the `ForceUpdate` flag on the BMC to allow downgrading. This can be done using the `nvfwupd` command:
  ```bash
  nvfwupd --target ip=<bmc-ip-address> user=admin password=admin force_update enable
  ```

  Replace `<bmc-ip-address>` with the actual IP address of the BMC, and `admin`/`admin` with your BMC credentials.
- Verify the Flag Status: Confirm that the `ForceUpdate` flag is set to `True`:
  ```bash
  nvfwupd --target ip=<bmc-ip-address> user=admin password=admin force_update status
  ```
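
Because the enable, status, and disable calls differ only in their final word, a small wrapper can reduce typing mistakes. This is a sketch that assumes the `nvfwupd` syntax used in this article; the function name `force_update_cmd` is hypothetical, and the `BMC_IP`, `BMC_USER`, and `BMC_PW` variables are borrowed from the `ipmitool` example further down. It prints the command instead of running it, so the result can be reviewed first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the nvfwupd force_update command for one action.
# Reads BMC_IP, BMC_USER and BMC_PW from the environment.
force_update_cmd() {
  local action="$1"
  # Accept only the three actions used in this article.
  case "$action" in
    enable|disable|status) ;;
    *)
      echo "error: action must be enable, disable, or status" >&2
      return 2
      ;;
  esac
  if [ -z "${BMC_IP:-}" ] || [ -z "${BMC_USER:-}" ] || [ -z "${BMC_PW:-}" ]; then
    echo "error: set BMC_IP, BMC_USER and BMC_PW first" >&2
    return 1
  fi
  printf 'nvfwupd --target ip=%s user=%s password=%s force_update %s\n' \
    "$BMC_IP" "$BMC_USER" "$BMC_PW" "$action"
}
```

After exporting the three variables, `force_update_cmd status` prints the status command for that BMC; anything other than the three known actions is rejected with exit status 2.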

3. Perform the Firmware Downgrade

- Update Firmware: Use the firmware update utility to downgrade the firmware. You can use NVSM, Docker, or the `.run` file, depending on your preference. For example, using the `.run` file:
  ```bash
  sudo ./nvfw-dgxa100_<version>.run update_fw <component>
  ```

  Replace `<version>` with the version you downloaded and `<component>` with the component you want to downgrade.
- NVSM Example: If using NVSM, you might need to set flags such as `update_fw <component>`:
  ```bash
  nvsm(/system/localhost/firmware/install)-> set Flags=update_fw <component>
  ```
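
Before running the package, it can help to confirm which version you are about to install. The sketch below assumes the package file follows the naming pattern `nvfw-dgxa100_<version>.run`; the function name `firmware_version_from_name` is hypothetical, and it reads only the file name, not the package contents.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract the version string from a package file name,
# assuming the pattern nvfw-dgxa100_<version>.run.
firmware_version_from_name() {
  local name="${1##*/}"   # strip any leading directory components
  case "$name" in
    nvfw-dgxa100_?*.run)
      name="${name#nvfw-dgxa100_}"   # drop the fixed prefix
      printf '%s\n' "${name%.run}"   # drop the .run suffix
      ;;
    *)
      echo "error: '$name' does not match nvfw-dgxa100_<version>.run" >&2
      return 1
      ;;
  esac
}
```

Compare the printed version with the one you intended to download before starting the downgrade.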

4. Post-Downgrade Steps

- Disable the `ForceUpdate` Flag: Once the downgrade is complete, disable the `ForceUpdate` flag to prevent unintended downgrades:
  ```bash
  nvfwupd --target ip=<bmc-ip-address> user=admin password=admin force_update disable
  ```

- Verify Flag Status: Confirm that the flag is set back to `False`:
  ```bash
  nvfwupd --target ip=<bmc-ip-address> user=admin password=admin force_update status
  ```

- Reboot and Test: Reboot the system and test to ensure that the downgrade was successful and the system is stable.
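
Steps 2 through 4 can be strung together so that the `ForceUpdate` flag is disabled again even if the downgrade itself fails. This is a sketch only: `run_downgrade`, `RUNNER`, and the argument layout are hypothetical, while the commands are the ones used in this article. With `RUNNER` unset, each command is printed rather than executed.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the commands in this article.
# RUNNER unset (default) prints each command; RUNNER="" (empty) executes them.
run_downgrade() {
  local pkg="$1" component="$2" rc=0
  local run="${RUNNER-echo}"
  if [ -z "$pkg" ] || [ -z "$component" ]; then
    echo "usage: run_downgrade <package.run> <component>" >&2
    return 2
  fi
  # Shared wrapper for the three force_update actions.
  _force_update() {
    $run nvfwupd --target "ip=${BMC_IP}" "user=${BMC_USER}" \
      "password=${BMC_PW}" force_update "$1"
  }
  _force_update enable || return 1
  _force_update status
  # Remember the downgrade's exit code instead of aborting, so the
  # flag is disabled again regardless of the outcome.
  $run sudo "$pkg" update_fw "$component" || rc=$?
  _force_update disable
  _force_update status
  return "$rc"
}
```

Review the printed sequence first; only set `RUNNER` to an empty string once the commands look right for your system. The function returns the exit status of the downgrade step, so a failed downgrade is still reported after the flag has been cleared.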

Additional Considerations

- Power Cycling: If certain components like NVMe drive firmware, FPGA, or CEC1712 were updated during the downgrade process, you may need to perform a DC power cycle using the BMC:
  ```bash
  sudo ipmitool -I lanplus -H ${BMC_IP} -U ${BMC_USER} -P ${BMC_PW} chassis power cycle
  ```

- Datacenter Operations: If you are managing a large number of DGX A100 systems in a datacenter, consider power cycling systems in batches to avoid triggering power alarms or tripping breakers[1][4].
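
Batching can be sketched as a loop that power cycles a fixed number of systems and then pauses. Everything here beyond the `ipmitool` command itself is an assumption: the function name `power_cycle_in_batches`, the one-address-per-line host file, and the default batch size and pause are illustrative and should be set from your site's power budget. With `RUNNER` unset, the commands are printed rather than executed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: power cycle the BMCs listed one per line in a file,
# pausing after every <batch_size> systems.
# RUNNER unset (default) prints each command; RUNNER="" (empty) executes them.
power_cycle_in_batches() {
  local hosts_file="$1" batch_size="${2:-4}" pause_s="${3:-300}"
  local run="${RUNNER-echo}" count=0 host
  # The "|| [ -n ... ]" clause also handles a last line without a newline.
  while IFS= read -r host || [ -n "$host" ]; do
    [ -z "$host" ] && continue   # skip blank lines
    # Redirect stdin so the command cannot consume the host list.
    $run sudo ipmitool -I lanplus -H "$host" -U "$BMC_USER" -P "$BMC_PW" \
      chassis power cycle </dev/null
    count=$((count + 1))
    if [ $((count % batch_size)) -eq 0 ]; then
      echo "# batch complete after $count systems; pausing ${pause_s}s"
      sleep "$pause_s"
    fi
  done < "$hosts_file"
}
```

For example, `power_cycle_in_batches bmc_hosts.txt 2 600` would handle two systems at a time with a ten-minute pause between batches, where `bmc_hosts.txt` is a hypothetical file of BMC addresses.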

Citations:
[1] https://github.com/NVIDIA/deepops/blob/master/docs/deepops/dgx-diagnostic-firmware.md
[2] https://docs.nvidia.com/dgx/dgxa100-fw-container-release-notes/using-utility.html
[3] https://www.manualslib.com/manual/1925509/Nvidia-Dgx-A100.html
[4] https://docs.nvidia.com/dgx/dgxh100-fw-update-guide/firmware-downgrade.html
[5] https://forums.developer.nvidia.com/t/looking-for-nvidia-dgx-a100-system-firmware-update-utility/241833
[6] https://www.netapp.com/media/19432-nva-1151-design.pdf
[7] https://kb.brightcomputing.com/knowledge-base/how-to-upgrade-dgx-a100-firmware-from-headnode/
[8] https://docs.nvidia.com/dgx/dgxa100-user-guide/updating-restoring-sw.html
[9] https://docs.nvidia.com/dgx/dgxa100-fw-container-release-notes/dgxa100-fw-update-iso.html