To set up a smart Power Distribution Unit (PDU) for power monitoring on the NVIDIA DGX Station A100, follow these detailed steps:
**1. Mounting the Smart PDU**
- Mounting Options: The Smart PDU can be mounted either vertically or horizontally in a rack. For horizontal mounting, select the bracket mounting points that give the proper depth within the rack. Attach the L-brackets with screws and install the enclosure into the rack slots, which allow some horizontal adjustment[3].
- Vertical Mounting: Use the button mounting kit provided with the Smart PDU. Space the buttons vertically along the PDU and attach them as needed. Leave enough clearance at the top for the buttons to drop into the keyholes[3].
**2. Connecting to the Power Source**
- Connect the Smart PDU to a suitable power source. Ensure that the power source meets the specifications of both the DGX Station A100 and the Smart PDU. Typically, the DGX Station A100 requires a power source that can supply 100 V to 240 V AC, with specific current ratings[1][7].
**3. Connecting Devices**
- Connect the DGX Station A100 to the Smart PDU using the appropriate power cables. The DGX A100 systems use locking power cords to ensure safety and compliance[1].
- Ensure that all devices are properly connected and powered off before proceeding.
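
The current-rating check mentioned in step 2 comes down to simple arithmetic: current is power divided by line voltage. The sketch below is illustrative only. The 1500 W load, the breaker sizes, and the 80% continuous-load derating factor are assumptions, not values taken from the cited documents, and the function names are hypothetical. Check your own spec sheets and local electrical code.

```python
def required_current_amps(load_watts: float, line_volts: float) -> float:
    """Current drawn by a load at a given line voltage (power factor taken as 1)."""
    if line_volts <= 0:
        raise ValueError("line_volts must be positive")
    return load_watts / line_volts


def fits_circuit(load_watts: float, line_volts: float,
                 breaker_amps: float, derate: float = 0.8) -> bool:
    """True if the load stays within the derated continuous rating of the circuit."""
    return required_current_amps(load_watts, line_volts) <= breaker_amps * derate


if __name__ == "__main__":
    LOAD_W = 1500.0  # assumed peak draw; replace with the figure from your spec sheet
    # (line voltage, breaker rating) pairs are example circuits, not recommendations
    for volts, breaker in [(120.0, 15.0), (120.0, 20.0), (230.0, 16.0)]:
        amps = required_current_amps(LOAD_W, volts)
        ok = fits_circuit(LOAD_W, volts, breaker)
        print(f"{volts:.0f} V / {breaker:.0f} A: {amps:.1f} A needed, fits={ok}")
```

Under these assumed numbers, a 1500 W load on a 120 V circuit draws 12.5 A, which exceeds the derated capacity of a 15 A circuit (12 A) but fits a 20 A circuit.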
**4. Connecting Sensors**
- If your Smart PDU supports additional sensors (e.g., temperature or humidity sensors), connect them according to the manufacturer's instructions. These sensors provide environmental data for monitoring and maintaining optimal operating conditions[3].
**5. Configuring the Smart PDU**
- Login: Access the Smart PDU's web interface using the default administrator credentials (e.g., admin/admin). Change these credentials for security[3].
- Network Settings: Configure the network settings to enable remote monitoring. Set the IP address, subnet mask, gateway, and DNS servers as required for your network environment[6].
- User Accounts: Create new administrative user accounts and remove the default admin account for enhanced security[3].
- Sensor Names: If applicable, configure names for any connected sensors to easily identify data sources[3].
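
Before typing static network settings into the PDU, it can help to sanity-check them. The following is a minimal sketch using Python's standard `ipaddress` module. The function name `check_static_config` and the example addresses are hypothetical placeholders, not values from the PDU documentation.

```python
import ipaddress


def check_static_config(address: str, netmask: str, gateway: str) -> list[str]:
    """Return a list of problems with a static IPv4 configuration (empty if none)."""
    problems = []
    # ip_interface accepts "address/netmask" in dotted-quad form
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    network = iface.network
    gw = ipaddress.ip_address(gateway)
    if gw not in network:
        problems.append(f"gateway {gw} is outside {network}")
    if iface.ip == gw:
        problems.append("address and gateway are identical")
    if iface.ip in (network.network_address, network.broadcast_address):
        problems.append(f"{iface.ip} is the network or broadcast address")
    return problems


if __name__ == "__main__":
    # Placeholder values; substitute the settings for your own network
    for issue in check_static_config("192.168.10.50", "255.255.255.0", "192.168.20.1"):
        print("problem:", issue)
```

Malformed addresses raise `ValueError` from the `ipaddress` module, which is usually the desired behavior for a pre-flight check.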
**6. Connecting to the Network**
- Connect the Smart PDU to your network. This allows remote access for monitoring and configuration. Ensure that the network connection is stable and secure[3].
**7. Monitoring Power Consumption**
- Use the Smart PDU's web interface to monitor power consumption at the outlet, or IPMI tools to read power data from the system's BMC. The PDU can provide real-time and historical data on power usage, which is useful for managing energy efficiency and planning capacity[4].
- For detailed energy consumption metrics (e.g., total energy since power-on), you might need external tools like Prometheus or InfluxDB to collect and analyze data over time[4].
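
If only instantaneous power readings are available, total energy can be estimated by integrating them over time. The sketch below assumes you already have timestamped wattage samples (from the PDU, a time-series database, or elsewhere) and applies the trapezoidal rule. The function name `energy_kwh` and the sample values are made up for illustration.

```python
from datetime import datetime, timedelta


def energy_kwh(samples: list[tuple[datetime, float]]) -> float:
    """Integrate (timestamp, watts) samples with the trapezoidal rule; returns kWh."""
    ordered = sorted(samples, key=lambda s: s[0])
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(ordered, ordered[1:]):
        dt = (t1 - t0).total_seconds()
        joules += 0.5 * (p0 + p1) * dt  # average power over the interval times duration
    return joules / 3.6e6  # 1 kWh = 3.6e6 J


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 0, 0, 0)
    # Illustrative readings: (minutes after start, watts)
    readings = [(start + timedelta(minutes=m), w)
                for m, w in [(0, 400.0), (30, 1200.0), (60, 800.0)]]
    print(f"{energy_kwh(readings):.3f} kWh")
```

The accuracy of this estimate depends on the sampling interval: short load spikes between samples are not captured, so a PDU's own cumulative energy counter, where available, is preferable.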
**8. Integrating with DGX Station A100**
- Ensure that the DGX Station A100 is properly connected to the Smart PDU and that all power cables are securely locked into place to prevent accidental disconnections[1].
- Use tools like NVIDIA DCGM or `nvidia-smi` to monitor GPU-specific power consumption if needed[4].
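
As a rough illustration of the `nvidia-smi` route, the sketch below runs `nvidia-smi --query-gpu=index,power.draw --format=csv,noheader,nounits` and parses the result into per-GPU wattage. The helper names `parse_power_csv` and `read_gpu_power` are hypothetical, and the code assumes `nvidia-smi` is on the `PATH` of the machine running it.

```python
import subprocess

# Query per-GPU index and instantaneous power draw (watts) as bare CSV
QUERY = ["nvidia-smi", "--query-gpu=index,power.draw", "--format=csv,noheader,nounits"]


def parse_power_csv(text: str) -> dict[int, float]:
    """Parse 'index, watts' lines into {gpu_index: watts}; skip unreadable values."""
    readings = {}
    for line in text.strip().splitlines():
        index, _, watts = line.partition(",")
        try:
            readings[int(index)] = float(watts)
        except ValueError:
            continue  # value was not numeric, e.g. "[N/A]"
    return readings


def read_gpu_power() -> dict[int, float]:
    """Run nvidia-smi and return the current power draw per GPU in watts."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_power_csv(out)
```

Calling `read_gpu_power()` on the DGX Station A100 would return a mapping such as GPU index to watts. Note that these figures cover the GPUs only, so they will be lower than the whole-system draw reported by the PDU.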
By following these steps, you can effectively set up a smart PDU for power monitoring on your DGX Station A100, ensuring efficient energy management and system reliability.
Citations:
[1] http://cdn.cnetcontent.com/2f/68/2f6888a0-063f-4d76-94e4-8666b7619dfd.pdf
[2] https://www.robusthpc.com/wp-content/uploads/2021/11/nvidia-dgx-station-a100-system-architecture-white-paper_published.pdf
[3] https://cdn10.servertech.com/assets/documents/documents/137/original/301-0125-4_Smart_PDU_RevV.pdf?1493072227
[4] https://www.reddit.com/r/HPC/comments/1125pw7/nvidia_dgxa100_energy_monitoring/
[5] https://docs.nvidia.com/dgx/dgx-station-a100-user-guide/index.html
[6] https://www.ibm.com/docs/8335-GTG/p9hbf/p9hbf_configutility_isp.htm
[7] https://docs.nvidia.com/dgx/dgx-station-a100-user-guide/getting-started-station-a100.html
[8] https://www.youtube.com/watch?v=I6Nx74LqTCA