NVIDIA Nsight Systems is a system-wide profiling tool that visualizes CPU-GPU interactions in an application, helping developers find performance bottlenecks and optimization opportunities. Here's how it visualizes these interactions:
System-Wide Performance Analysis
Nsight Systems captures system-wide activity data, including both CPU and GPU events, and displays them on a unified timeline. This timeline allows developers to see how different components of the system interact with each other over time, making it easier to identify correlations, dependencies, and bottlenecks in the application's workflow[3][4].
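To make the idea of a unified timeline concrete, here is a minimal sketch that merges CPU-side and GPU-side events into one list ordered by start time. The `Event` class, the row labels, and the timestamps are invented for illustration; they are not Nsight Systems' data model or export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One bar on a timeline row (hypothetical model, not an Nsight format)."""
    row: str         # e.g. "CPU thread 0" or "GPU stream 7" (made-up labels)
    name: str
    start_us: float
    end_us: float


def unified_timeline(*sources):
    """Merge per-source event lists into one list ordered by start time."""
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda e: (e.start_us, e.row))


# Illustrative data: the CPU issues a copy and a kernel launch, then waits.
cpu_events = [
    Event("CPU thread 0", "cudaMemcpyAsync", 0.0, 4.0),
    Event("CPU thread 0", "cudaLaunchKernel", 5.0, 7.0),
    Event("CPU thread 0", "cudaStreamSynchronize", 8.0, 30.0),
]
gpu_events = [
    Event("GPU stream 7", "HtoD copy", 3.0, 9.0),
    Event("GPU stream 7", "my_kernel", 10.0, 29.0),
]

timeline = unified_timeline(cpu_events, gpu_events)
for e in timeline:
    print(f"{e.start_us:6.1f} to {e.end_us:6.1f} us  {e.row:<13} {e.name}")
```

Reading the merged list top to bottom mirrors reading the timeline left to right: the long `cudaStreamSynchronize` on the CPU row overlaps the kernel on the GPU row, which is the kind of dependency the unified view makes visible.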
CPU Activity Visualization
Nsight Systems visualizes CPU activity by showing thread states, core utilization, and sampled call stacks that indicate which code was running at a given moment. This helps developers understand how CPU resources are being used and where potential bottlenecks might exist. The tool can also profile a whole process tree, allowing users to track the activity of multiple processes and threads simultaneously[1][3].
GPU Activity Visualization
For GPU activity, Nsight Systems shows kernel execution, memory transfers, and how busy the streaming multiprocessors (SMs) are. It traces GPU APIs such as CUDA, Vulkan, and OpenGL, so both compute and graphics workloads appear on the timeline[3][4]. The tool also offers GPU metrics sampling, which covers metrics such as SM utilization, Tensor Core activity, and instruction throughput. These metrics help developers spot underutilized or inefficiently used GPU resources[1][3].
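As a rough illustration of what "underutilization" means on a timeline, the sketch below computes the fraction of a time window during which at least one kernel was running. This is only a coarse proxy built from kernel start and end times; it is not how Nsight Systems samples SM utilization, and the function name and sample intervals are assumptions made for this example.

```python
def busy_fraction(intervals, window_start, window_end):
    """Fraction of [window_start, window_end) covered by at least one interval.

    `intervals` is an iterable of (start, end) pairs in the same time unit
    as the window. Overlapping intervals are counted only once.
    """
    if window_end <= window_start:
        raise ValueError("window must have positive length")

    # Keep only intervals that intersect the window, clipped to its bounds.
    clipped = sorted(
        (max(start, window_start), min(end, window_end))
        for start, end in intervals
        if end > window_start and start < window_end
    )

    busy = 0.0
    cur_start = cur_end = None
    for start, end in clipped:
        if cur_end is None or start > cur_end:
            # A new busy span begins; close the previous one first.
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps the current busy span, so extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        busy += cur_end - cur_start

    return busy / (window_end - window_start)


# Illustrative kernel intervals in microseconds (made-up values).
kernel_intervals_us = [(10.0, 30.0), (25.0, 40.0), (60.0, 70.0)]
utilization = busy_fraction(kernel_intervals_us, 0.0, 100.0)
print(f"GPU had a kernel running {utilization:.0%} of the window")
```

A low value over a window where the application is supposed to be GPU-bound suggests the GPU is waiting on something else, which is where the CPU rows of the timeline become relevant.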
Correlating CPU and GPU Events
One of the key features of Nsight Systems is its ability to correlate CPU and GPU events. By visualizing both CPU and GPU activities on the same timeline, developers can see how CPU operations influence GPU performance and vice versa. This correlation is crucial for identifying bottlenecks that occur due to interactions between the CPU and GPU, such as data transfer delays or synchronization issues[3][4].
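The correlation described above can be sketched in a few lines. CUDA tracing typically links a CPU-side API call to the GPU work it triggers through a shared correlation ID; the record classes, field names, and numbers below are hypothetical stand-ins for such trace records, not an Nsight Systems API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiCall:
    """CPU-side runtime API call (hypothetical record layout)."""
    correlation_id: int
    name: str
    start_us: float
    end_us: float


@dataclass(frozen=True)
class Kernel:
    """GPU-side kernel execution (hypothetical record layout)."""
    correlation_id: int
    name: str
    start_us: float
    end_us: float


def launch_latencies(api_calls, kernels):
    """Pair each kernel with its launching call; return (name, delay) tuples.

    The delay is measured from the start of the CPU call to the start of
    the kernel on the GPU. Kernels without a matching call are skipped.
    """
    calls_by_id = {call.correlation_id: call for call in api_calls}
    result = []
    for kernel in sorted(kernels, key=lambda k: k.start_us):
        call = calls_by_id.get(kernel.correlation_id)
        if call is not None:
            result.append((kernel.name, kernel.start_us - call.start_us))
    return result


def gpu_idle_gaps(kernels):
    """Return (gap_start, gap_end) spans where no kernel was running."""
    gaps = []
    busy_until = None
    for kernel in sorted(kernels, key=lambda k: k.start_us):
        if busy_until is not None and kernel.start_us > busy_until:
            gaps.append((busy_until, kernel.start_us))
        busy_until = (
            kernel.end_us if busy_until is None else max(busy_until, kernel.end_us)
        )
    return gaps


# Illustrative trace: the second launch is issued late by the CPU.
api_calls = [
    ApiCall(1, "cudaLaunchKernel", 5.0, 7.0),
    ApiCall(2, "cudaLaunchKernel", 40.0, 42.0),
]
kernels = [
    Kernel(1, "scale", 10.0, 20.0),
    Kernel(2, "reduce", 45.0, 60.0),
]

latencies = launch_latencies(api_calls, kernels)
gaps = gpu_idle_gaps(kernels)
print(latencies)
print(gaps)
```

In this made-up trace the GPU sits idle between the two kernels while the CPU has not yet issued the second launch, so the gap is caused on the CPU side rather than by slow GPU work. That is the kind of conclusion that requires seeing both sides on one timeline.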
Low Overhead and Scalability
Nsight Systems is designed to operate with low overhead, so that profiling perturbs the application's timing as little as possible. This makes it suitable for analyzing complex, large-scale applications running on various NVIDIA platforms, from workstations to data centers and cloud environments[3][6].
User Interface and Navigation
The tool provides a GUI in which developers navigate the timeline view using scroll bars, the mouse wheel, and keyboard shortcuts. Users can zoom in on specific time ranges or events to analyze performance issues in detail[5][9].
Overall, Nsight Systems offers a comprehensive view of CPU-GPU interactions, enabling developers to optimize their applications for better performance and efficiency across a wide range of platforms.
Citations:
[1] https://extremecomputingtraining.anl.gov/wp-content/uploads/sites/96/2022/11/ATPESC-2022-Track-6-Talk-2-Keipert-NVIDIA.pdf
[2] https://bede-documentation.readthedocs.io/en/latest/software/tools/nsight-systems.html
[3] https://developer.nvidia.com/nsight-systems
[4] https://indico.cern.ch/event/962112/contributions/4047370/attachments/2159916/3643963/Nsight%20Systems%20-%20x86%20Introduction%20-%20CERN.pdf
[5] https://www.youtube.com/watch?v=TGChXcFm-Yo
[6] https://docs.nersc.gov/tools/performance/nvidiaproftools/
[7] https://www.youtube.com/watch?v=kKANP0kL_hk
[8] https://docs.nvidia.com/nsight-systems/UserGuide/index.html
[9] https://www.youtube.com/watch?v=dUDGO66IadU
[10] https://dev-discuss.pytorch.org/t/using-nsight-systems-to-profile-gpu-workload/59