Produces a libtimemory-ncclp.so that uses GOTCHA wrappers around ~12 NCCL functions.
Four functions are provided for C, C++, and Fortran:
uint64_t timemory_start_ncclp()- Returns the number of initializations
uint64_t timemory_stop_ncclp(uint64_t idx)- Removes the initialization request at
idx - Returns the number of remaining initializations
- Removes the initialization request at
void timemory_register_ncclp()- Ensures a global initialization exists until it deregistration
void timemory_deregister_ncclp()- Deactivates the global initialization
The environement variable ENABLE_TIMEMORY_NCCLP (default: "ON") controls configuration of the instrumentation.
This library configures the tim::user_ncclp_bundle component with the components specified by the following environment variables in terms of priority:
TIMEMORY_NCCLP_COMPONENTSTIMEMORY_MPIP_COMPONENTSTIMEMORY_PROFILER_COMPONENTSTIMEMORY_GLOBAL_COMPONENTSTIMEMORY_COMPONENT_LIST_INIT
When one of the above environment variables are set to "none", then the priority search for component configurations is abandoned.
The following will result in NCCL function instrumented with cpu_clock:
export TIMEMORY_NCCLP_COMPONENTS="cpu_clock"
export TIMEMORY_PROFILER_COMPONENTS="peak_rss"
export TIMEMORY_GLOBAL_COMPONENTS="wall_clock"The following will result in NCCL functions containing no instrumentation:
export TIMEMORY_NCCLP_COMPONENTS="none"
export TIMEMORY_PROFILER_COMPONENTS="peak_rss"
export TIMEMORY_GLOBAL_COMPONENTS="wall_clock"The following will result in NCCL function instrumented with wall_clock and page_rss:
export TIMEMORY_NCCLP_COMPONENTS=""
export TIMEMORY_GLOBAL_COMPONENTS="wall_clock,page_rss"