hyrax.gpu_monitor

Classes

GpuMonitor

General GPU monitor that runs in a separate thread and logs GPU metrics to TensorBoard.

GPU

Holds the GPU metrics retrieved from nvidia-smi.

Functions

safe_float_cast(str_number)

Convert a string to a float, handling the NaN case.

get_gpu_info()

Get the GPU utilization and memory usage for all GPUs on the system using nvidia-smi.

Module Contents

class GpuMonitor(interval_seconds=1)

Bases: threading.Thread

General GPU monitor that runs in a separate thread and logs GPU metrics to TensorBoard.

Parameters:

interval_seconds – Time between logging passes, in seconds. Defaults to 1.

stopped = False
delay = 1
start_time
tensorboard_logger
run()

Run loop that logs GPU metrics every self.delay seconds.

stop()

Stop the monitoring thread.
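The start/stop lifecycle described above can be illustrated with a small stand-in. This is a minimal sketch, not the hyrax implementation: SketchMonitor and its log_fn argument are hypothetical names, and log_fn merely takes the place of the TensorBoard logger. It assumes that run() polls the stopped flag and sleeps for delay seconds between passes.

```python
import threading
import time


class SketchMonitor(threading.Thread):
    """Hypothetical stand-in for GpuMonitor (not the hyrax code)."""

    def __init__(self, interval_seconds=1, log_fn=print):
        # Call the base constructor first, as threading.Thread requires.
        super().__init__(daemon=True)
        self.stopped = False
        self.delay = interval_seconds
        self.start_time = time.time()
        self.log_fn = log_fn  # assumed placeholder for a TensorBoard logger

    def run(self):
        # Log one sample per pass until stop() flips the flag.
        while not self.stopped:
            self.log_fn(time.time() - self.start_time)
            time.sleep(self.delay)

    def stop(self):
        # The loop exits after its current sleep finishes.
        self.stopped = True


samples = []
monitor = SketchMonitor(interval_seconds=0.01, log_fn=samples.append)
monitor.start()
time.sleep(0.1)          # stand-in for the workload being monitored
monitor.stop()
monitor.join(timeout=1)  # wait for the loop to exit
```

With a flag-based loop like this, stop() does not interrupt a sleep in progress, so shutdown can lag by up to one interval.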

class GPU

Holds the GPU metrics retrieved from nvidia-smi.

id: int
load: float
memory_total: float
memory_used: float
property memory_util

Return the memory utilization of the GPU.
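The page does not state how memory_util is computed. The sketch below assumes it is the fraction of total memory in use (memory_used / memory_total) and that the class is a plain dataclass; both are assumptions, and GpuSketch is a hypothetical name rather than the hyrax class.

```python
from dataclasses import dataclass


@dataclass
class GpuSketch:
    """Hypothetical mirror of the GPU fields listed above."""

    id: int
    load: float
    memory_total: float
    memory_used: float

    @property
    def memory_util(self) -> float:
        # Assumed definition: used memory as a fraction of total memory.
        return self.memory_used / self.memory_total


gpu = GpuSketch(id=0, load=0.5, memory_total=16000.0, memory_used=4000.0)
print(gpu.memory_util)
```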

safe_float_cast(str_number)

Convert a string to a float, handling the NaN case.

Parameters:

str_number (str) – The string to convert to a float.

Returns:

The converted float.

Return type:

float
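One plausible reading of "handling the NaN case" is that strings which cannot be parsed as numbers come back as NaN instead of raising. The sketch below implements that reading; it is an assumption about the behavior, and safe_float_cast_sketch is a hypothetical name.

```python
import math


def safe_float_cast_sketch(str_number):
    """Return float(str_number), or NaN if the string is not numeric."""
    try:
        return float(str_number)
    except ValueError:
        # Assumed fallback: unparseable values such as "[N/A]" become NaN.
        return float("nan")


parsed = safe_float_cast_sketch("42.5")
missing = safe_float_cast_sketch("[N/A]")
print(parsed, math.isnan(missing))
```

Because NaN never compares equal to itself, callers should check for it with math.isnan rather than ==.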

get_gpu_info()

Get the GPU utilization and memory usage for all GPUs on the system using nvidia-smi. Returns a list of GPU objects.
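As an illustration of how per-GPU metrics might be collected from nvidia-smi, the sketch below queries four fields in CSV form and parses each row. The field selection, the helper names (parse_gpu_lines, query_gpus), and the dict-based return value are assumptions for this example; the actual function returns GPU objects and may query different fields.

```python
import subprocess

# Real nvidia-smi query fields; which ones hyrax requests is an assumption.
QUERY = "index,utilization.gpu,memory.total,memory.used"


def parse_gpu_lines(text):
    """Parse CSV rows (no header, no units) into one dict per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        index, load, total, used = [field.strip() for field in line.split(",")]
        gpus.append({
            "id": int(index),
            "load": float(load),          # percent, as reported
            "memory_total": float(total),  # MiB
            "memory_used": float(used),    # MiB
        })
    return gpus


def query_gpus():
    """Run nvidia-smi and parse its output (requires an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_lines(result.stdout)


# Parsing demonstrated on canned output so it runs without a GPU.
sample = "0, 35, 16384, 2048\n1, 0, 16384, 512\n"
gpus = parse_gpu_lines(sample)
print(gpus)
```

Keeping the parsing separate from the subprocess call makes the logic testable on machines without nvidia-smi installed.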