hyrax.gpu_monitor

Classes

GpuMonitor

General GPU monitor that runs in a separate thread and logs GPU metrics to TensorBoard.

GPU

Holds the GPU metrics retrieved from nvidia-smi.

Functions

safe_float_cast(str_number)

Convert a string to a float, handling the case of NaN.

get_gpu_info()

Get the GPU utilization and memory usage for all GPUs on the system using nvidia-smi.

Module Contents

class GpuMonitor(tensorboard_logger, interval_seconds=1)

Bases: threading.Thread

General GPU monitor that runs in a separate thread and logs GPU metrics to TensorBoard.

The remaining constructor notes are inherited from threading.Thread. That constructor should always be called with keyword arguments. Its arguments are:

group should be None; reserved for future extension when a ThreadGroup class is implemented.

target is the callable object to be invoked by the run() method. Defaults to None, meaning nothing is called.

name is the thread name. By default, a unique name is constructed of the form “Thread-N” where N is a small decimal number.

args is a list or tuple of arguments for the target invocation. Defaults to ().

kwargs is a dictionary of keyword arguments for the target invocation. Defaults to {}.

If a subclass overrides the constructor, it must make sure to invoke the base class constructor (Thread.__init__()) before doing anything else to the thread.

stopped = False
delay = 1
start_time
tensorboard_logger
run()

Run loop that logs GPU metrics every self.delay seconds.

stop()

Stop the monitoring thread.
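
The run/stop pattern described above can be illustrated with a minimal sketch. This is not the hyrax implementation: `SketchMonitor` and its `log_fn` callback are hypothetical stand-ins (the real class takes a `tensorboard_logger`), and only the attribute names `stopped`, `delay`, and `start_time` come from this page.

```python
import threading
import time


class SketchMonitor(threading.Thread):
    """Hypothetical stand-in for GpuMonitor, for illustration only."""

    def __init__(self, log_fn, interval_seconds=1):
        # Invoke the base class constructor first, as threading.Thread requires.
        super().__init__(daemon=True)
        self.log_fn = log_fn  # assumed callback, replaces the TensorBoard logger
        self.delay = interval_seconds
        self.start_time = time.time()
        self.stopped = False

    def run(self):
        # Log one sample, then sleep for self.delay seconds, until stop() is called.
        while not self.stopped:
            self.log_fn(time.time() - self.start_time)
            time.sleep(self.delay)

    def stop(self):
        # Ask the loop to exit; it finishes after the current sleep.
        self.stopped = True


samples = []
monitor = SketchMonitor(samples.append, interval_seconds=0.01)
monitor.start()
time.sleep(0.1)           # let the monitor collect a few samples
monitor.stop()
monitor.join(timeout=1)   # wait for the thread to exit
```

Calling `join()` after `stop()` ensures the loop has finished before the caller moves on; the daemon flag is a choice made for this sketch so a forgotten monitor does not block interpreter exit.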

class GPU

Holds the GPU metrics retrieved from nvidia-smi.

id: int
load: float
memory_total: float
memory_used: float
property memory_util

Return the memory utilization of the GPU.
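
A container with these fields might look like the sketch below. The class name `GpuSketch` is hypothetical, and the definition of `memory_util` as used memory divided by total memory is an assumption; this page does not state the formula or the units.

```python
from dataclasses import dataclass


@dataclass
class GpuSketch:
    """Hypothetical mirror of the documented GPU fields."""

    id: int
    load: float
    memory_total: float
    memory_used: float

    @property
    def memory_util(self) -> float:
        # Assumed definition: fraction of total memory currently in use.
        if self.memory_total == 0:
            return 0.0  # avoid division by zero when the total is unknown
        return self.memory_used / self.memory_total


# Arbitrary illustrative values, not real measurements.
gpu = GpuSketch(id=0, load=0.5, memory_total=16000.0, memory_used=4000.0)
print(gpu.memory_util)  # 0.25
```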

safe_float_cast(str_number)

Convert a string to a float, handling the case of NaN.

Parameters:

str_number (str) – The string to convert to a float.

Returns:

The converted float.

Return type:

float
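
One plausible reading of this behavior is sketched below. The fallback to NaN for unparseable input is an assumption, since this page only says that the NaN case is handled; `safe_float_cast_sketch` is a hypothetical name, not the hyrax function.

```python
import math


def safe_float_cast_sketch(str_number: str) -> float:
    """Hypothetical sketch: convert a string to a float, falling back to NaN."""
    try:
        # float() already parses "nan" (any casing) into a NaN value.
        return float(str_number)
    except (TypeError, ValueError):
        # Assumed behavior: a non-numeric placeholder such as "[N/A]" becomes NaN.
        return math.nan


print(safe_float_cast_sketch("42.5"))               # 42.5
print(math.isnan(safe_float_cast_sketch("[N/A]")))  # True
```

Returning NaN instead of raising keeps a single unreadable field from interrupting a logging loop, at the cost of requiring `math.isnan` checks downstream.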

get_gpu_info()

Get the GPU utilization and memory usage for all GPUs on the system using nvidia-smi. Returns a list of GPU objects.
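
Querying nvidia-smi for these metrics could be sketched as follows. The choice of query fields, the dictionary return type, and the helper names `parse_gpu_csv` and `query_gpus` are assumptions for illustration; the actual function returns GPU objects and may use different fields. The `--query-gpu` and `--format=csv,noheader,nounits` options are standard nvidia-smi flags.

```python
import subprocess

# Assumed field selection; utilization.gpu is reported as a percentage.
QUERY_FIELDS = "index,utilization.gpu,memory.total,memory.used"


def parse_gpu_csv(text):
    """Parse header-less, unit-less CSV lines into one dict per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, load, mem_total, mem_used = [f.strip() for f in line.split(",")]
        gpus.append(
            {
                "id": int(index),
                "load": float(load),
                "memory_total": float(mem_total),
                "memory_used": float(mem_used),
            }
        )
    return gpus


def query_gpus():
    """Run nvidia-smi and parse its output; requires nvidia-smi on PATH."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Keeping the parsing separate from the subprocess call lets the parser be exercised on machines without a GPU. This sketch raises `ValueError` on non-numeric fields; a NaN-tolerant cast would be needed for devices that do not report every metric.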