hyrax.gpu_monitor
=================

.. py:module:: hyrax.gpu_monitor


Classes
-------

.. autoapisummary::

   hyrax.gpu_monitor.GpuMonitor
   hyrax.gpu_monitor.GPU


Functions
---------

.. autoapisummary::

   hyrax.gpu_monitor.safe_float_cast
   hyrax.gpu_monitor.get_gpu_info


Module Contents
---------------

.. py:class:: GpuMonitor(tensorboard_logger, interval_seconds=1)

   Bases: :py:obj:`threading.Thread`


   General GPU monitor that runs in a separate thread and logs GPU metrics
   to TensorBoard.

   The constructor takes the following arguments:

   *tensorboard_logger* is the TensorBoard logger that receives the GPU
   metrics. It is stored as the ``tensorboard_logger`` attribute.

   *interval_seconds* is the time in seconds between two consecutive
   samples. Defaults to 1 and is stored as the ``delay`` attribute.

   As with any :py:obj:`threading.Thread`, call ``start()`` to begin
   monitoring; call ``stop()`` to end it.



   .. py:attribute:: stopped
      :value: False



   .. py:attribute:: delay
      :value: 1



   .. py:attribute:: start_time


   .. py:attribute:: tensorboard_logger


   .. py:method:: run()

      Run the loop that logs GPU metrics every ``self.delay`` seconds.



   .. py:method:: stop()

      Stop the monitoring thread.
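
   The polling pattern implied by the attributes above (a ``stopped`` flag,
   a ``delay`` between samples and a ``start_time`` reference) can be
   sketched as follows. This is a hypothetical, self-contained illustration,
   not the hyrax implementation: ``GpuMonitorSketch``, ``RecordingLogger``
   and the ``read_metrics`` hook are invented names, and the real class
   presumably obtains its metrics through ``get_gpu_info()`` rather than a
   callable. ``RecordingLogger`` only mimics the
   ``add_scalar(tag, scalar_value, global_step)`` method of a TensorBoard
   ``SummaryWriter``.

   .. code-block:: python

      import threading
      import time


      class RecordingLogger:
          """Hypothetical stand-in for a TensorBoard SummaryWriter."""

          def __init__(self):
              self.records = []
              self._lock = threading.Lock()

          def add_scalar(self, tag, scalar_value, global_step=None):
              # Same argument order as SummaryWriter.add_scalar; values are just stored.
              with self._lock:
                  self.records.append((tag, scalar_value, global_step))


      class GpuMonitorSketch(threading.Thread):
          """Hypothetical polling monitor mirroring the documented attributes."""

          def __init__(self, tensorboard_logger, interval_seconds=1, read_metrics=None):
              super().__init__(daemon=True)  # Thread.__init__ must run first
              self.stopped = False
              self.delay = interval_seconds
              self.start_time = time.monotonic()
              self.tensorboard_logger = tensorboard_logger
              # Hypothetical hook standing in for an nvidia-smi query.
              self.read_metrics = read_metrics or (lambda: {"gpu0/load": 0.0})

          def run(self):
              """Log every metric, then sleep for self.delay seconds, until stopped."""
              while not self.stopped:
                  # Assumption: the step is the number of whole seconds since start_time.
                  step = int(time.monotonic() - self.start_time)
                  for tag, value in self.read_metrics().items():
                      self.tensorboard_logger.add_scalar(tag, value, step)
                  time.sleep(self.delay)

          def stop(self):
              """Ask the loop to exit after its current sleep."""
              self.stopped = True


      logger = RecordingLogger()
      monitor = GpuMonitorSketch(logger, interval_seconds=0.01)
      monitor.start()
      time.sleep(0.05)
      monitor.stop()
      monitor.join()

   With a plain boolean flag, ``stop()`` takes effect only after the current
   ``time.sleep(self.delay)`` returns, so the thread may run for up to one
   more interval. Waiting on a ``threading.Event`` instead would let
   ``stop()`` wake the loop immediately.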



.. py:class:: GPU

   Holds the GPU metrics retrieved from ``nvidia-smi``.


   .. py:attribute:: id
      :type:  int


   .. py:attribute:: load
      :type:  float


   .. py:attribute:: memory_total
      :type:  float


   .. py:attribute:: memory_used
      :type:  float


   .. py:property:: memory_util

      Return the memory utilization of the GPU.
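
   The reference does not say how ``memory_util`` is derived or in which
   unit it is expressed. A minimal sketch, assuming ``GPU`` behaves like a
   dataclass and that utilization is the fraction
   ``memory_used / memory_total`` (the real property may return a
   percentage instead):

   .. code-block:: python

      from dataclasses import dataclass


      @dataclass
      class GPU:
          """Sketch of the documented fields; not the hyrax source."""

          id: int
          load: float
          memory_total: float
          memory_used: float

          @property
          def memory_util(self) -> float:
              # Assumption: a fraction in [0, 1], not a percentage.
              if self.memory_total == 0:
                  return float("nan")  # avoid ZeroDivisionError when no memory is reported
              return self.memory_used / self.memory_total


      gpu = GPU(id=0, load=35.0, memory_total=16384.0, memory_used=4096.0)
      print(gpu.memory_util)  # 0.25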


.. py:function:: safe_float_cast(str_number)

   Convert a string into a float, handling the case of ``nan``.

   :param str_number: The string to convert to a float.
   :type str_number: str

   :returns: The converted float.
   :rtype: float
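
   Python's built-in ``float()`` already parses the string ``"nan"``, so the
   exact behavior behind "handling the case of nan" is not spelled out here.
   One plausible sketch, assuming that non-numeric fields such as the
   ``[N/A]`` placeholder ``nvidia-smi`` prints for unsupported values should
   map to NaN instead of raising ``ValueError``:

   .. code-block:: python

      import math


      def safe_float_cast(str_number: str) -> float:
          """Sketch: parse a float, returning NaN for non-numeric text."""
          try:
              return float(str_number.strip())
          except ValueError:
              # Assumption: placeholders like "[N/A]" become NaN instead of raising.
              return math.nan


      print(safe_float_cast(" 42.5 "))  # 42.5
      print(safe_float_cast("[N/A]"))  # nan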


.. py:function:: get_gpu_info()

   Get the GPU utilization and memory usage for all GPUs on the system using
   ``nvidia-smi``.

   :returns: One :py:class:`GPU` object per GPU on the system.
   :rtype: list[GPU]
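
   A sketch of how such a query could work, assuming hyrax asks
   ``nvidia-smi`` for the ``index``, ``utilization.gpu``, ``memory.total``
   and ``memory.used`` fields in CSV form (with ``nounits``, memory is
   reported in MiB). ``parse_gpu_csv`` and ``_to_float`` are hypothetical
   helpers, and the minimal ``GPU`` dataclass is redefined so the block runs
   on its own.

   .. code-block:: python

      import math
      import subprocess
      from dataclasses import dataclass


      @dataclass
      class GPU:
          id: int
          load: float
          memory_total: float
          memory_used: float


      QUERY = "index,utilization.gpu,memory.total,memory.used"


      def _to_float(field: str) -> float:
          try:
              return float(field)
          except ValueError:
              return math.nan  # e.g. "[N/A]" for unsupported fields (assumption)


      def parse_gpu_csv(text: str) -> list[GPU]:
          """Parse '--format=csv,noheader,nounits' output for QUERY into GPU objects."""
          gpus = []
          for line in text.strip().splitlines():
              index, load, total, used = (field.strip() for field in line.split(","))
              gpus.append(GPU(int(index), _to_float(load), _to_float(total), _to_float(used)))
          return gpus


      def get_gpu_info() -> list[GPU]:
          """Query every GPU via nvidia-smi; requires the NVIDIA driver tools on PATH."""
          result = subprocess.run(
              ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
              capture_output=True,
              text=True,
              check=True,
          )
          return parse_gpu_csv(result.stdout)

   Keeping the parsing separate from the ``subprocess`` call means
   ``parse_gpu_csv`` can be exercised on sample text on a machine without a
   GPU.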


