imgaug.multicore

Classes and functions dealing with augmentation on multiple CPU cores.

class imgaug.multicore.BackgroundAugmenter(batch_loader, augseq, queue_size=50, nb_workers='auto')[source]

Bases: object

Deprecated. Augment batches in background processes.

Deprecated. Use imgaug.multicore.Pool instead.

This is a wrapper around the multiprocessing module.

Parameters:
  • batch_loader (BatchLoader or multiprocessing.Queue) – BatchLoader object that loads the data fed into the BackgroundAugmenter, or alternatively a Queue. If a Queue is provided, a final None must be placed in it to signal that loading is finished and no more batches will follow; otherwise the BackgroundAugmenter will wait forever for the next batch.
  • augseq (Augmenter) – An augmenter to apply to all loaded images. This may be e.g. a Sequential to apply multiple augmenters.
  • queue_size (int) – Size of the queue that is used to temporarily store the augmentation results. Larger values give the background processes more room to store results while the main process is not requesting many batches, which can lead to smoother and faster training. For large images, however, high values can consume a lot of RAM.
  • nb_workers (‘auto’ or int) – Number of background workers to spawn. If auto, it will be set to C-1, where C is the number of CPU cores.

Methods

get_batch(self) Returns a batch from the queue of augmented batches.
terminate(self) Terminates all background processes immediately.
all_finished  
all_finished(self)[source]
get_batch(self)[source]

Returns a batch from the queue of augmented batches.

If workers are still running and there are no batches in the queue, it will automatically wait for the next batch.

Returns: out – One batch, or None if all workers have finished.
Return type: None or imgaug.Batch
terminate(self)[source]

Terminates all background processes immediately.

This will also free their RAM.

class imgaug.multicore.BatchLoader(load_batch_func, queue_size=50, nb_workers=1, threaded=True)[source]

Bases: object

Deprecated. Load batches in the background.

Deprecated. Use imgaug.multicore.Pool instead.

Loaded batches can be accessed using imgaug.BatchLoader.queue.

Parameters:
  • load_batch_func (callable or generator) – Generator or generator function (i.e. a function that yields Batch objects) or a function that returns a list of Batch objects. Background loading stops automatically when the last batch has been yielded or the last batch in the list has been reached.
  • queue_size (int, optional) – Maximum number of batches to store in the queue. May be set higher for small images and/or small batches.
  • nb_workers (int, optional) – Number of workers to run in the background.
  • threaded (bool, optional) – Whether to run the background processes using threads (True) or full processes (False).

Methods

all_finished(self) Determine whether the workers have finished the loading process.
terminate(self) Stop all workers.
count_workers_alive  
all_finished(self)[source]

Determine whether the workers have finished the loading process.

Returns: out – True if all workers have finished, else False.
Return type: bool
count_workers_alive(self)[source]
terminate(self)[source]

Stop all workers.
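Even though both classes are deprecated, a minimal sketch of how they were typically combined may help when reading older code. The load_batches generator and the dummy image data below are illustrative, not part of imgaug.

    import numpy as np
    import imgaug as ia
    import imgaug.augmenters as iaa
    from imgaug import multicore

    def load_batches():
        # Yield a few dummy batches; a real loader would read images from disk.
        for _ in range(10):
            images = np.zeros((4, 64, 64, 3), dtype=np.uint8)
            yield ia.Batch(images=images)

    augseq = iaa.Fliplr(0.5)
    batch_loader = multicore.BatchLoader(load_batches)
    bg_augmenter = multicore.BackgroundAugmenter(batch_loader, augseq)

    while True:
        batch = bg_augmenter.get_batch()
        if batch is None:  # all workers have finished
            break
        images_aug = batch.images_aug  # use the augmented images here

    batch_loader.terminate()
    bg_augmenter.terminate()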

class imgaug.multicore.Pool(augseq, processes=None, maxtasksperchild=None, seed=None)[source]

Bases: object

Wrapper around multiprocessing.Pool for multicore augmentation.

Parameters:
  • augseq (imgaug.augmenters.meta.Augmenter) – The augmentation sequence to apply to batches.
  • processes (None or int, optional) – The number of background workers, similar to the same parameter in multiprocessing.Pool. If None, the number of the machine’s CPU cores will be used (this counts hyperthreads as CPU cores). If this is set to a negative value p, then P - abs(p) will be used, where P is the number of CPU cores. E.g. -1 would use all cores except one (this is useful to e.g. reserve one core to feed batches to the GPU).
  • maxtasksperchild (None or int, optional) – The number of tasks done per worker process before the process is killed and restarted, similar to the same parameter in multiprocessing.Pool. If None, worker processes will not be automatically restarted.
  • seed (None or int, optional) – The seed to use for child processes. If None, a random seed will be used.
Attributes:
pool

Return or create the multiprocessing.Pool instance.
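A minimal construction sketch; the augmenters and the seed value are illustrative. Note that the wrapped multiprocessing.Pool is only created on first access of the pool attribute (see below), so no worker processes are spawned before the first augmentation call.

    import imgaug.augmenters as iaa
    from imgaug import multicore

    augseq = iaa.Sequential([
        iaa.Fliplr(0.5),
        iaa.GaussianBlur(sigma=(0.0, 1.0)),
    ])

    # Use all CPU cores except one and fix the seed so that the child
    # processes produce reproducible augmentations.
    pool = multicore.Pool(augseq, processes=-1, maxtasksperchild=None, seed=1)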

Methods

close(self) Close the pool gracefully.
imap_batches(self, batches[, chunksize, …]) Augment batches from a generator.
imap_batches_unordered(self, batches[, …]) Augment batches from a generator (without preservation of order).
join(self) Wait for the workers to exit.
map_batches(self, batches[, chunksize]) Augment a list of batches.
map_batches_async(self, batches[, …]) Augment batches asynchronously.
terminate(self) Terminate the pool immediately.
close(self)[source]

Close the pool gracefully.

imap_batches(self, batches, chunksize=1, output_buffer_size=None)[source]

Augment batches from a generator.

The pattern for the output buffer constraint is from https://stackoverflow.com/a/47058399.

Parameters:
  • batches (generator of imgaug.augmentables.batches.Batch) – The batches to augment, provided as a generator. Each call to the generator should yield exactly one batch.

  • chunksize (None or int, optional) – Rough indicator of how many tasks should be sent to each worker. Increasing this number can improve performance.

  • output_buffer_size (None or int, optional) – Max number of batches to handle at the same time in the whole pipeline (including already augmented batches that are waiting to be requested). If the buffer size is reached, no new batches will be loaded from batches until a produced (i.e. augmented) batch is consumed (i.e. requested from this method). The buffer is unlimited if this is set to None. For large datasets, this should be set to an integer value to avoid filling the whole RAM if loading+augmentation happens faster than training.

    New in version 0.3.0.

Yields:

imgaug.augmentables.batches.Batch – Augmented batch.
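A minimal sketch of streaming augmentation with imap_batches(); the generate_batches generator and the dummy images are illustrative.

    import numpy as np
    import imgaug as ia
    import imgaug.augmenters as iaa
    from imgaug import multicore

    def generate_batches():
        for _ in range(100):
            images = np.zeros((8, 64, 64, 3), dtype=np.uint8)
            yield ia.Batch(images=images)

    pool = multicore.Pool(iaa.Fliplr(0.5), processes=-1)
    # Keep at most 10 batches in the pipeline at any time to bound RAM usage.
    for batch_aug in pool.imap_batches(generate_batches(), output_buffer_size=10):
        images_aug = batch_aug.images_aug  # consume the augmented images here
    pool.close()
    pool.join()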

imap_batches_unordered(self, batches, chunksize=1, output_buffer_size=None)[source]

Augment batches from a generator (without preservation of order).

The pattern for the output buffer constraint is from https://stackoverflow.com/a/47058399.

Parameters:
  • batches (generator of imgaug.augmentables.batches.Batch) – The batches to augment, provided as a generator. Each call to the generator should yield exactly one batch.

  • chunksize (None or int, optional) – Rough indicator of how many tasks should be sent to each worker. Increasing this number can improve performance.

  • output_buffer_size (None or int, optional) – Max number of batches to handle at the same time in the whole pipeline (including already augmented batches that are waiting to be requested). If the buffer size is reached, no new batches will be loaded from batches until a produced (i.e. augmented) batch is consumed (i.e. requested from this method). The buffer is unlimited if this is set to None. For large datasets, this should be set to an integer value to avoid filling the whole RAM if loading+augmentation happens faster than training.

    New in version 0.3.0.

Yields:

imgaug.augmentables.batches.Batch – Augmented batch.

join(self)[source]

Wait for the workers to exit.

This may only be called after first calling close() or terminate().

map_batches(self, batches, chunksize=None)[source]

Augment a list of batches.

Parameters:
  • batches (list of imgaug.augmentables.batches.Batch) – The batches to augment.
  • chunksize (None or int, optional) – Rough indicator of how many tasks should be sent to each worker. Increasing this number can improve performance.
Returns:

Augmented batches.

Return type:

list of imgaug.augmentables.batches.Batch
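A minimal sketch of augmenting a list of batches with map_batches(); the dummy batches are illustrative.

    import numpy as np
    import imgaug as ia
    import imgaug.augmenters as iaa
    from imgaug import multicore

    batches = [
        ia.Batch(images=np.zeros((4, 32, 32, 3), dtype=np.uint8))
        for _ in range(20)
    ]

    pool = multicore.Pool(iaa.GaussianBlur(sigma=(0.0, 1.0)), processes=-1, seed=1)
    batches_aug = pool.map_batches(batches, chunksize=4)
    pool.close()
    pool.join()

    images_aug = batches_aug[0].images_aug  # augmented images of the first batch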

map_batches_async(self, batches, chunksize=None, callback=None, error_callback=None)[source]

Augment batches asynchronously.

Parameters:
  • batches (list of imgaug.augmentables.batches.Batch) – The batches to augment.
  • chunksize (None or int, optional) – Rough indicator of how many tasks should be sent to each worker. Increasing this number can improve performance.
  • callback (None or callable, optional) – Function to call upon finish. See multiprocessing.Pool.
  • error_callback (None or callable, optional) – Function to call upon errors. See multiprocessing.Pool.
Returns:

Asynchronous result. See multiprocessing.Pool.

Return type:

multiprocessing.MapResult
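A minimal sketch of asynchronous augmentation; the on_done callback is an illustrative name, and the callback semantics follow multiprocessing.Pool.map_async (it receives the full list of augmented batches).

    import numpy as np
    import imgaug as ia
    import imgaug.augmenters as iaa
    from imgaug import multicore

    batches = [
        ia.Batch(images=np.zeros((4, 32, 32, 3), dtype=np.uint8))
        for _ in range(10)
    ]

    def on_done(batches_aug):
        # Called once all batches have been augmented (see multiprocessing.Pool).
        print("augmented %d batches" % (len(batches_aug),))

    pool = multicore.Pool(iaa.Fliplr(0.5))
    result = pool.map_batches_async(batches, callback=on_done)
    batches_aug = result.get(timeout=60)  # block until the workers are done
    pool.close()
    pool.join()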

pool

Return or create the multiprocessing.Pool instance.

This creates a new instance upon the first call and afterwards returns that instance (until the property _pool is set to None again).

Returns: The multiprocessing.Pool used internally by this imgaug.multicore.Pool.
Return type: multiprocessing.Pool
terminate(self)[source]

Terminate the pool immediately.