Performance & Debugging tool
Grain offers two configurable modes that can be set to gain deeper insights into pipeline execution and identify potential issues.
# @test {"output": "ignore"}
!pip install grain
Visualization mode
To get an overview of your dataset pipeline structure and a clear understanding
of how the data flows, enable visualization mode. This logs a visual
representation of your pipeline, so you can easily identify the different
transformation stages and their relationships. To enable visualization mode, set
the flag --grain_py_dataset_visualization_output_dir=""
or call
grain.config.update("py_dataset_visualization_output_dir", "")
# @test {"output": "ignore"}
import grain.python as grain
grain.config.update("py_dataset_visualization_output_dir", "")
ds = (
    grain.MapDataset.range(20)
    .seed(seed=42)
    .shuffle()
    .batch(batch_size=2)
    .map(lambda x: x)
    .to_iter_dataset()
)
it = iter(ds)
# Visualization graph is constructed once the dataset produces the first element
for _ in range(10):
    next(it)
Grain Dataset graph:
RangeMapDataset(start=0, stop=20, step=1)
││
││
││
╲╱
"<class 'int'>[]"
││
││ WithOptionsMapDataset
││
╲╱
"<class 'int'>[]"
││
││ ShuffleMapDataset
││
╲╱
"<class 'int'>[]"
││
││ BatchMapDataset(batch_size=2, drop_remainder=False)
││
╲╱
'int64[2]'
││
││ MapMapDataset(transform=<lambda> @ <ipython-input-1-930f8fd1bf7d>:9)
││
╲╱
'int64[2]'
││
││ PrefetchDatasetIterator(read_options=ReadOptions(num_threads=16, prefetch_buffer_size=500), allow_nones=False)
││
╲╱
'int64[2]'
Debug mode
To troubleshoot performance issues in your dataset pipeline, enable debug mode.
This logs a real-time execution summary of the pipeline at one-minute
intervals. The execution summary provides detailed information on each
transformation stage, such as processing time and number of elements processed,
which helps in identifying the slower stages in the pipeline.
To enable debug mode, set the flag --grain_py_debug_mode=true
or call
grain.config.update("py_debug_mode", True)
import time
# Define a dummy slow preprocessing function
def _dummy_slow_fn(x):
    time.sleep(10)
    return x
# @test {"output": "ignore"}
import time
grain.config.update("py_debug_mode", True)
ds = (
    grain.MapDataset.range(20)
    .seed(seed=42)
    .shuffle()
    .batch(batch_size=2)
    .map(_dummy_slow_fn)
    .to_iter_dataset()
    .map(_dummy_slow_fn)
)
it = iter(ds)
for _ in range(10):
    next(it)
Grain Dataset Execution Summary:
NOTE: Before analyzing the `MapDataset` nodes, ensure that the `total_processing_time` of the `PrefetchDatasetIterator` node indicates it is a bottleneck. The `MapDataset` nodes are executed in multiple threads and thus, should not be compared to the `total_processing_time` of `DatasetIterator` nodes.
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| id | name | inputs | percent wait time | total processing time | min processing time | max processing time | avg processing time | num produced elements |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 6 | RangeMapDataset(start=0, stop= | [] | 0.00% | 86.92us | 1.00us | 53.91us | 4.35us | 20 |
| | 20, step=1) | | | | | | | |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 5 | WithOptionsMapDataset | [6] | 0.00% | N/A | N/A | N/A | N/A | N/A |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 4 | ShuffleMapDataset | [5] | 0.00% | 15.95ms | 42.40us | 2.28ms | 797.35us | 20 |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 3 | BatchMapDataset(batch_size=2, | [4] | 0.00% | 803.14us | 47.04us | 290.24us | 80.31us | 10 |
| | drop_remainder=False) | | | | | | | |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 2 | MapMapDataset(transform=_dummy | [3] | 16.68% | 100.08s | 10.00s | 10.01s | 10.01s | 10 |
| | _slow_fn @ <ipython-input-2-23 | | | | | | | |
| | 02a47a813f>:4) | | | | | | | |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1 | PrefetchDatasetIterator(read_o | [2] | N/A | 10.02s | 12.40us | 10.02s | 1.67s | 6 |
| | ptions=ReadOptions(num_threads | | | | | | | |
| | =16, prefetch_buffer_size=500) | | | | | | | |
| | , allow_nones=False) | | | | | | | |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0 | MapDatasetIterator(transform=_ | [1] | 83.32% | 50.05s | 10.01s | 10.01s | 10.01s | 5 |
| | dummy_slow_fn @ <ipython-input | | | | | | | |
| | -2-2302a47a813f>:4) | | | | | | | |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
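As a quick sanity check on how the columns of this summary relate, the sketch below recomputes the average processing time as total processing time divided by num produced elements. That relationship is an assumption inferred from the numbers, not something the summary states, and the `rows` dictionary is a hypothetical hand-copied excerpt of the table rather than anything Grain produces.

```python
# (total processing time [s], num produced elements, reported avg processing time [s]),
# copied by hand from the execution summary above and converted to seconds.
rows = {
    "RangeMapDataset": (86.92e-6, 20, 4.35e-6),
    "ShuffleMapDataset": (15.95e-3, 20, 797.35e-6),
    "BatchMapDataset": (803.14e-6, 10, 80.31e-6),
    "MapMapDataset": (100.08, 10, 10.01),
    "PrefetchDatasetIterator": (10.02, 6, 1.67),
    "MapDatasetIterator": (50.05, 5, 10.01),
}

# Relative difference between the recomputed and the reported average, per node.
rel_errors = {}
for name, (total, count, reported_avg) in rows.items():
    computed_avg = total / count
    rel_errors[name] = abs(computed_avg - reported_avg) / reported_avg
    print(f"{name}: computed avg {computed_avg:.6g}s, reported avg {reported_avg:.6g}s")
```

For every row copied here, the recomputed average agrees with the reported one up to rounding of the displayed values.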
In the above execution summary, 83.32% of the wait time is attributed to the
MapDatasetIterator node (id:0), which makes it the slowest stage of the pipeline.
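Reading the percent wait time column can be sketched as picking the node with the largest share. The snippet below is only an illustration on values copied by hand from the summary above; the `percent_wait_time` dictionary is a hypothetical structure, not an object exposed by Grain.

```python
# "percent wait time" per node id, copied by hand from the summary above.
# None stands for the N/A entry of the PrefetchDatasetIterator node.
percent_wait_time = {
    6: 0.00,   # RangeMapDataset
    5: 0.00,   # WithOptionsMapDataset
    4: 0.00,   # ShuffleMapDataset
    3: 0.00,   # BatchMapDataset
    2: 16.68,  # MapMapDataset
    1: None,   # PrefetchDatasetIterator
    0: 83.32,  # MapDatasetIterator
}

# Ignore nodes that do not report a percentage.
reported = {node_id: pct for node_id, pct in percent_wait_time.items() if pct is not None}

# The node with the largest share of wait time is the first candidate to inspect.
slowest_id = max(reported, key=reported.get)
total_pct = round(sum(reported.values()), 2)

print(f"node with the highest percent wait time: id {slowest_id} ({reported[slowest_id]}%)")
print(f"sum of reported percentages: {total_pct}%")
```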
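Note that although the total_processing_time might suggest that MapMapDataset
(id:2) is the slowest stage, the nodes with ids 2 to 6 are executed in multiple
threads, so their total_processing_time accumulates across threads and should
not be compared directly to the total_processing_time of iterator nodes (id:0).
As the note at the top of the summary says, first check whether the
total_processing_time of the PrefetchDatasetIterator node (id:1) indicates
that it is a bottleneck.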
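The effect of multi-threaded execution on a summed processing time can be illustrated without Grain. The sketch below uses only the Python standard library: `timed_slow_fn` is a hypothetical stand-in for a slow transform, and the thread pool only approximates the idea of nodes running in several threads. It does not reproduce how Grain measures its columns.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_slow_fn(x):
    """Hypothetical slow transform; returns the input and the seconds it took."""
    start = time.perf_counter()
    time.sleep(0.05)  # simulated slow preprocessing
    return x, time.perf_counter() - start


# Run 8 calls on 8 worker threads and measure the wall-clock time of the whole run.
wall_start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(timed_slow_fn, range(8)))
wall_clock = time.perf_counter() - wall_start

# Sum of the per-call durations, analogous to a per-node total processing time.
total_processing = sum(elapsed for _, elapsed in results)

print(f"summed processing time: {total_processing:.2f}s")
print(f"wall-clock time:        {wall_clock:.2f}s")
```

Because the calls overlap in time, the summed processing time is several times larger than the wall-clock time, which is why a large total on a multi-threaded node does not by itself identify the bottleneck.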