Profiler¶
Understanding latency-related issues with the engine is difficult because attaching a debugger alters its real-time performance characteristics.
Similarly, frequent log output can itself introduce engine latency that makes problems difficult to pinpoint.
For this reason, the engine includes a low-overhead profiler to observe the engine retrospectively without affecting its operation.
This functionality is paired with a utility in the vortex-tools package for parsing and interpreting the profiler output.
Usage¶
The profiler is activated by setting the environment variable VORTEX_PROFILER_LOG to a valid path, such as log.prof.
When this variable is present, each session of the vortex engine (i.e., each call to Engine.start()) will write profiler output to the given path, creating intermediate directories as needed.
No code changes are required to activate this functionality.
The profiler is disabled when the VORTEX_PROFILER_LOG environment variable is absent or empty.
Caution
The profiler will overwrite logs from prior sessions. Take care when running multiple sessions.
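As an illustration, here is a minimal sketch of enabling the profiler from a Python script (setting the variable in the shell before launching works equally well); the timestamped path is simply one way to keep a new session from overwriting an earlier log:
import os
import time

# Choose a unique, timestamped path so that each session writes its own
# profiler log instead of overwriting the previous one.
log_path = time.strftime("profiler/%Y-%m-%d_%H-%M-%S.prof")

# Set the variable before the session begins, i.e., before Engine.start()
# is called; intermediate directories are created as needed.
os.environ["VORTEX_PROFILER_LOG"] = log_path

# ... construct, configure, and start the engine as usual ...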
Caution
Profiler output is written to disk in a background thread. Ensure that the session shuts down completely so that all profiling events are recorded.
Tip
To confirm activation of the profiler, check the engine’s debug-level log output for messages regarding the opening and closing of the profiler log.
Interpretation¶
The vortex-tools package provides the profiler module for inspecting and interpreting profiler output.
> python -m vortex_tools.profiler -h
usage: profiler.py [-h] [--count COUNT] [--skip SKIP] [--pretty] [--timeline] [--waterfall] [--statistics] path
print vortex profiler log
positional arguments:
path path to profiler log
options:
-h, --help show this help message and exit
--count COUNT, -c COUNT
number of records to show (default: None)
--skip SKIP, -s SKIP number of records to skip (default: None)
--pretty, -p show pretty output (default: False)
--timeline, --timing show profiler timeline (timing) diagram (default: False)
--waterfall show profiler waterfall diagram (default: False)
--statistics, --stats
show profiler statistics (default: False)
The output consists of a series of records that specify the profiling event, the associated task index (e.g., for an acquisition task), the job or block number, and the timestamp in nanoseconds.
The first profiling event reports the vortex and profiler versions in the task and job fields, respectively.
> python -m vortex_tools.profiler log.prof --pretty --count 10
code task job timestamp
0 PROFILER_VERSION 403 0 0
1 ENGINE_LAUNCH 0 0 681110124089600
2 ENGINE_RUN 0 0 681110147807500
3 JOB_CREATE 0 0 681110149373700
4 JOB_CLEARANCE 0 0 681110149378900
5 JOB_GENERATE_SCAN 0 0 681110149919100
6 JOB_GENERATE_STROBE 0 0 681110150233500
7 JOB_ACQUIRE_DISPATCH_BEGIN 0 0 681110150344300
8 JOB_ACQUIRE_DISPATCH_END 0 0 681110150993800
9 JOB_CREATE 0 1 681110150997300
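The records can also be post-processed for ad hoc analysis. Below is a sketch, assuming the pretty-printed table has been captured to a hypothetical file named events.txt and remains whitespace-delimited as in the example above, that roughly estimates the block period from the gaps between successive JOB_CREATE events:
import pandas as pd

# Capture the table first, e.g.:
#   python -m vortex_tools.profiler log.prof --pretty > events.txt
df = pd.read_csv("events.txt", sep=r"\s+")

# Timestamps are reported in nanoseconds; the spacing between successive
# JOB_CREATE events gives a rough estimate of the block period in ms.
create = df[df["code"] == "JOB_CREATE"]["timestamp"]
print(create.diff().dropna() / 1e6)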
When the --timeline or --waterfall flag is passed, the profiler module generates a timeline derived from the profiler log.
For the --waterfall flag, each row in the timeline corresponds to one block of samples moving through the engine's pipeline.
For the --timeline flag, each row in the timeline corresponds to all blocks of samples moving through a specific stage of the engine's pipeline.
The time that the engine and its components spend in various activities is represented by color-coded bars within each row.
Phases of activity that involve multiple tasks contain a smaller color-coded bar for each task.
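For example, using the log recorded above:
> python -m vortex_tools.profiler log.prof --waterfall
> python -m vortex_tools.profiler log.prof --timeline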
When the --statistics flag is passed, the profiler module generates a table that lists the interval, rate, duration, and time usage for each task within the engine's pipeline.
In contrast to the visual displays above, this table is suitable for quantitative analysis of the engine's performance.
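For example, to print the table for the log recorded above:
> python -m vortex_tools.profiler log.prof --statistics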
Note
The number of preloaded blocks can artificially inflate the acquisition duration since those blocks are queued far in advance.
To compensate, the profiler module attempts to detect the number of preloaded blocks and scales the resulting values by that count.
When this compensation is applied, the output table includes P<N>, where <N> is the detected number of preloaded blocks.
This detection may fail for short acquisitions or for acquisitions that exit prematurely due to an error.
Example waterfall timeline derived from the profiler log.
Diagnostics¶
The profiler timeline can be used to identify configuration and electrical issues.
Incorrect Triggering: Lines drawn through the completion events of acquisition tasks do not remain parallel after the session starts. This suggests that the hardware components are triggering at different rates.
Insufficient Preload: Random jitter in acquisition dispatch times is not much smaller than the acquisition wait interval. The engine is likely to underflow the acquisition buffers when the computer is heavily loaded.
Insufficient Block Size: The engine spends more time dispatching tasks than waiting for tasks to complete. This suggests that the per-block setup overhead is too high, so the block size should be increased.
Excessive Processing: The line drawn through the completion events of recycling tasks has a smaller slope than the line drawn through acquisition tasks. This suggests that the processing pipeline is not returning blocks to the engine fast enough to sustain the acquisition indefinitely.