# How to use ros2_tracing to trace and analyze an application

This guide shows how to use [`ros2_tracing`](https://github.com/ros2/ros2_tracing) to trace and analyze a ROS 2 application.
For this guide, the application will be [`performance_test`](https://gitlab.com/ApexAI/performance_test).

## Overview

This guide covers:

1. installing tracing-related tools and building ROS 2 with the core instrumentation enabled
1. running and tracing a `performance_test` run
1. analyzing the trace data using [`tracetools_analysis`](https://gitlab.com/ros-tracing/tracetools_analysis) to plot the callback durations

## Prerequisites

This guide is aimed at real-time systems.
See the [real-time system setup guide](Real-Time-Operating-System-Setup/Real-Time-Linux/rt_linux_index.md).
However, the guide will work if you are using a non-real-time system.

```eval_rst
.. note::
   This guide was written for ROS 2 Rolling on Ubuntu 20.04. It should work on other ROS 2 distros or Ubuntu versions, but some things might need to be adjusted.
```

## Installing and building

First, make sure you have [installed all dependencies for ROS 2 Rolling](https://docs.ros.org/en/rolling/Installation/Ubuntu-Development-Setup.html).

Install [LTTng](https://lttng.org/docs/) as well as `babeltrace`.
We will only install the LTTng userspace tracer.

```sh
$ sudo apt-get update
$ sudo apt-get install -y lttng-tools liblttng-ust-dev python3-lttng python3-babeltrace babeltrace
```

Then create a workspace, import the ROS 2 Rolling code, and clone `performance_test` and `tracetools_analysis`.

```sh
$ cd ~/
$ mkdir -p tracing_ws/src
$ cd tracing_ws/
$ vcs import src/ --input https://raw.githubusercontent.com/ros2/ros2/master/ros2.repos
$ cd src/
$ git clone https://gitlab.com/ApexAI/performance_test.git
$ git clone https://gitlab.com/ros-tracing/tracetools_analysis.git
$ cd ..
```

Install dependencies with rosdep.
```sh
$ rosdep update
$ rosdep install --from-paths src --ignore-src -y --skip-keys "fastcdr rti-connext-dds-6.0.1 urdfdom_headers"
```

Then build up to `performance_test` and configure it for ROS 2.
See its [documentation](https://gitlab.com/ApexAI/performance_test#ros-2-middleware-plugins).
We also need to build `ros2trace` to set up tracing using the `ros2 trace` command and `tracetools_analysis` to analyze the data.

```sh
$ colcon build --packages-up-to ros2trace tracetools_analysis performance_test --cmake-args -DPERFORMANCE_TEST_RCLCPP_ENABLED=ON
```

You should see the following message once `tracetools` is done building:

```sh
LTTng found: tracing enabled
```

This confirms that LTTng was properly detected and that the instrumentation built into the ROS 2 core is enabled.

Next, we will run a `performance_test` experiment and trace it.

## Tracing

Start an LTTng session daemon.
For userspace tracing, the daemon does not need to be started as `root`.
Note that a non-root daemon will be spawned automatically by `ros2 trace` if it is not already running.

```sh
$ lttng-sessiond --daemonize
```

In one terminal, source the workspace and setup tracing.
We need to explicitly use the `--kernel` option with no values to disable kernel tracing, since we did not install the kernel tracer.
When running the command, a list of ROS 2 userspace events will be printed.
It will also print the path to the directory that will contain the resulting trace (under `~/.ros/tracing`).
Press enter to start tracing.

```sh
$ # terminal 1
$ cd ~/tracing_ws
$ source install/setup.bash
$ ros2 trace --session-name perf-test --kernel --list
```

In a second terminal, source the workspace.

```sh
$ # terminal 2
$ cd ~/tracing_ws
$ source install/setup.bash
```

Then run the `performance_test` experiment.
We simply create an experiment with a node publishing ~1 MB messages to another node as fast as possible for 60 seconds using the second highest real-time priority so that we don't interfere with critical kernel threads.
We need to run `performance_test` as root to be able to use real-time priorities.

```sh
$ # terminal 2
$ sudo ./install/performance_test/lib/performance_test/perf_test -c rclcpp-single-threaded-executor -p 1 -s 1 -r 0 -m Array1m --reliability RELIABLE --max-runtime 60 --use-rt-prio 98
```

If that last command doesn't work for you (with an error like: "error while loading shared libraries"), run the slightly-different command below.
This is because, for security reasons, we need to manually pass `*PATH` environment variables for some shared libraries to be found (see [this explanation](https://unix.stackexchange.com/a/251374)).

```sh
$ # terminal 2
$ sudo env PATH="$PATH" LD_LIBRARY_PATH="$LD_LIBRARY_PATH" ./install/performance_test/lib/performance_test/perf_test -c rclcpp-single-threaded-executor -p 1 -s 1 -r 0 -m Array1m --reliability RELIABLE --max-runtime 60 --use-rt-prio 98
```

```eval_rst
.. note::
   If you're not using a real-time kernel, simply run:

   .. code-block:: bash

      $ # terminal 2
      $ ./install/performance_test/lib/performance_test/perf_test -c rclcpp-single-threaded-executor -p 1 -s 1 -r 0 -m Array1m --reliability RELIABLE --max-runtime 60
```

Once the experiment is done, in the first terminal, press enter again to stop tracing.
Use `babeltrace` to quickly look at the resulting trace.

```sh
$ babeltrace ~/.ros/tracing/perf-test
```

The output of the above command is a human-readable version of the raw Common Trace Format (CTF) data, which is a list of trace events.
Each event has a timestamp, an event type, some information on the process that generated the event, and the values of the fields of the given event type.

Next, we will analyze the trace.

## Analysis

[`tracetools_analysis`](https://gitlab.com/ros-tracing/tracetools_analysis) provides a Python API to easily analyze traces.
We can use it in a [Jupyter notebook](https://jupyter.org/) with [bokeh](https://docs.bokeh.org/en/latest/index.html) to plot the data.
The `tracetools_analysis` repository contains a [few sample notebooks](https://gitlab.com/ros-tracing/tracetools_analysis/-/tree/master/tracetools_analysis/analysis), including [one notebook to analyze subscription callback durations](https://gitlab.com/ros-tracing/tracetools_analysis/-/blob/master/tracetools_analysis/analysis/callback_duration.ipynb).

For this guide, we will plot the durations of the subscription callback in the subscriber node.

Install Jupyter notebook and bokeh, and then open the sample notebook.

```sh
$ sudo apt-get install -y jupyter-notebook
$ pip3 install bokeh
$ jupyter notebook ~/tracing_ws/src/tracetools_analysis/tracetools_analysis/analysis/callback_duration.ipynb
```

This will open the notebook in the browser.

Replace the value for the `path` variable in the second cell to the path to the trace directory:

```py
path = '~/.ros/tracing/perf-test'
```

Run the notebook by clicking the *Run* button for each cell.
Running the cell that does the trace processing might take a few minutes on the first run, but subsequent runs will be much quicker.

You should get a plot that looks like this:

```eval_rst
.. image:: ../images/ros2_tracing_guide_result_plot.png
   :alt: callback durations result plot
   :align: center
```

We can see that most of the callbacks take less than 0.01 ms, but there are some outliers taking over 0.02 or 0.03 ms.

## Conclusion

This guide showed how to install tracing-related tools and build ROS 2 with tracing instrumentation.
Then it showed how to trace a [`performance_test`](https://gitlab.com/ApexAI/performance_test) experiment using [`ros2_tracing`](https://github.com/ros2/ros2_tracing) and plot the callback durations using [`tracetools_analysis`](https://gitlab.com/ros-tracing/tracetools_analysis).

For more trace analyses, take a look at the [other sample notebooks](https://gitlab.com/ros-tracing/tracetools_analysis/-/tree/master/tracetools_analysis/analysis) and the [`tracetools_analysis` API documentation](https://ros-tracing.gitlab.io/tracetools_analysis-api/master/tracetools_analysis/).
The [`ros2_tracing` design document](https://github.com/ros2/ros2_tracing/blob/master/doc/design_ros_2.md) also contains a lot of information.