Debugging programs running inside Docker containers, in production

2021/01

Tools like strace, perf, and gdb are part of the usual debugging toolbox on Linux. They can be used to inspect the behaviour of newly launched programs or of already running ones.

This is not meant to be an introduction to debugging but rather a guide on how to get around the problems which appear when debugging programs running inside containers.

The focus is on debugging long-running programs (services). These can exhibit hard-to-reproduce bugs that happen as a result of complex interactions between multiple components and often require live debugging in a production environment.

The tools mentioned above support attaching to the processes of a running service. For example, the -p pid (--attach=pid) option can be passed to strace and it will start tracing an already running process. If the tracee runs as a different user than the one strace was started with, elevated privileges are required.

strace -p $(pidof clickhouse-server)
strace: Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf: Operation not permitted
strace: attach: ptrace(PTRACE_SEIZE, 182079): Operation not permitted
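The /etc/sysctl.d/10-ptrace.conf hint in the error refers to the Yama security module. Its current setting can be read directly; a minimal check, assuming a Linux host, which prints "unavailable" where Yama is not compiled in:

```shell
# 0 = unrestricted, 1 = attach limited to descendants (a common default),
# 2 = admin-only (CAP_SYS_PTRACE required), 3 = no attaching at all
scope=$(cat /proc/sys/kernel/yama/ptrace_scope 2>/dev/null || echo unavailable)
echo "ptrace_scope: $scope"
```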

One way to elevate privileges is to use sudo.

sudo strace -p $(pidof clickhouse-server) -f
strace: Process 182079 attached with 43 threads
[pid 182251] futex(0x7fdc65a65080, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 182191] restart_syscall(<... resuming interrupted read ...> <unfinished ...>
[pid 182188] clock_gettime(CLOCK_REALTIME,  <unfinished ...>
[pid 182187] clock_gettime(CLOCK_REALTIME,  <unfinished ...>
...

Let’s try to use gdb.

sudo gdb --eval-command "set pagination 0" --eval-command "thread apply all bt" --batch --pid $(pidof clickhouse-server)
warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable.  Connect to gdbserver inside the container.
futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7ffe542e7368) at ../sysdeps/nptl/futex-internal.h:183
183	../sysdeps/nptl/futex-internal.h: No such file or directory.

Thread 45 (Thread 0x7fdc127fe000 (LWP 190355)):
#0  0x000000000e58aac4 in ?? ()
#1  0x000000000f8dd6c5 in DB::ISource::tryGenerate() ()
#2  0x000000000f8dd3ea in DB::ISource::work() ()
#3  0x000000000fa44afa in DB::SourceWithProgress::work() ()
#4  0x000000000f91728c in ?? ()
#5  0x000000000f914036 in DB::PipelineExecutor::executeStepImpl(unsigned long, unsigned long, std::__1::atomic<bool>*) ()
#6  0x000000000f9121b9 in DB::PipelineExecutor::executeImpl(unsigned long) ()
#7  0x000000000f911c2d in DB::PipelineExecutor::execute(unsigned long) ()
#8  0x000000000f91fa0e in ?? ()
#9  0x00000000086415ed in ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) ()
#10 0x00000000086451a3 in ?? ()
#11 0x00007fdc661f1609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007fdc66112293 in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
...

It seems to work, but it also displays a warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable. Connect to gdbserver inside the container.

Parallel Universes Problem or Linux Namespaces

The clickhouse-server process I’m inspecting is actually running inside a Docker container (yandex/clickhouse-server) but the commands (strace, gdb) are run from the host machine.

Docker containers employ Linux Namespaces for isolation. Linux namespaces allow creating multiple parallel universes on the same machine.
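Namespace membership is directly inspectable: every process exposes its namespaces as symlinks under /proc/&lt;pid&gt;/ns/, and two processes share a namespace exactly when the link targets (inode numbers) match. A minimal sketch against our own shell:

```shell
# Print this shell's pid-namespace identity. A process inside a container
# would show a different inode number here than a process on the host.
pid_ns=$(readlink "/proc/$$/ns/pid")
echo "$pid_ns"
```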

The backtrace has a less-than-desirable level of detail and is also incomplete. This is because the binary is stripped of debug symbols. ClickHouse chooses to publish debug symbols as a separate package, which is not installed in the image. Even if they were present in the container, gdb would still not be able to load them due to the same “two parallel universes” problem (mount namespaces).

Note: We got lucky (this time) with the libc debug symbols. The same libc version is used both in the container and on the host machine, so the debug symbols installed on the host are compatible with the binary inside the container.

perf and most other debugging tools run into similar issues, as they all rely on mostly the same underlying primitives.

The solution is to run our debugging tools inside the same universe as our programs run in (same namespaces).

How not to debug programs inside Docker containers! or docker run … unconfined

docker exec -it clickhouse bash -c "apt update && apt install -y strace && strace -p 1"
strace: Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf: Operation not permitted
strace: attach: ptrace(PTRACE_SEIZE, 1): Operation not permitted

Googling the error leads to solutions that are mostly variations of the following command for running the service container.

docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined ...

While this indeed allows the process in question to be traced, it is a big no-no for a production service!

Production containers run with a limited set of privileges, just enough to perform their function, which means it is impossible to run tools that require elevated privileges, like gdb. Restarting containers with temporarily elevated privileges might be OK in some environments (or locally), but if a bug is hard to reproduce this approach isn’t useful at all.

Plenty has been written on the topic of Docker containers in the context of security and whether containers provide isolation.1, 2, 3, 4

While it is understood that containers do not provide absolute isolation, let’s not sabotage that intentionally.

How to debug programs inside Docker containers or switching namespaces

There are a handful of articles that understand the problem and provide solutions which involve switching to container namespaces.5, 6, 7

nsenter

The most elegant solution is nsenter. It allows running programs in different namespaces or, to reword this: nsenter lets us run programs in the alternative universe where our service is running.

sudo nsenter --all --target $(pidof clickhouse-server) \
  bash -c 'apt-get update && apt-get install -y --no-install-recommends gdb && \
    gdb --eval-command "set pagination 0" --eval-command "thread apply all bt" --batch --pid 1'
Thread 45 (Thread 0x7fdc127fe000 (LWP 190355)):
#0  0x000000000e58aac4 in ?? ()
#1  0x000000000f8dd6c5 in DB::ISource::tryGenerate() ()
#2  0x000000000f8dd3ea in DB::ISource::work() ()
#3  0x000000000fa44afa in DB::SourceWithProgress::work() ()
#4  0x000000000f91728c in ?? ()
#5  0x000000000f914036 in DB::PipelineExecutor::executeStepImpl(unsigned long, unsigned long, std::__1::atomic<bool>*) ()
#6  0x000000000f9121b9 in DB::PipelineExecutor::executeImpl(unsigned long) ()
#7  0x000000000f911c2d in DB::PipelineExecutor::execute(unsigned long) ()
#8  0x000000000f91fa0e in ?? ()
#9  0x00000000086415ed in ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) ()
#10 0x00000000086451a3 in ?? ()
#11 0x00007fdc661f1609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007fdc66112293 in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
...
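To confirm that nsenter really placed us in the target’s universe, compare the namespace symlink targets of the two processes. The sketch below compares our shell against itself (target=$$) so it runs without a container; in practice you would substitute the PID of the debugged process:

```shell
target=$$   # substitute the PID of the debugged process here
for ns in pid net mnt; do
  mine=$(readlink "/proc/self/ns/$ns")
  theirs=$(readlink "/proc/$target/ns/$ns")
  if [ "$mine" = "$theirs" ]; then echo "$ns: same"; else echo "$ns: different"; fi
done
```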

docker run

An alternative is to launch a new container in the same namespaces where the target program runs and add additional capabilities needed to run debugging tools.

CONTAINER_ID=$(docker ps -aqf "name=^clickhouse$")
IMAGE=$(docker ps --format '{{.Image}}' -f "name=^clickhouse$")
docker run -it --rm \
  --pid="container:$CONTAINER_ID" --net="container:$CONTAINER_ID" \
  --cap-add sys_admin --cap-add sys_ptrace \
  "$IMAGE" \
   bash -c 'apt-get update && apt-get install -y --no-install-recommends gdb && \
      gdb --eval-command "set pagination 0" --eval-command "thread apply all bt" --batch --pid 1'

The output is identical to the nsenter run:

Thread 45 (Thread 0x7fdc127fe000 (LWP 190355)):
#0  0x000000000e58aac4 in ?? ()
#1  0x000000000f8dd6c5 in DB::ISource::tryGenerate() ()
#2  0x000000000f8dd3ea in DB::ISource::work() ()
#3  0x000000000fa44afa in DB::SourceWithProgress::work() ()
#4  0x000000000f91728c in ?? ()
#5  0x000000000f914036 in DB::PipelineExecutor::executeStepImpl(unsigned long, unsigned long, std::__1::atomic<bool>*) ()
#6  0x000000000f9121b9 in DB::PipelineExecutor::executeImpl(unsigned long) ()
#7  0x000000000f911c2d in DB::PipelineExecutor::execute(unsigned long) ()
#8  0x000000000f91fa0e in ?? ()
#9  0x00000000086415ed in ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) ()
#10 0x00000000086451a3 in ?? ()
#11 0x00007fdc661f1609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007fdc66112293 in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
...

The most important difference between nsenter and docker run: nsenter provides the most flexibility and allows running the extra programs in an environment as close to the debugged program as needed.

docker run supports attaching only to the pid and net namespaces. At first glance, not being able to attach to the mount namespace (i.e. share the same root filesystem) seems like a big disadvantage. However, it means we can use a different image with debugging tools, debug symbols and everything else we might need already preinstalled. This is especially useful when the debugged program runs in a FROM scratch container.

Tip: If you use this approach and need to access files from the container where the debugged program runs, you can find them at /proc/1/root/.8
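The same /proc/&lt;pid&gt;/root trick can be sanity-checked against any process you own; here it is sketched against our own shell so it runs outside Docker too (in the debug container you would use /proc/1/root/):

```shell
# /proc/<pid>/root is a window into that process's mount namespace,
# i.e. the root filesystem the process actually sees.
test -r "/proc/$$/root/proc/uptime" && echo "target root filesystem reachable"
```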

If you run Kubernetes, the Ephemeral Containers9 approach is similar to docker run.10
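For Kubernetes, the rough equivalent uses kubectl debug to inject an ephemeral container into a running pod. This is a sketch, not a tested recipe; the pod name, target container name and image tag below are all assumptions:

```shell
# All names here are assumptions: adjust pod, container and image to yours.
POD=clickhouse
TARGET=clickhouse          # container inside the pod whose namespaces to share
DEBUG_IMAGE=yandex/clickhouse-server:21.1.2.15-debug

# Printed instead of executed so the sketch works without a cluster;
# drop the leading echo to run it for real.
echo kubectl debug -it "$POD" --image="$DEBUG_IMAGE" --target="$TARGET" -- bash
```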

An opinionated recipe for debugging programs running inside Docker containers, in production

Time to clarify the requirements and make some assumptions:

  1. This is intended for compiled languages (C/C++, …)
  2. We want to debug a program running inside a Docker container (i.e. attach to the process with gdb)
    1. Containers run with the default Seccomp, AppArmor, or SELinux profiles, or further restricted
    2. Containers should not be bloated with software that is unnecessary for normal operation
      • No gdb and other similar tools bundled inside container
      • No debug symbols bundled
  3. We have permissions to run additional containers with elevated privileges

ClickHouse will be used as the example service to be debugged while running inside a Docker container. As part of its build process, ClickHouse builds and publishes multiple deb packages. The official image published to Docker Hub installs all of them except the debug symbols, and is the one I tried to attach to with gdb earlier.

Step 1: Build and publish 2 Docker images

The first image will be the one used to run the “production” service. I’ll use the official one for that: yandex/clickhouse-server:21.1.2.15.

The second image can be built on top of the first one with all the additional things needed for debugging (all the necessary debug symbols, gdb…).

cat <<EOF > Dockerfile
FROM yandex/clickhouse-server:21.1.2.15

RUN apt-get update && \\
  apt-get install -y clickhouse-common-static-dbg=21.1.2.15 strace gdb
EOF

docker build -t yandex/clickhouse-server:21.1.2.15-debug .

Step 2: Run the service in “production” as usual

docker run -it --rm --name clickhouse yandex/clickhouse-server:21.1.2.15

Step 3: Run debug container within the same namespaces as the target

CONTAINER_ID=$(docker ps -aqf "name=^clickhouse$")
IMAGE=$(docker ps --format '{{.Image}}' -f "name=^clickhouse$")

docker run -it --rm \
  --pid="container:$CONTAINER_ID" --net="container:$CONTAINER_ID" \
  --cap-add sys_admin --cap-add sys_ptrace \
  "$IMAGE-debug" \
  bash -c 'gdb --eval-command "set pagination 0" --eval-command "thread apply all bt" --batch --pid 1'
Thread 44 (Thread 0x7f37750f3000 (LWP 144)):
#0  0x000000000e58aaa8 in DB::(anonymous namespace)::NumbersSource::generate (this=0x7f37a39f4a20) at ../src/Storages/System/StorageSystemNumbers.cpp:35
#1  0x000000000f8dd6c5 in DB::ISource::tryGenerate (this=0x7f3775c165d8) at ../src/Processors/ISource.cpp:79
#2  0x000000000f8dd3ea in DB::ISource::work (this=0x7f37a39f4a20) at ../src/Processors/ISource.cpp:53
#3  0x000000000fa44afa in DB::SourceWithProgress::work (this=0x7f37a39f4a20) at ../src/Processors/Sources/SourceWithProgress.cpp:36
#4  0x000000000f91728c in DB::executeJob (processor=0x7f37a39f4a20) at ../src/Processors/Executors/PipelineExecutor.cpp:79
#5  DB::PipelineExecutor::addJob(DB::ExecutingGraph::Node*)::$_0::operator()() const (this=0x7f37a3cd7e78) at ../src/Processors/Executors/PipelineExecutor.cpp:96
#6  std::__1::__invoke<DB::PipelineExecutor::addJob(DB::ExecutingGraph::Node*)::$_0&> (__f=...) at ../contrib/libcxx/include/type_traits:3519
#7  std::__1::__invoke_void_return_wrapper<void>::__call<DB::PipelineExecutor::addJob(DB::ExecutingGraph::Node*)::$_0&>(DB::PipelineExecutor::addJob(DB::ExecutingGraph::Node*)::$_0&) (__args=...) at ../contrib/libcxx/include/__functional_base:348
#8  std::__1::__function::__alloc_func<DB::PipelineExecutor::addJob(DB::ExecutingGraph::Node*)::$_0, std::__1::allocator<DB::PipelineExecutor::addJob(DB::ExecutingGraph::Node*)::$_0>, void ()>::operator()() (this=0x7f37a3cd7e78) at ../contrib/libcxx/include/functional:1540
#9  std::__1::__function::__func<DB::PipelineExecutor::addJob(DB::ExecutingGraph::Node*)::$_0, std::__1::allocator<DB::PipelineExecutor::addJob(DB::ExecutingGraph::Node*)::$_0>, void ()>::operator()() (this=0x7f37a3cd7e70) at ../contrib/libcxx/include/functional:1714
#10 0x000000000f914036 in std::__1::__function::__value_func<void ()>::operator()() const (this=<optimized out>) at ../contrib/libcxx/include/functional:1867
#11 std::__1::function<void ()>::operator()() const (this=<optimized out>) at ../contrib/libcxx/include/functional:2473
#12 DB::PipelineExecutor::executeStepImpl (this=0x7f37a3ca7fd8, thread_num=<optimized out>, num_threads=1, yield_flag=0x0) at ../src/Processors/Executors/PipelineExecutor.cpp:580
#13 0x000000000f9121b9 in DB::PipelineExecutor::executeSingleThread (this=0x7f37a3ca7fd8, thread_num=0, num_threads=1) at ../src/Processors/Executors/PipelineExecutor.cpp:473
#14 DB::PipelineExecutor::executeImpl (this=0x7f37a3ca7fd8, num_threads=1) at ../src/Processors/Executors/PipelineExecutor.cpp:807
#15 0x000000000f911c2d in DB::PipelineExecutor::execute (this=0x7f37a3ca7fd8, num_threads=139876175537624) at ../src/Processors/Executors/PipelineExecutor.cpp:395
#16 0x000000000f91fa0e in DB::threadFunction (data=..., thread_group=..., num_threads=1) at ../src/Processors/Executors/PullingAsyncPipelineExecutor.cpp:79
#17 DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0::operator()() const (this=<optimized out>) at ../src/Processors/Executors/PullingAsyncPipelineExecutor.cpp:101
#18 std::__1::__invoke_constexpr<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&> (__f=...) at ../contrib/libcxx/include/type_traits:3525
#19 std::__1::__apply_tuple_impl<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&, std::__1::tuple<>&>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&, std::__1::tuple<>&, std::__1::__tuple_indices<>) (__f=..., __t=...) at ../contrib/libcxx/include/tuple:1415
#20 std::__1::apply<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&, std::__1::tuple<>&>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&, std::__1::tuple<>&) (__f=..., __t=...) at ../contrib/libcxx/include/tuple:1424
#21 ThreadFromGlobalPool::ThreadFromGlobalPool<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&)::{lambda()#1}::operator()() (this=<optimized out>) at ../src/Common/ThreadPool.h:178
#22 std::__1::__invoke<ThreadFromGlobalPool::ThreadFromGlobalPool<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&)::{lambda()#1}&> (__f=...) at ../contrib/libcxx/include/type_traits:3519
#23 std::__1::__invoke_void_return_wrapper<void>::__call<ThreadFromGlobalPool::ThreadFromGlobalPool<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&)::{lambda()#1}&>(ThreadFromGlobalPool::ThreadFromGlobalPool<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&)::{lambda()#1}&) (__args=...) at ../contrib/libcxx/include/__functional_base:348
#24 std::__1::__function::__alloc_func<ThreadFromGlobalPool::ThreadFromGlobalPool<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&)::{lambda()#1}, std::__1::allocator<{lambda()#1}>, void ()>::operator()() (this=<optimized out>) at ../contrib/libcxx/include/functional:1540
#25 std::__1::__function::__func<ThreadFromGlobalPool::ThreadFromGlobalPool<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&)::{lambda()#1}, std::__1::allocator<{lambda()#1}>, void ()>::operator()() (this=<optimized out>) at ../contrib/libcxx/include/functional:1714
#26 0x00000000086415ed in std::__1::__function::__value_func<void ()>::operator()() const (this=0x7f37750e9730) at ../contrib/libcxx/include/functional:1867
#27 std::__1::function<void ()>::operator()() const (this=0x7f37750e9730) at ../contrib/libcxx/include/functional:2473
#28 ThreadPoolImpl<std::__1::thread>::worker (this=0x7f37a4a65000, thread_it=...) at ../src/Common/ThreadPool.cpp:243
#29 0x00000000086451a3 in ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::{lambda()#3}::operator()() const (this=0x7f37a3c86c08) at ../src/Common/ThreadPool.cpp:124
#30 std::__1::__invoke<ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::{lambda()#3}> (__f=...) at ../contrib/libcxx/include/type_traits:3519
#31 std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::{lambda()#3}>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::{lambda()#3}>&, std::__1::__tuple_indices<>) (__t=...) at ../contrib/libcxx/include/thread:273
#32 std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::{lambda()#3}> >(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::{lambda()#3}>) (__vp=<optimized out>) at ../contrib/libcxx/include/thread:284
#33 0x00007f37a5179609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#34 0x00007f37a509a293 in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
...

Step 4: Install additional tools if (and when) needed (e.g. perf)

CONTAINER_ID=$(docker ps -aqf "name=^clickhouse$")
IMAGE=$(docker ps --format '{{.Image}}' -f "name=^clickhouse$")

docker run -it --rm \
  --pid="container:$CONTAINER_ID" --net="container:$CONTAINER_ID" \
  --cap-add sys_admin --cap-add sys_ptrace \
  --cap-add ipc_lock --cap-add syslog \
  "$IMAGE-debug" \
  bash

apt-get install linux-tools-$(uname -r)

perf top --stdio -p 1
   PerfTop:     561 irqs/sec  kernel:36.4%  exact:  0.0% lost: 0/0 drop: 774/774 [4000Hz cycles],  (target_pid: 1)
---------------------------------------------------------------------------------------------

    61.00%  clickhouse          [.] DB::(anonymous namespace)::NumbersSource::genera
     1.83%  clickhouse          [.] DB::PipelineExecutor::prepareProcessor
     1.32%  clickhouse          [.] DB::PipelineExecutor::executeStepImpl
     1.13%  clickhouse          [.] sallocx
     1.13%  clickhouse          [.] DB::ISimpleTransform::prepare
     0.99%  clickhouse          [.] DB::ISource::prepare
     0.87%  clickhouse          [.] operator delete
     0.85%  clickhouse          [.] extent_try_coalesce_impl.llvm.764109124236897626
     0.79%  clickhouse          [.] libunwind::CFI_Parser<libunwind::LocalAddressSpa
     0.77%  libpthread-2.31.so  [.] __pthread_mutex_trylock
     0.68%  clickhouse          [.] DB::ExpressionActions::execute
     0.66%  [kernel]            [k] copy_user_generic_string
...

Note: The installed perf package must match the running kernel version. Since containers share the host kernel, it is possible that the image was built under one kernel but runs in production on a different one, so the package that provides perf is installed only when it is needed.
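Deriving the right package name at debug time can be sketched as follows; note that the linux-tools-&lt;release&gt; naming is Ubuntu/Debian-specific:

```shell
# Key the perf package off the kernel the container is actually running under.
kernel_release=$(uname -r)
echo "install: linux-tools-$kernel_release"
```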


If you have a suggestion please shout @nvartolomei.