Let’s Rethink Debugging

This is my talk proposal for PyCascades 2021. Even though it's a proposal, it reads very much like a blog post (that's the only style I know how to write in...), so I'm posting it here. The slides are here. In the next post I'll talk about my experience attending PyCascades.

Abstract

As programmers, we debug almost every day. What are the major options for debugging, and what are their advantages and disadvantages? We'll start the talk by giving the audience an overview of the history of debugging and of existing tools, so they know how to choose among them.

Then, we'll help the audience gain a deeper understanding of what debugging is really about, and talk about two pain points with existing solutions. We'll introduce a novel approach to solve these pain points, with a basic introduction to bytecode tracing so the audience can learn this useful technique.

Finally, we'll look into the future and talk about why it's important to be more innovative. We hope that after this talk, the audience will be more open-minded when thinking about debugging, and about programming as a whole.

No specific knowledge is required, but basic experience with debugging would be helpful.

Description

Here is a detailed description of each part.

Part 1: What is debugging really about?

Broadly speaking, a Python program can have four types of errors:

  • Syntax Error
  • Exits abnormally (e.g. unhandled exceptions, killed by OS)
  • The program can run, but gives wrong results
  • Gives correct results, but consumes more resources than expected (e.g. memory leak)

Among these, the third type is the most common, and it is also where programmers spend most of their debugging time. In this talk we focus on this type of error, i.e. "the program can run, but gives wrong results".

I'll ask the audience to recall how they usually debug. It's not hard to see that, no matter which approach we take, we're trying to answer one question:

What is the root cause of the wrong value?

This sounds straightforward, but it is vital that we realize it before going into the later sections.

Part 2: Looking back at the history of debugging

In the early days of programming, debugging meant dumping the system's data to output devices: literally printing it, or flashing some lights if there was an error. A very patient programmer would then go through the code step by step, reading it to see where the problem might be.

Then, in the 70s and 80s, the idea of "debugging software" came along, and people started to build command-line debuggers like dbx and GDB. Since then, although new features such as breakpoints, reverse debugging, and graphical interfaces have been added, the way people use debuggers has stayed pretty much the same: step through the program and look around.

Today, print, logging, and debuggers remain the major ways to debug, each with its own advantages and drawbacks:

  • print:
    • Advantages: available out-of-the-box, clean information, does not affect program execution.
    • Drawbacks: requires familiarity with code, needs tweaking repeatedly, lack of context, hard to manage output.
  • Logging:
    • Advantages: configurable, easy to manage output (e.g. Sentry), richer context (lineno, filename, etc).
    • Drawbacks: configuration is not easy, requires familiarity with code, hard to find what you need, context is still insufficient.
  • Debugger:
    • Advantages: powerful, does not require familiarity with code, richest context to help identify problems.
    • Drawbacks: not always available, nontrivial learning curve, can't persist output, needs human interaction.
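To make the comparison concrete, here is a small sketch of the same value reported through print and through logging, with the debugger option left as a comment. The logger name `debug_demo` and the function `add` are illustrative only; the format fields `%(filename)s`, `%(lineno)d`, and `%(funcName)s` are standard `logging` record attributes, and they are where logging's extra context comes from.

```python
import io
import logging

# Send log records to an in-memory stream so the output is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
# The format string supplies the context that a bare print() lacks.
handler.setFormatter(
    logging.Formatter("%(levelname)s %(filename)s:%(lineno)d in %(funcName)s: %(message)s")
)
logger = logging.getLogger("debug_demo")  # illustrative logger name
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False  # keep records out of the root logger


def add(a, b):
    c = a + b
    print("c =", repr(c))      # print: no setup, but only the value itself
    logger.debug("c = %r", c)  # logging: same value, plus file, line, and function
    # breakpoint()             # debugger: pause here and inspect interactively (pdb by default)
    return c


add("fo", "o")
print(stream.getvalue(), end="")
```

The trade-off from the list shows up directly: print needs nothing, while logging needs a handler and a formatter before it pays off.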

Yet, with all these options, debugging is still hard sometimes. We'll see why in the next section.

Part 3: Let's rethink debugging

There are two pain points with existing debugging solutions:

  • There is no tool that is as easy-to-use as a print, yet provides rich information like a debugger.

    Tool       Effort required   Information provided
    print      low               simple
    logging    medium            medium
    debugger   high              rich
    ?          low               rich
  • Existing tools only give clues, without telling why.

    This is a bigger (yet hidden) problem.

    In the first part we talked about the goal of debugging, which is finding the root cause of the wrong value. Let's use a debugger as an example to recall how we usually debug. Say you're debugging a program where c has an unexpected value:

    c = a + b  # c should be "foo", but instead is "bar"
    

    Here are the normal steps:

    1. Set a breakpoint at this line.
    2. Run the program and inspect the values of a and b.
    3. Figure out whether the error lies in a or b.
    4. Set another breakpoint and repeat 🔁

    Or, if you want to do it in one run:

    1. Set a breakpoint at the entry point of the program.
    2. Step through the program and remember everything that happens along the way.
    3. Stop at c = a + b and use your brain to infer what happened.

    Either way, we still need to spend time reading the code and following the execution. We also need to keep monitoring all relevant variables at every step, compare them with the expected values, and memorize the results, because debuggers don't persist them. This puts a huge load on our brain, and as a result makes debugging less efficient and sometimes frustrating.

    The problem is obvious: debuggers only give clues, without telling why. We've taken the manual work for granted for so long that we don't even think of it as a problem. In fact, it is a problem, and it can be solved.

Part 4: A novel approach to tackle the pain points

To reiterate, an ideal debugging tool should:

  • be easy to use and provide rich information;
  • tell you why a variable has a wrong value, with no or minimal human intervention.

For a moment, let's forget about the tools we use every day and think about just one question: who has the information we need for debugging?

The answer is: the Python interpreter.

So the question becomes, how do we pull relevant information out of the interpreter?

I will briefly introduce the sys.settrace API and the opcode event introduced in Python 3.7, using the example c = a + b to demonstrate how bytecode can be used to trace the sources of a variable. In this case, the sources of c are a and b. With this capability, reasoning about the root cause of a wrong value becomes possible.

I will then introduce Cyberbrain, a debugger that takes advantage of bytecode to solve the pain points with variable backtracing. This means that, besides viewing each variable's value at every step (which Cyberbrain also supports), users can visually "see" the sources of each variable change. In the previous example, Cyberbrain will tell you that it's a and b that caused c to change. It also traces the sources of a and b all the way back to the point where tracing began.
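A minimal sketch of the idea, assuming nothing about how Cyberbrain itself is implemented: a `sys.settrace` tracer turns on `frame.f_trace_opcodes` (Python 3.7+), decodes each `opcode` event with `dis`, and applies a deliberately naive rule, namely that every local variable loaded since the previous store is a source of the next store. The helper names `trace_sources` and `compute` are hypothetical. The superinstruction names (`LOAD_FAST_LOAD_FAST`, `STORE_FAST_LOAD_FAST`) only appear in newer CPython versions and are handled so the sketch behaves the same across versions.

```python
import dis
import sys


def trace_sources(func, *args):
    """Run func(*args) and record, per local variable, which locals it was computed from."""
    code = func.__code__
    # Map byte offsets to decoded instructions so each opcode event can be identified.
    instructions = {ins.offset: ins for ins in dis.get_instructions(code)}
    sources = {}
    pending = []  # locals loaded since the last store

    def record(ins):
        names = ins.argval if isinstance(ins.argval, tuple) else (ins.argval,)
        if ins.opname == "STORE_FAST_LOAD_FAST":  # newer CPython: a store, then a load
            sources[names[0]] = list(pending)
            pending[:] = [names[1]]
        elif ins.opname.startswith("STORE_FAST"):
            for name in names:
                sources[name] = list(pending)
            pending.clear()
        elif ins.opname.startswith("LOAD_FAST"):
            pending.extend(names)

    def local_tracer(frame, event, arg):
        if event == "opcode":
            ins = instructions.get(frame.f_lasti)  # f_lasti: offset of the current instruction
            if ins is not None:
                record(ins)
        return local_tracer

    def global_tracer(frame, event, arg):
        if event == "call" and frame.f_code is code:
            frame.f_trace = local_tracer
            frame.f_trace_opcodes = True  # request per-opcode events for this frame
            return local_tracer
        return None

    sys.settrace(global_tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, sources


def compute(a, b):
    c = a + b
    d = c * 2
    return d


result, sources = trace_sources(compute, "fo", "o")
print(result, sources)
```

This only tracks local variables and ignores the value stack, attributes, containers, and control flow; a real backtracing tool would have to simulate the stack for every instruction kind. The point is just that the interpreter already exposes enough to answer "where did c come from?".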

I'll do a quick demo of using Cyberbrain to debug a program to show how it solves the two pain points. By the end of the demo, the audience will realize that traditional debugging tools do require a lot of manual effort which can be automated.

Bytecode tracing has its own problems: it can make programs slower, and it can generate a huge amount of data for long-running programs. But the important thing is that we recognize the pain points and keep looking for new possibilities, which brings us to the next topic.

Part 5: Where do we go from here?

Now is an interesting time.

On the one hand, existing tools are becoming calcified. The Debug Adapter Protocol (DAP), which defines the capabilities a debugger should provide, is gaining popularity. Tools that conform to DAP will never be able to provide capabilities beyond what the protocol specifies.

On the other hand, new tools are coming out in Python's debugging space, just to list a few:

  • PySnooper, IceCream, Hunter, pytrace: let you trace function calls and variables with no effort, automating the process of adding print().
  • birdseye, Thonny: graphical debuggers that can visualize the values of expressions.
  • Python Tutor: web-based interactive program visualization, which also visualizes data structures.
  • Cyberbrain.
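To give a feel for the "trace instead of step" idea behind several of these tools, here is a minimal sketch built on `sys.settrace` line events: it snapshots a function's locals before each line runs and once more on return, so the values persist instead of living in your head. This is not how PySnooper or any other tool listed above is actually implemented; `record_locals` and `compute` are hypothetical names.

```python
import sys


def record_locals(func, *args):
    """Run func(*args), snapshotting its locals before each line and at return."""
    code = func.__code__
    history = []  # list of (line number, copy of the locals) pairs

    def local_tracer(frame, event, arg):
        if event in ("line", "return"):
            history.append((frame.f_lineno, dict(frame.f_locals)))
        return local_tracer

    def global_tracer(frame, event, arg):
        # Only trace calls to the target function; ignore every other frame.
        return local_tracer if frame.f_code is code else None

    sys.settrace(global_tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, history


def compute(a, b):
    c = a + b
    d = c * 2
    return d


result, history = record_locals(compute, "fo", "o")
for lineno, local_vars in history:
    print(lineno, local_vars)
```

Each snapshot is a copy, so the whole history can be printed, searched, or compared after the run, which is exactly what a stepping debugger does not keep for you.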

These new tools share the same goal of reducing programmers' work in debugging, but beyond that, they are all trying to pitch the idea that the current "standard" way of debugging is not good enough, and that more can be achieved with less manual effort. The ideas behind them are even more important than the tools themselves.

Why is this important? Dijkstra famously wrote:

The tools we use have a profound (and devious!) influence on our thinking habits, and, therefore, on our thinking abilities.

Imagine a world where none of these efforts exist: would the word "debugger" gradually shift from "something that can help you debug" to "something that conforms to the Debug Adapter Protocol"? It's not impossible. We need to prevent that from happening, and preserve a possible future where programmers debug effortlessly and more efficiently. So what can we do?

  • Think of new ways to make debugging better;
  • Create tools, or contribute to them;
  • Spread this talk and the ideas;
  • Create new programming languages that put debuggability as the core feature.

And the easiest, yet hardest thing: keep an open mind.

Why Is the GIL Worse Than We Thought?

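In the past, whenever I saw someone complain about the GIL (Global Interpreter Lock), I would tell them not to panic: every scenario has its own solution, e.g. async for IO-bound work and multiprocessing for CPU-bound work. I also always believed that Python is slow mainly because of its "raw" execution speed, and that the GIL is merely a blemish.

Recently, however, I realized that the GIL is a much more serious problem than I thought, because it gets in the way of on-demand parallelism.

What is "on-demand parallelism"? It's a term I made up to describe a common pattern in programming: parallelize the most time-consuming part of the program, while the program as a whole stays single-threaded. Typically, the time-consuming part is a loop over a huge list that does something to each element. Parallelizing it is also simple: just start several threads, each handling a portion of the list.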
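A minimal sketch of the pattern, using a hypothetical `process` function as the expensive per-item work: only the loop changes, and the `results` list is simply shared by all threads, each writing its own slots. In CPython, because of the GIL, this will not make CPU-bound work faster; it only shows how small the change would be in a language where threads run in parallel.

```python
import threading


def process(item):
    # Hypothetical per-item work, standing in for the expensive part of the loop.
    return item * item


def parallel_map(items, num_threads=4):
    """Process `items` in contiguous chunks, one thread per chunk."""
    results = [None] * len(items)  # shared by all threads; each writes only its own slots
    chunk_size = max(1, (len(items) + num_threads - 1) // num_threads)  # ceiling division

    def worker(start, stop):
        for i in range(start, stop):
            results[i] = process(items[i])

    threads = [
        threading.Thread(target=worker, args=(start, min(start + chunk_size, len(items))))
        for start in range(0, len(items), chunk_size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(parallel_map(list(range(10))))
```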

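Here we only discuss the CPU-bound case. Because of the GIL, starting multiple threads won't make the program run faster (if anything, it may run slower), so we have to use multiprocessing. Does multiprocessing solve the problem, then? Not always. It comes with a series of difficulties:

  • Processes don't share memory, so the inputs of the computation, such as the list elements, must be passed into every worker process;
  • Anything passed must be picklable, and quite a lot of things are unpicklable;
  • If the rest of the program needs the output of the parallel computation, the output must be picklable too;
  • The pickle -> unpickle round trip adds extra performance overhead.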
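A quick way to see the picklability constraint is to try `pickle.dumps` on a few everyday objects, since anything sent to or returned from a worker process has to survive it. The helper `is_picklable` and the sample objects are illustrative; the exact exception type varies by object and Python version, so the sketch catches the usual three.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if `obj` survives pickle.dumps, i.e. could cross a process boundary."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


def numbers():
    yield 1


candidates = {
    "list of ints": [1, 2, 3],
    "dict of strs": {"a": "b"},
    "lambda": lambda x: x,
    "lock": threading.Lock(),
    "generator": numbers(),
}
for label, obj in candidates.items():
    print(f"{label}: {is_picklable(obj)}")
```

Plain data passes; functions defined inline, locks, and generators do not, and each one would force a restructuring of the code before it could be handed to a process pool.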

As a result, multiprocessing applies to far fewer situations. Take a problem I recently ran into in Cyberbrain, where a piece of code looks like this:

for event in frame.events:
    frame_proto.events.append(_transform_event_to_proto(event))
    event_ids = _get_event_sources_uids(event, frame)
    if event_ids:
        frame_proto.tracing_result[event.uid].event_ids[:] = event_ids

这段代码遍历 frame.events,处理之后更新 frame_protoevents 数量很大,导致这部分代码成为了性能瓶颈,因此我想把它并行化。然后我就发现这是一个不可能完成的任务,为什么呢?因为 protocol buffer 对象不 pickable。这意味着,我既不能把 frame_proto 传进每个进程,也不能把 _transform_event_to_proto(event) 的结果传出来,因为它们都是 protocol buffer 对象。如果是 C++ 或者 Java,这里直接多线程就解决了(每个线程分别更新 frame_proto)。

总结一下:

  • GIL 让在大部分语言里可以用多线程解决的事必须要用多进程解决。
  • 多进程的诸多限制让它无法无缝替代多线程。即使在能替代的场景,也要做很多额外工作,以及承担序列化和反序列化带来的性能开销。

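So far we've discussed how the GIL gets in the way of "parallelism"; now let's talk about how it gets in the way of "on-demand". This is the more fundamental problem, yet it is rarely noticed. We all know that premature optimization is the root of all evil. Except for cases that obviously need optimization (such as avoiding N+1 database queries), we generally implement first, then profile, and optimize last. In other words, a finding like "this loop has become a performance bottleneck" is usually unknown while the code is being written. Suppose your program is finished and you need to optimize one part of it: naturally, you'd want to modify only the bottleneck without touching any other code. Multithreading fits this kind of optimization very well: because variables are shared, the rest of the program doesn't need to change at all. Once multiprocessing is involved, however, the program often needs much more extensive changes, or even a redesign of the whole architecture. At that point, "on-demand" optimization no longer exists. This not only makes optimization harder, but also brings uncertainty to project management, and may even lead to delays or missed performance targets.

So, is PEP 554 (Multiple Interpreters) the savior? Clearly not. Multiple interpreters are basically Python's version of goroutines, but the problem is that the PEP restricts the types of values that channels can pass. Quote: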

Along those same lines, we will initially restrict the types that may be passed through channels to the following:

  • None
  • bytes
  • str
  • int
  • channels

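So, while multiple interpreters are a good thing, I'm afraid they still can't solve the problem of "on-demand parallelism".


Note: processes in Python can share memory, but the types of variables that can be shared are likewise limited. For details, see multiprocessing.shared_memory.

Update: I found someone who ran into a similar problem, along with my reply.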

Stars Don’t Make Me Happy. Feedback Does.

Like many other programmers, I used to get hyped when somebody starred my project on GitHub. However, I find myself becoming less and less excited about it. That's not to say I dislike it. Nobody hates stars, and I'm still happy to see the number grow. Yet I've found something that means more to me: user feedback.

So what counts as user feedback? Almost anything you can think of: issues, comments, questions, suggestions, PRs, articles, usage, etc. Feedback shows that people are actually using your project, and it can help you improve. Negative feedback is way better than no feedback, because it tells you what to work on next.

Feedback usually comes naturally with exposure: the more people know about a project, the more people use it. But sometimes you get a bunch of stars, yet nobody gives you any feedback. That's the situation Cyberbrain is currently facing, and it bothers me a lot. Tian said he has a similar feeling about VizTracer, and so does xintao about iRedis. Why is it bothering? Because it leaves you clueless, and you keep doubting yourself: Am I doing well? Am I doing badly? Why do people seem interested but not actually use it? Should I just sit and wait, or reach out to some people, and if so, whom? What should I work on next? Are the planned features what users need most?

I don't have an answer, and I hope it's because people are already satisfied or shocked by Cyberbrain. Luckily, I did receive some feedback from my friends, and I want to say thank you to all of you. It matters a lot to me.

