PyCascades is a regional PyCon in the Pacific Northwest, celebrating the west coast Python developer and user community. Our organizing team includes members of the Vancouver, Seattle, and Portland Python user groups.
Even at this point I couldn't start recording right away: the organizing committee had to review the slides for Code of Conduct compliance, and there was a tech check. The tech check was handled by Next Day Video (a company specializing in conference video), to make sure speakers could produce recordings of good enough quality. I booked a slot through their system, went through the tech check, and passed without trouble. Along the way, the Next Day Video staff even taught me how to position my microphone properly; it turned out I had been pointing it the wrong way all along and never getting the best sound. Professionals, indeed 🤯. Then came a series of emails plus Notion documents (1, 2, 3) spelling out the requirements for attending and recording, down to the video's bit rate and frame rate. They also recommended some recording software; I ended up using Apowersoft Free Online Screen Recorder. Once the CoC review passed, I could finally record.
Counting the lightning talk at PyCon China 2020, this was only the second time I had recorded a video. I discovered that recording a talk is far more exhausting than presenting live. You might think a live talk is the stressful one, with all those people watching, while with a recording you can always redo a bad take or edit it later. But that is exactly what makes recording so tiring.
Anyway, my first attempt at a full take, 25 minutes, left me completely wiped out. I could barely do anything for the rest of the day; I felt utterly drained. Naturally the result was terrible too: in the second half you could clearly tell the speaker was exhausted. So I kept recording over the following days, and not counting the takes I abandoned midway, by around the third full take I felt it was good enough. I uploaded the video to Next Day Video, and a few days later was told everything was fine. After that I could relax and wait for the conference to open.
On Friday the organizers held an online social event on SpatialChat. The product is quite interesting: you enter a theater-like space where you can drag your avatar around, simulating walking through a physical room. When you move your avatar next to a group of people you can hear them talking, and the closer you get, the clearer they sound. At first I just listened, but then Anthony Shaw came over to say hi, and I slipped into the group and started chatting. I mentioned how exhausting recording a talk had been, and Anthony and another speaker said they had also found that recording raises your standards for yourself, until eventually you can't take it anymore and end up writing the entire script down.
The organizers even had a DJ mixing on stage. He didn't talk to anyone the whole time and seemed completely absorbed in it.
Saturday was the conference proper, opening at 9:30. I had gotten up at 7 to watch a match, which we lost. After sulking for a while I remembered the conference, clicked in, and found Guido, Brett, and a few other big names already chatting. I kept listening from then on, switching back and forth between the recorded talks and the live talks, since PyCascades ran its two tracks simultaneously. One talk worth watching is "Your Name Is Invalid!", which covered the many things that can go wrong when handling text in different languages; for example, in some languages uppercase and lowercase letters don't map one-to-one. The conclusion was "Don't assume anything". It reminded me of a video I once watched about handling time zones. Trying to capture the world's full complexity and diversity in code is hard indeed.
A few weeks before the conference, the organizers set up some private Slack channels for speakers to serve as a backstage area. On the day of the conference, Next Day Video staff would contact speakers there for a tech check. Yes, to leave nothing to chance, there was another tech check on the day itself. My talk was at 1:55 pm, with the tech check around 1. The staff said my audio was fine, but asked me to close the blinds behind me a bit because it was too bright, and also asked whether I could lower my camera. Some context: I use a desktop with a large monitor, and the camera sits on top of the monitor, which creates a looking-down-at-you angle. To fix it, I stacked two tissue boxes and put the camera on top, which got it roughly level with my head. I'm quite curious how everyone else handles this.
As my slot approached, I logged into Next Day Video's speaker backstage and ran through the script once with the host, including how to address and introduce me. For this conference I just told everyone to call me "laike"; names don't matter much anyway. I had assumed I could leave after the opening introduction, but it turned out I had to stay backstage the whole time, so I found a laptop and listened to my own talk. They had stressed that the video playback had some delay, but as I listened I forgot all about it. Near the end I saw a Slack message from the host telling me to come deliver my closing remarks right away. In other words, I needed to start speaking before the video had finished on my end; because of the delay, what the audience would see is the video ending and then the host and me appearing immediately. I scrambled back in front of the screen, adjusted the camera, and only after saying a sentence realized I was muted, so I hurriedly unmuted. A livestream this full of mishaps will probably make a fond memory someday.
This is my talk proposal for PyCascades 2021. Even though it's a proposal, it reads very much like an article, so I'm just posting it here.
As programmers, we debug almost every day. What are the major options for debugging, and what advantages and disadvantages does each have? We'll start the talk by giving the audience an overview of the history of debugging and of existing tools, so they know how to pick among them.
Then, we'll help the audience gain a deeper understanding of what debugging is really about, and discuss two pain points of existing solutions. We'll introduce a novel approach to solving these pain points, along with a basic introduction to bytecode tracing so the audience can learn this useful technique.
Finally, we'll look into the future and talk about why it's important to be more innovative. We hope that after listening to this talk, the audience will be more open-minded when thinking about debugging, and about programming as a whole.
No specific knowledge required, but basic experience with debugging would be helpful.
Here is a detailed description of each part.
Part 1: What is debugging really about?
Broadly speaking, a Python program can have four types of errors:
Cannot run at all (e.g. syntax errors)
Exits abnormally (e.g. unhandled exceptions, killed by the OS)
The program can run, but gives wrong results
Gives correct results, but consumes more resources than expected (e.g. memory leak)
Among these, the third type is the most common, and also where programmers spend most of their time debugging. In this talk we focus on this type: "the program can run, but gives wrong results".
I'll ask the audience to recall how they usually debug. It's not hard to see that, no matter which approach we take, we're trying to answer one question:
What is the root cause of the wrong value?
This sounds straightforward, but it is vital that we realize it before going into the later sections.
Part 2: A look back at the history of debugging
In the early days of programming, debugging meant dumping the system's data to output devices: literally printing it out, or flashing some lights when there was an error. A very patient programmer would then go step by step through the code, reading it to see where the problem might be.
Then, in the 70s and 80s, the idea of "debugging software" came along, and people started to build command-line debuggers like dbx and GDB. Since then, although new features like breakpoints, reverse debugging, and graphical interfaces have been added, the way people use debuggers has stayed pretty much the same: step through the program and look around.
Today, print, logging, and debuggers remain the major approaches to debugging, each with its advantages and drawbacks:
print
Advantages: available out of the box, clean information, does not affect program execution.
Drawbacks: requires familiarity with the code, needs repeated tweaking, lacks context, output is hard to manage.
logging
Drawbacks: configuration is not easy, requires familiarity with the code, hard to search for what you need, context is still limited.
debugger
Advantages: powerful, does not require familiarity with the code, richest context to help identify problems.
Drawbacks: not always available, a decent learning curve, output is not persisted, needs human interaction.
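To make the trade-offs above concrete, here is a minimal sketch (the function and logger names are invented for illustration) that debugs the same function with both print and logging; notice how logging requires configuration up front, which print does not, but gives you levels and a consistent format in return:

```python
import logging

# logging needs configuration up front; print does not.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("demo")

def average(values):
    total = sum(values)
    print("total =", total)  # print: zero setup, but output is hard to manage later
    log.debug("values=%r total=%r", values, total)  # logging: has levels, searchable format
    return total / len(values)

result = average([1, 2, 3])
print(result)
```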
Yet, with all these options, debugging is still hard sometimes. We'll see why in the next section.
Part 3: Let's rethink debugging
There are two pain points with existing debugging solutions:
There is no tool that is as easy to use as print yet provides information as rich as a debugger's.
Existing tools only give clues, without telling why.
This is a bigger (yet hidden) problem.
In the first part we talked about the goal of debugging: finding the root cause of a wrong value. Let's use a debugger as an example to recall how we usually debug. Say you're debugging a program where c has an unexpected value:
c = a + b # c should be "foo", but instead is "bar"
Here are the normal steps:
Set a breakpoint at this line.
Run the program, inspect the values of a and b.
Figure out whether the error lies in a or b.
Set another breakpoint, repeat 🔁
Or, if you want to do it in one run:
Set a breakpoint at the entry point of the program.
Step through the program and remember everything that happens along the way.
Stop at c = a + b, and use your brain to infer what happened.
Either way, we still need to spend time reading the code and following its execution. We also need to monitor all the relevant variables at every step, compare them with their expected values, and memorize the results, because debuggers don't persist them. This is a huge overhead for our brains, and it makes debugging less efficient and sometimes frustrating.
The problem is obvious: debuggers only give clues, without telling you why. We've been taking the manual work for granted for so long that we don't even think of it as a problem. But it is a problem, and it can be solved.
Part 4: A novel approach to tackle the pain points
To reiterate, an ideal debugging tool should:
Be easy to use and provide rich information.
Tell you why a variable has a wrong value, with no or minimal human intervention.
For a moment, let's forget about the tools we use every day, just think about one question: who has the information we need for debugging?
The answer is: the Python interpreter.
So the question becomes, how do we pull relevant information out of the interpreter?
I will briefly introduce the sys.settrace API and the opcode event introduced in Python 3.7, using the c = a + b example to demonstrate how bytecode tracing reveals the sources of a variable. In this case, the sources of c are a and b. With this power, reasoning about the root cause of a wrong value becomes possible.
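As a small taste of the technique, here is a minimal sketch (the tracer and the traced function are made up for illustration) that uses sys.settrace plus the per-opcode events enabled by f_trace_opcodes (Python 3.7+) to observe the bytecode executed by c = a + b:

```python
import dis
import sys

seen = []  # opcode names observed while the traced function runs

def tracer(frame, event, arg):
    if event == "call":
        frame.f_trace_opcodes = True  # opt in to per-opcode events (Python 3.7+)
    elif event == "opcode":
        # f_lasti is the offset of the instruction about to execute
        seen.append(dis.opname[frame.f_code.co_code[frame.f_lasti]])
    return tracer  # keep tracing this frame locally

def compute():
    a = "foo"
    b = "bar"
    c = a + b  # the statement whose sources we want to observe
    return c

sys.settrace(tracer)
result = compute()
sys.settrace(None)

print(result)
print(seen)  # includes the LOAD/STORE/BINARY opcodes behind c = a + b
```

The LOAD_FAST opcodes for a and b followed by a binary-add opcode and STORE_FAST for c are exactly what lets a tool infer that a and b are the sources of c.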
I will then introduce Cyberbrain, a debugger that uses the power of bytecode to solve these pain points through variable backtracing. This means that, besides viewing each variable's value at every step (which Cyberbrain also supports), users can "see" the sources of each variable change in a visualized way. In the example above, Cyberbrain will tell you that it was a and b that caused c to change, and it traces the sources of a and b all the way back to the point where tracing began.
I'll do a quick demo of using Cyberbrain to debug a program, showing how it solves the two pain points. By the end of the demo, the audience will see that traditional debugging tools require a lot of manual effort that could be automated.
Bytecode tracing has its own problems: it can slow the program down, and it can generate a huge amount of data for long-running programs. But the important thing is that we recognize the pain points and keep looking for new possibilities, which brings us to the next topic.
Part 5: Where do we go from here?
Now is an interesting time.
On the one hand, existing tools are calcifying. The Debug Adapter Protocol (DAP), which defines the capabilities a debugger should provide, is gaining popularity. Tools that conform to DAP will never be able to provide capabilities beyond what the protocol specifies.
On the other hand, new tools are coming out in Python's debugging space, just to list a few:
birdseye, Thonny: graphical debuggers that can visualize the values of expressions.
Python Tutor: web-based interactive program visualization, which also visualizes data structures.
These new tools share the goal of reducing programmers' work in debugging, but beyond that, they are all trying to pitch the idea that the current "standard" way of debugging is not good enough, and that more can be achieved with less manual effort. The ideas behind them matter even more than the tools themselves.
Why is this important? Dijkstra famously said:
The tools we use have a profound (and devious!) influence on our thinking habits, and, therefore, on our thinking abilities.
Imagine a world where none of these efforts exist: would the word "debugger" gradually shift from meaning "something that helps you debug" to "something that conforms to the Debug Adapter Protocol"? That is not impossible. We need to keep it from coming true, and preserve a possible future in which programmers debug in an effortless, more efficient way. So what can we do?
Think of new ways to make debugging better;
Create tools, or contribute to them;
Spread this talk and the ideas;
Create new programming languages that put debuggability as the core feature.
And the easiest, yet hardest thing: keep an open mind.
# For each traced event, record which earlier events it derives from.
for event in frame.events:
    event_ids = _get_event_sources_uids(event, frame)
    frame_proto.tracing_result[event.uid].event_ids[:] = event_ids
Earlier we discussed how the GIL gets in the way of "parallelism"; now let's talk about how it gets in the way of "on-demand" optimization. This is the more fundamental problem, yet it receives very little attention. We all know that premature optimization is the root of all evil. Apart from cases that obviously need optimizing (such as avoiding N+1 database queries), the usual workflow is: implement first, then profile, then optimize. In other words, discoveries like "this loop became the performance bottleneck" are generally unknown while the code is being written. Suppose your program is finished and you need to optimize one part of it. Naturally you'd want to change only the bottleneck and leave the rest of the code alone. This kind of optimization is a perfect fit for multithreading: because variables are shared, the rest of the program doesn't have to change at all. But once multiprocessing is involved, you often have to modify the program much more extensively, sometimes even redesign the whole architecture. At that point, "on-demand" optimization no longer exists. This not only makes optimization harder, it also introduces uncertainty into project management, potentially leading to delays or missed performance targets.
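To make the "on-demand" point concrete, here is a hedged sketch (the transform function and the numbers are invented): with threads, only the bottleneck loop changes, because all other state is shared with the rest of the program. Note that under the GIL, CPU-bound work like this would not actually run in parallel, which is exactly the limitation described above; switching to multiprocessing instead would mean moving data across process boundaries and reshaping the surrounding code.

```python
from concurrent.futures import ThreadPoolExecutor

def transform(x):
    # hypothetical per-item work that profiling identified as the bottleneck
    return x * x

items = list(range(8))

# Before: results = [transform(x) for x in items]
# After: only this loop changed; the threads share `items` and all other state.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(transform, items))

print(results)
```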