
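Giving away a PyCon ticket

Update: the ticket has been given away.

I have a complimentary ticket to PyCon Shanghai (2019-09-21). If you want it, let me know by any means; first come, first served. For details, see:

PyCon2019 China Python Developer Conference - Shanghai

This year's PyCon China has a star-studded lineup, so don't miss it.

P.S. If you want the ticket, I hope you will attend in person rather than pass it on to someone else.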


Demystifying EXTENDED_ARG

Recently I was studying Python bytecode for a personal project, and one thing that kept bothering me was EXTENDED_ARG. It turns out the documentation was outdated (or, you could say, wrong), which caused my confusion. Even without that error, though, EXTENDED_ARG is not easy to understand at first glance. In this article, I'll explain it in detail.

Early warning: if you've never heard of Python's bytecode, you probably want to learn about it before you continue reading.

Before Python 3.6

Let's first take a look at the *original* documentation.

EXTENDED_ARG(ext)

Prefixes any opcode which has an argument too big to fit into the default two bytes. ext holds two additional bytes which, taken together with the subsequent opcode’s argument, comprise a four-byte argument, ext being the two most-significant bytes.

Before Python 3.6, each instruction takes 1 or 3 bytes, depending on whether it takes an argument. Example:

RETURN_VALUE             # opcode takes 1 byte, total is 1
LOAD_CONST     0x0000    # argument takes 2 bytes, total is 3

But what if you need a larger argument that cannot fit into 2 bytes? That's when EXTENDED_ARG comes into play. Let's say the argument is 0x123456:

LOAD_CONST     0x123456  # INVALID!! argument exceeds size limit
##################################################
EXTENDED_ARG   0x0012
LOAD_CONST     0x3456    # valid

Here's what Python does: it splits the argument into two parts of 2 bytes each. The most significant 2 bytes, 0x0012, become the argument of EXTENDED_ARG, and the remaining 2 bytes, 0x3456, become the argument of LOAD_CONST. When the Python virtual machine sees EXTENDED_ARG, it knows to read the next instruction and combine the two arguments: the EXTENDED_ARG value is shifted left by 16 bits and OR-ed with the next instruction's argument (they are not arithmetically added). So the operation Python actually performs is LOAD_CONST with argument 0x00123456.
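
To make the split concrete, here is a minimal Python sketch of the pre-3.6 scheme. The helper names split_arg_pre36 and combine_arg_pre36 are hypothetical and only model the arithmetic; they are not part of CPython.

```python
def split_arg_pre36(arg):
    """Split an argument of up to 4 bytes into (ext, low) 2-byte halves."""
    if not 0 <= arg <= 0xFFFFFFFF:
        raise ValueError("argument must fit in 4 bytes")
    # High 16 bits go to EXTENDED_ARG, low 16 bits stay with the real opcode.
    return arg >> 16, arg & 0xFFFF


def combine_arg_pre36(ext, low):
    """Rebuild the effective argument: shift the high half, then OR."""
    return (ext << 16) | low


ext, low = split_arg_pre36(0x123456)
print(hex(ext), hex(low))                # 0x12 0x3456
print(hex(combine_arg_pre36(ext, low)))  # 0x123456
```

Arguments that already fit into 2 bytes come back with ext == 0, in which case no EXTENDED_ARG prefix is needed.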

Python 3.6 and later

Python 3.6 changed the size of instructions. Starting from Python 3.6, every instruction takes 2 bytes: the opcode still takes 1 byte, and the argument now also takes 1 byte. If an instruction has no argument, the argument byte is simply zero.

So what's the equivalent bytecode representation of LOAD_CONST 0x123456 in Python 3.6 and later? It looks like this:

EXTENDED_ARG   0x12
EXTENDED_ARG   0x34
LOAD_CONST     0x56

The change is pretty straightforward: the only difference is the use of multiple EXTENDED_ARG instructions, each contributing one more byte. The size limit for an argument is still 4 bytes, so each instruction (in this example, LOAD_CONST) can be preceded by at most 3 EXTENDED_ARG instructions.
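
As a rough illustration of the 3.6+ layout, the sketch below breaks an argument into bytes and prefixes the real instruction with one EXTENDED_ARG per extra byte. encode_arg is a hypothetical helper that returns (name, byte) pairs rather than real bytecode.

```python
def encode_arg(opname, arg):
    """Return the (name, byte) pairs needed to express `arg` on Python 3.6+."""
    if not 0 <= arg <= 0xFFFFFFFF:
        raise ValueError("argument must fit in 4 bytes")
    # Split into 4 bytes, most significant first.
    parts = [(arg >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    # Drop leading zero bytes, but always keep the last byte.
    while len(parts) > 1 and parts[0] == 0:
        parts.pop(0)
    # Every byte except the last one becomes an EXTENDED_ARG prefix.
    prefixes = [("EXTENDED_ARG", byte) for byte in parts[:-1]]
    return prefixes + [(opname, parts[-1])]


for name, byte in encode_arg("LOAD_CONST", 0x123456):
    print(name, hex(byte))
# EXTENDED_ARG 0x12
# EXTENDED_ARG 0x34
# LOAD_CONST 0x56
```

An argument that uses all 4 bytes produces exactly 3 prefixes, which matches the limit mentioned above.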

Wait, the documentation is wrong?

Funnily enough, the documentation was not updated to reflect this change (which is understandable given the amount of work that needed to be done). So I made a PR to fix it.

Play with it yourself

Finally, some code to prove the things we talked about.

I'll be using a brilliant library made by Victor Stinner, called bytecode. Thanks to @thautwarm, who introduced it to me.

The complete program can be found here.

In this program, I construct a series of instructions by hand that do two things: LOAD_CONST 0x1234567, and then RETURN_VALUE to return the loaded value from the top of the VM stack. With the help of the bytecode library, things become really simple.

import sys

from bytecode import ConcreteInstr, ConcreteBytecode

CONST_ARG = 0x1234567  # The real argument we want to set.
cbc = ConcreteBytecode()
cbc.consts = [None] * (CONST_ARG + 1)  # Make sure co_consts is big enough.
cbc.consts[CONST_ARG] = "foo"  # Sets the value we want to load.

if sys.version_info >= (3, 6):
    cbc.extend([
        ConcreteInstr("EXTENDED_ARG", 0x1),
        ConcreteInstr("EXTENDED_ARG", 0x23),
        ConcreteInstr("EXTENDED_ARG", 0x45),
        ConcreteInstr("LOAD_CONST", 0x67),
        ConcreteInstr("RETURN_VALUE"),
    ])

cbc.extend adds the manually constructed instructions, which should look familiar by now. The tricky part is preparing the value. Since we want to execute LOAD_CONST 0x1234567, there needs to be a value at co_consts[0x1234567] (if you don't know what co_consts is, check out the doc). So we set the value manually: cbc.consts[0x1234567] = "foo".

Now comes the interesting part. Let's dis the code object we just created:

  1           0 EXTENDED_ARG             1
              2 EXTENDED_ARG           291
              4 EXTENDED_ARG         74565
              6 LOAD_CONST           19088743 ('foo')
              8 RETURN_VALUE

Is something wrong? Why are the arguments different from what we set? Here's how it works:

1 = 0x01
291 = 0x0123
74565 = 0x012345
19088743 = 0x01234567

Now it's clear: when Python sees EXTENDED_ARG, it accumulates the arguments, shifting the running value left by 8 bits and OR-ing in the next byte, and the accumulated value becomes Instruction.arg. Under the hood, though, the byte values are still exactly what we set.

code = cbc.to_code()  # Build the code object from the concrete bytecode.
for raw_byte in code.co_code:
    print("raw code is: ", raw_byte)

"""
raw code is:  144   # EXTENDED_ARG
raw code is:  1     # 0x01
raw code is:  144   # EXTENDED_ARG
raw code is:  35    # 0x23
raw code is:  144   # EXTENDED_ARG
raw code is:  69    # 0x45
raw code is:  100   # LOAD_CONST
raw code is:  103   # 0x67
raw code is:  83    # RETURN_VALUE
raw code is:  0     # no argument, so zero
"""

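The accumulation can be reproduced by hand. The following sketch walks over the raw bytes printed above two at a time, in the spirit of what dis does. The opcode numbers (144, 100, 83) are the ones from that listing and differ on other Python versions, HAVE_ARGUMENT = 90 is assumed to be the threshold CPython used in those versions, and decode is a hypothetical helper.

```python
# Opcode numbers taken from the raw listing above; they are version-specific.
OPNAMES = {144: "EXTENDED_ARG", 100: "LOAD_CONST", 83: "RETURN_VALUE"}
EXTENDED_ARG = 144
HAVE_ARGUMENT = 90  # assumed: opcodes below this value take no argument


def decode(raw):
    """Decode 2-byte instructions, accumulating EXTENDED_ARG prefixes."""
    extended = 0
    result = []
    for i in range(0, len(raw), 2):
        op, byte = raw[i], raw[i + 1]
        if op >= HAVE_ARGUMENT:
            # OR the new byte into whatever the prefixes accumulated so far.
            arg = byte | extended
            # Carry the value forward only if this is another prefix.
            extended = (arg << 8) if op == EXTENDED_ARG else 0
        else:
            arg = None
        result.append((OPNAMES[op], arg))
    return result


raw = bytes([144, 1, 144, 35, 144, 69, 100, 103, 83, 0])
for name, arg in decode(raw):
    print(name, arg)
# EXTENDED_ARG 1
# EXTENDED_ARG 291
# EXTENDED_ARG 74565
# LOAD_CONST 19088743
# RETURN_VALUE None
```

The printed arguments match the dis output shown earlier, even though each raw argument byte is only 0x01, 0x23, 0x45, or 0x67.
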
The program also runs on versions before 3.6; the only difference is that each argument takes two bytes:

cbc.extend([
    ConcreteInstr("EXTENDED_ARG", 0x123),
    ConcreteInstr("LOAD_CONST", 0x4567),
    ConcreteInstr("RETURN_VALUE"),
])
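
For comparison, here is a sketch of how those three pre-3.6 instructions would be laid out as raw bytes, assuming the same opcode numbers as in the raw listing above and the little-endian 2-byte argument that CPython used before 3.6. encode_pre36 is a hypothetical helper, not part of the bytecode library.

```python
import struct

# Assumed opcode numbers (same as in the raw listing above).
EXTENDED_ARG, LOAD_CONST, RETURN_VALUE = 144, 100, 83


def encode_pre36(opcode, arg=None):
    """Encode one pre-3.6 instruction: 1 byte, or 3 bytes with an argument."""
    if arg is None:
        return bytes([opcode])
    # "<BH" = one unsigned byte followed by a little-endian unsigned 16-bit value.
    return struct.pack("<BH", opcode, arg)


raw = b"".join([
    encode_pre36(EXTENDED_ARG, 0x123),
    encode_pre36(LOAD_CONST, 0x4567),
    encode_pre36(RETURN_VALUE),
])
print(list(raw))  # [144, 35, 1, 100, 103, 69, 83]
```

The result is 7 bytes long (3 + 3 + 1), compared with 10 bytes for the five 2-byte instructions used on 3.6 and later.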

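Personal music storage: the ultimate solution

For years I have been looking for a music setup I'm happy with. Three years ago I wrote a post titled "Now I Can Properly Listen to Music" (《可以好好听音乐了》), and since then I have no longer relied on cloud music platforms. At the time I happened to have a server in my lab, so I set up Subsonic on it as my streaming server. After graduating, I moved all my music to Google Play Music. I download songs and upload them to Google Play Music; that is how I have listened to music for the past two years.

I'm happy with most of Google Play Music's features. The UI isn't great but it is good enough, and the automatic cover art matching is nice. Its one and only shortcoming is that you can't share songs or playlists. More precisely, a sharing feature exists, but only songs that Google Play Music holds the rights to are visible to your friends.

Recently I really didn't want to keep putting up with this, so I started looking for a new platform again. This time I tried almost every service on the market and finally settled on pCloud. Let me first go over my requirements: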

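  1. Can upload music and listen to it
  2. No need to host the service myself
  3. No need to pay for traffic
  4. Windows, Mac, mobile, and web support
  5. Does not remove or replace music
  6. Can share playlists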

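Where do the other platforms fall short?

Let's start with the big three in China: QQ, NetEase, and Xiami. QQ and Xiami don't allow uploads, so they're out. NetEase does have a music cloud drive, but for some reason it doesn't work in the Mac client (there is an entry point, but clicking it does nothing). Besides, NetEase Cloud Music reportedly replaces music it doesn't hold the rights to automatically, which I absolutely cannot accept.

What about services outside China? Spotify is the most widely used, but it has no upload feature either, and you have to pay. I also tried a whole bunch of others, none of which met my requirements. Worth mentioning is SoundCloud: I really like its UI, and it has both upload and sharing, but the problem is that it automatically removes songs it detects as lacking rights.

There are plenty of streaming servers, but I don't want to maintain one myself, so those are out too.

Then there are cloud drives. Google Drive's music support is very poor, and I don't trust Baidu Netdisk. Dropbox is reportedly decent, but the free plan's 2 GB is too little.

Then I came across an article, Best Cloud Storage for Music 2019, which recommended pCloud. I had been using it occasionally anyway, so I gave it a try. To my surprise, it met every requirement.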

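The player looks like this:

Not pretty, but good enough.

The playlist looks like this:

Well, nothing special.

The default 10 GB of space is enough for now, and besides, I don't mind paying a reasonable price.

Most importantly, it can share!

laike9m's playlist

Sharing a playlist is something I have wanted to do for years. This list isn't a greatest-hits compilation; it is picked purely by personal taste, though of course it includes many songs everyone knows. Genre-wise it is mostly Anisong + all kinds of OSTs + some English songs. I have maintained this list for many years and will keep updating it.

Finally, song recommendations are welcome. No need for the really popular ones, though; I've most likely heard them already.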

