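I. About the bytecode virtual machine
- The bytecode virtual machine is the core of Python.
- Once the source code has been compiled into a sequence of bytecode instructions, the bytecode virtual machine takes over all the remaining work.
- The virtual machine reads the bytecode instructions one by one from the PyCodeObject produced by compilation and executes each of them.
II. How an executable runs
1. Roughly how an executable runs on an x86 machine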
#include <stdio.h>

void a(int n)
{
    printf("%d\n", n);
}

void b(void)
{
    a(1);
}

int main(void)
{
    b();
    return 0;
}
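- When execution enters a(), the caller's frame is b()'s stack frame and the current frame is a()'s stack frame.
- All of a function's local-variable operations take place in its own stack frame, and a call from one function to another is carried out by creating a new stack frame.
- The runtime stack grows from high addresses toward low addresses. When b() calls a(), the system creates a()'s stack frame right after b()'s frame in the address space and saves b()'s stack pointer esp and frame pointer ebp in a()'s frame.
- When a() finishes, the system restores esp and ebp to the values they had before a()'s frame was created, so control returns to b() and the program's working area is b()'s stack frame again.
2. About the execution environment
- In Python, a PyCodeObject stores all the bytecode instructions and static information, but it does not contain the dynamic information needed at run time, namely the execution environment.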
>>> i = 1
>>> def a():
...     i = 2
...     print(i)
...
>>> a()
2
>>> print(i)
1
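- The two values of i differ because the two names live in different namespaces, and a namespace is part of the execution environment.
- When Python starts executing a .py program it builds an execution environment A. When a function is called, it creates a new execution environment B, and B is in effect a new stack frame.
- So what Python actually executes is not a bare PyCodeObject but an execution environment, that is, a PyFrameObject.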
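The split between static code and dynamic environment can be observed from Python itself. The following is a minimal sketch, assuming CPython (sys._getframe() is a CPython implementation detail); the function name record is made up for illustration. Two calls share one code object, but each call runs in its own frame with its own local values.

```python
import sys


def record(value):
    # Each call gets a fresh frame; the code object behind it is shared.
    frame = sys._getframe()
    return frame.f_code, frame.f_locals["value"]


code_first, value_first = record(1)
code_second, value_second = record(2)

# Same static code object, different dynamic local state per call.
print(code_first is code_second)
print(value_first, value_second)
```

The code object plays the role of the static PyCodeObject, while the frame obtained inside each call is the per-call execution environment.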
III. About PyFrameObject
1. PyFrameObject source
Include\frameobject.h
typedef struct _frame {
PyObject_VAR_HEAD
struct _frame *f_back; /* previous frame, or NULL */
PyCodeObject *f_code; /* code segment */
PyObject *f_builtins; /* builtin symbol table (PyDictObject) */
PyObject *f_globals; /* global symbol table (PyDictObject) */
PyObject *f_locals; /* local symbol table (any mapping) */
PyObject **f_valuestack; /* points after the last local */
/* Next free slot in f_valuestack. Frame creation sets to f_valuestack.
Frame evaluation usually NULLs it, but a frame that yields sets it
to the current stack top. */
PyObject **f_stacktop;
PyObject *f_trace; /* Trace function */
char f_trace_lines; /* Emit per-line trace events? */
char f_trace_opcodes; /* Emit per-opcode trace events? */
/* Borrowed reference to a generator, or NULL */
PyObject *f_gen;
int f_lasti; /* Last instruction if called */
/* Call PyFrame_GetLineNumber() instead of reading this field
directly. As of 2.3 f_lineno is only valid when tracing is
active (i.e. when f_trace is set). At other times we use
PyCode_Addr2Line to calculate the line from the current
bytecode index. */
int f_lineno; /* Current line number */
int f_iblock; /* index in f_blockstack */
char f_executing; /* whether the frame is still executing */
PyTryBlock f_blockstack[CO_MAXBLOCKS]; /* for try and loop blocks */
PyObject *f_localsplus[1]; /* locals+stack, dynamically sized */
} PyFrameObject;
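Field | Meaning |
---|---|
*f_back | the previous frame in the chain of execution environments |
*f_code | the PyCodeObject being executed |
*f_builtins | the builtin namespace |
*f_globals | the global namespace |
*f_locals | the local namespace |
**f_valuestack | bottom of the value stack |
**f_stacktop | top of the value stack |
f_lasti | offset in f_code of the last bytecode instruction executed |
f_lineno | source line corresponding to the current bytecode |
f_executing | whether the frame is still executing |
*f_localsplus[1] | dynamically sized storage (locals plus value stack) |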
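- The struct begins with PyObject_VAR_HEAD, which means a frame is a variable-size object.
- The f_back field shows that PyFrameObject objects are chained into a linked list. This mimics the relationship that esp and ebp maintain between native stack frames.
- f_code holds the PyCodeObject that is waiting to be executed.
- f_builtins, f_globals and f_locals maintain the name-to-value mappings of the builtin, global and local namespaces respectively.
The type object is as follows: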
Objects\frameobject.c
PyTypeObject PyFrame_Type = {
PyVarObject_HEAD_INIT(&PyType_Type, 0)
"frame",
sizeof(PyFrameObject),
sizeof(PyObject *),
(destructor)frame_dealloc, /* tp_dealloc */
0, /* tp_print */
0, /* tp_getattr */
0, /* tp_setattr */
0, /* tp_reserved */
(reprfunc)frame_repr, /* tp_repr */
0, /* tp_as_number */
0, /* tp_as_sequence */
0, /* tp_as_mapping */
0, /* tp_hash */
0, /* tp_call */
0, /* tp_str */
PyObject_GenericGetAttr, /* tp_getattro */
PyObject_GenericSetAttr, /* tp_setattro */
0, /* tp_as_buffer */
Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC,/* tp_flags */
0, /* tp_doc */
(traverseproc)frame_traverse, /* tp_traverse */
(inquiry)frame_tp_clear, /* tp_clear */
0, /* tp_richcompare */
0, /* tp_weaklistoffset */
0, /* tp_iter */
0, /* tp_iternext */
frame_methods, /* tp_methods */
frame_memberlist, /* tp_members */
frame_getsetlist, /* tp_getset */
0, /* tp_base */
0, /* tp_dict */
};
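2. PyFrameObject's dynamic memory
- In the PyFrameObject source we saw that f_localsplus[1] maintains the dynamically sized storage.
- The code that creates a PyFrameObject shows that this memory is not used by the stack alone.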
Objects\frameobject.c
PyFrameObject*
PyFrame_New(PyThreadState *tstate, PyCodeObject *code,
PyObject *globals, PyObject *locals)
{
PyFrameObject *f = _PyFrame_New_NoTrack(tstate, code, globals, locals);
if (f)
_PyObject_GC_TRACK(f);
return f;
}
PyFrameObject* _Py_HOT_FUNCTION
_PyFrame_New_NoTrack(PyThreadState *tstate, PyCodeObject *code,
PyObject *globals, PyObject *locals)
{
PyFrameObject *back = tstate->frame;
PyFrameObject *f;
... ...
Py_ssize_t extras, ncells, nfrees;
ncells = PyTuple_GET_SIZE(code->co_cellvars);
nfrees = PyTuple_GET_SIZE(code->co_freevars);
extras = code->co_stacksize + code->co_nlocals + ncells +
nfrees;
if (free_list == NULL) {
f = PyObject_GC_NewVar(PyFrameObject, &PyFrame_Type,
extras);
if (f == NULL) {
Py_DECREF(builtins);
return NULL;
}
}
... ...
f->f_stacktop = f->f_valuestack;
... ...
}
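- As the code shows, extras is the sum of four parts: code->co_stacksize, code->co_nlocals, ncells and nfrees. Together they determine the size of the dynamic memory area the frame maintains. The co_nlocals, ncells and nfrees slots hold the local variables and the cell and free variables used to implement closures; only the co_stacksize part is used by the runtime value stack.
- So the bottom of a PyFrameObject's value stack is tracked by f_valuestack and its top by f_stacktop.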
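The four terms that make up extras are all visible as attributes of a code object, so the calculation can be reproduced from Python. This is only a sketch that mirrors the arithmetic in the listing above; the helper name frame_extras and the sample functions are made up for illustration, and newer CPython releases lay frames out differently, so the number should not be read as the real allocation size on every version.

```python
def frame_extras(code):
    """Mirror the `extras` computation from _PyFrame_New_NoTrack."""
    ncells = len(code.co_cellvars)   # variables captured by inner functions
    nfrees = len(code.co_freevars)   # variables captured from an outer function
    return code.co_stacksize + code.co_nlocals + ncells + nfrees


def outer(x):
    y = x + 1

    def inner(z):
        # x and y are free variables here and cell variables in outer
        return x + y + z

    return inner


inner = outer(1)
for fn in (outer, inner):
    c = fn.__code__
    print(c.co_name, c.co_stacksize, c.co_nlocals,
          c.co_cellvars, c.co_freevars, frame_extras(c))
```

The closure-related slots (cell and free variables) show up only for functions that take part in a closure, which is why they are counted separately from co_nlocals.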
3. Accessing PyFrameObject from Python
- In Python, sys._getframe() returns the frame object of the function that is currently executing.
import sys

def sample():
    f = sys._getframe()
    for a in dir(f):
        print(a, ':', eval(f"f.{a}"))

if __name__ == '__main__':
    sample()
__class__ : <class 'frame'>
__delattr__ : <method-wrapper '__delattr__' of frame object at 0x000002B851031278>
__dir__ : <built-in method __dir__ of frame object at 0x000002B851031278>
__doc__ : None
__eq__ : <method-wrapper '__eq__' of frame object at 0x000002B851031278>
__format__ : <built-in method __format__ of frame object at 0x000002B851031278>
__ge__ : <method-wrapper '__ge__' of frame object at 0x000002B851031278>
__getattribute__ : <method-wrapper '__getattribute__' of frame object at 0x000002B851031278>
__gt__ : <method-wrapper '__gt__' of frame object at 0x000002B851031278>
__hash__ : <method-wrapper '__hash__' of frame object at 0x000002B851031278>
__init__ : <method-wrapper '__init__' of frame object at 0x000002B851031278>
__init_subclass__ : <built-in method __init_subclass__ of type object at 0x00007FF858A15D90>
__le__ : <method-wrapper '__le__' of frame object at 0x000002B851031278>
__lt__ : <method-wrapper '__lt__' of frame object at 0x000002B851031278>
__ne__ : <method-wrapper '__ne__' of frame object at 0x000002B851031278>
__new__ : <built-in method __new__ of type object at 0x00007FF858A19EC0>
__reduce__ : <built-in method __reduce__ of frame object at 0x000002B851031278>
__reduce_ex__ : <built-in method __reduce_ex__ of frame object at 0x000002B851031278>
__repr__ : <method-wrapper '__repr__' of frame object at 0x000002B851031278>
__setattr__ : <method-wrapper '__setattr__' of frame object at 0x000002B851031278>
__sizeof__ : <built-in method __sizeof__ of frame object at 0x000002B851031278>
__str__ : <method-wrapper '__str__' of frame object at 0x000002B851031278>
__subclasshook__ : <built-in method __subclasshook__ of type object at 0x00007FF858A15D90>
clear : <built-in method clear of frame object at 0x000002B851031278>
f_back : <frame at 0x000002B852D4E9F8, file 'temp.py', line 21, code <module>>
f_builtins : {'__name__': 'builtins', '__doc__': "Built-in functions, exceptions, and other objects.\n\nNoteworthy: None is the `nil' object; Ellipsis represents `...' in slices.", '__package__': '', '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>), '__build_class__': <built-in function __build_class__>, '__import__': <built-in function __import__>, 'abs': <built-in function abs>, 'all': <built-in function all>, 'any': <built-in function any>, 'ascii': <built-in function ascii>, 'bin': <built-in function bin>, 'breakpoint': <built-in function breakpoint>, 'callable': <built-in function callable>, 'chr': <built-in function chr>, 'compile': <built-in function compile>, 'delattr': <built-in function delattr>, 'dir': <built-in function dir>, 'divmod': <built-in function divmod>, 'eval': <built-in function eval>, 'exec': <built-in function exec>, 'format': <built-in function format>, 'getattr': <built-in function getattr>, 'globals': <built-in function globals>, 'hasattr': <built-in function hasattr>, 'hash': <built-in function hash>, 'hex': <built-in function hex>, 'id': <built-in function id>, 'input': <built-in function input>, 'isinstance': <built-in function isinstance>, 'issubclass': <built-in function issubclass>, 'iter': <built-in function iter>, 'len': <built-in function len>, 'locals': <built-in function locals>, 'max': <built-in function max>, 'min': <built-in function min>, 'next': <built-in function next>, 'oct': <built-in function oct>, 'ord': <built-in function ord>, 'pow': <built-in function pow>, 'print': <built-in function print>, 'repr': <built-in function repr>, 'round': <built-in function round>, 'setattr': <built-in function setattr>, 'sorted': <built-in function sorted>, 'sum': <built-in function sum>, 'vars': <built-in function vars>, 'None': None, 'Ellipsis': Ellipsis, 'NotImplemented': NotImplemented, 'False': False, 'True': True, 'bool': 
<class 'bool'>, 'memoryview': <class 'memoryview'>, 'bytearray': <class 'bytearray'>, 'bytes': <class 'bytes'>, 'classmethod': <class 'classmethod'>, 'complex': <class 'complex'>, 'dict': <class 'dict'>, 'enumerate': <class 'enumerate'>, 'filter': <class 'filter'>, 'float': <class 'float'>, 'frozenset': <class 'frozenset'>, 'property': <class 'property'>, 'int': <class 'int'>, 'list': <class 'list'>, 'map': <class 'map'>, 'object': <class 'object'>, 'range': <class 'range'>, 'reversed': <class 'reversed'>, 'set': <class 'set'>, 'slice': <class 'slice'>, 'staticmethod': <class 'staticmethod'>, 'str': <class 'str'>, 'super': <class 'super'>, 'tuple': <class 'tuple'>, 'type': <class 'type'>, 'zip': <class 'zip'>, '__debug__': True, 'BaseException': <class 'BaseException'>, 'Exception': <class 'Exception'>, 'TypeError': <class 'TypeError'>, 'StopAsyncIteration': <class 'StopAsyncIteration'>, 'StopIteration': <class 'StopIteration'>, 'GeneratorExit': <class 'GeneratorExit'>, 'SystemExit': <class 'SystemExit'>, 'KeyboardInterrupt': <class 'KeyboardInterrupt'>, 'ImportError': <class 'ImportError'>, 'ModuleNotFoundError': <class 'ModuleNotFoundError'>, 'OSError': <class 'OSError'>, 'EnvironmentError': <class 'OSError'>, 'IOError': <class 'OSError'>, 'WindowsError': <class 'OSError'>, 'EOFError': <class 'EOFError'>, 'RuntimeError': <class 'RuntimeError'>, 'RecursionError': <class 'RecursionError'>, 'NotImplementedError': <class 'NotImplementedError'>, 'NameError': <class 'NameError'>, 'UnboundLocalError': <class 'UnboundLocalError'>, 'AttributeError': <class 'AttributeError'>, 'SyntaxError': <class 'SyntaxError'>, 'IndentationError': <class 'IndentationError'>, 'TabError': <class 'TabError'>, 'LookupError': <class 'LookupError'>, 'IndexError': <class 'IndexError'>, 'KeyError': <class 'KeyError'>, 'ValueError': <class 'ValueError'>, 'UnicodeError': <class 'UnicodeError'>, 'UnicodeEncodeError': <class 'UnicodeEncodeError'>, 'UnicodeDecodeError': <class 'UnicodeDecodeError'>, 
'UnicodeTranslateError': <class 'UnicodeTranslateError'>, 'AssertionError': <class 'AssertionError'>, 'ArithmeticError': <class 'ArithmeticError'>, 'FloatingPointError': <class 'FloatingPointError'>, 'OverflowError': <class 'OverflowError'>, 'ZeroDivisionError': <class 'ZeroDivisionError'>, 'SystemError': <class 'SystemError'>, 'ReferenceError': <class 'ReferenceError'>, 'MemoryError': <class 'MemoryError'>, 'BufferError': <class 'BufferError'>, 'Warning': <class 'Warning'>, 'UserWarning': <class 'UserWarning'>, 'DeprecationWarning': <class 'DeprecationWarning'>, 'PendingDeprecationWarning': <class 'PendingDeprecationWarning'>, 'SyntaxWarning': <class 'SyntaxWarning'>, 'RuntimeWarning': <class 'RuntimeWarning'>, 'FutureWarning': <class 'FutureWarning'>, 'ImportWarning': <class 'ImportWarning'>, 'UnicodeWarning': <class 'UnicodeWarning'>, 'BytesWarning': <class 'BytesWarning'>, 'ResourceWarning': <class 'ResourceWarning'>, 'ConnectionError': <class 'ConnectionError'>, 'BlockingIOError': <class 'BlockingIOError'>, 'BrokenPipeError': <class 'BrokenPipeError'>, 'ChildProcessError': <class 'ChildProcessError'>, 'ConnectionAbortedError': <class 'ConnectionAbortedError'>, 'ConnectionRefusedError': <class 'ConnectionRefusedError'>, 'ConnectionResetError': <class 'ConnectionResetError'>, 'FileExistsError': <class 'FileExistsError'>, 'FileNotFoundError': <class 'FileNotFoundError'>, 'IsADirectoryError': <class 'IsADirectoryError'>, 'NotADirectoryError': <class 'NotADirectoryError'>, 'InterruptedError': <class 'InterruptedError'>, 'PermissionError': <class 'PermissionError'>, 'ProcessLookupError': <class 'ProcessLookupError'>, 'TimeoutError': <class 'TimeoutError'>, 'open': <built-in function open>, 'quit': Use quit() or Ctrl-Z plus Return to exit, 'exit': Use exit() or Ctrl-Z plus Return to exit, 'copyright': Copyright (c) 2001-2018 Python Software Foundation.
All Rights Reserved.
Copyright (c) 2000 BeOpen.com.
All Rights Reserved.
Copyright (c) 1995-2001 Corporation for National Research Initiatives.
All Rights Reserved.
Copyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam.
All Rights Reserved., 'credits': Thanks to CWI, CNRI, BeOpen.com, Zope Corporation and a cast of thousands
for supporting Python development. See www.python.org for more information., 'license': Type license() to see the full license text, 'help': Type help() for interactive help, or help(object) for help about object.}
f_code : <code object sample at 0x000002B852E91660, file "xx/temp.py", line 15>
f_globals : {'__name__': '__main__', '__doc__': '\n@File : temp.py \n@Contact : xxx@xxx.com\n\n@Modify Time @Author @Version @Desciption\n------------ --------------- -------- -----------\n2021/x/x 16:54 大师兄(superkmi) 1.0 None\n', '__package__': None, '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x000002B852DD9518>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__file__': 'xx/temp.py', '__cached__': None, 'sys': <module 'sys' (built-in)>, 'sample': <function sample at 0x000002B852D8C268>}
f_lasti : 38
f_lineno : 18
f_locals : {'f': <frame at 0x000002B851031278, file 'xx/temp.py', line 18, code sample>, 'a': 'f_locals'}
f_trace : None
f_trace_lines : True
f_trace_opcodes : False
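The f_back link seen in the output above can be followed all the way up to reconstruct the call chain, which is the Python-level counterpart of walking native stack frames. A minimal sketch, assuming CPython; the function names call_chain, a and b are illustrative only.

```python
import sys


def call_chain():
    """Return the names of all code objects on the current frame chain."""
    names = []
    frame = sys._getframe()
    while frame is not None:          # the outermost frame has f_back == None
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return names


def a():
    return call_chain()


def b():
    return a()


chain = b()
print(chain)
```

The innermost frame comes first, so the list starts with call_chain, followed by a and b, and then whatever frames were active when b() was called.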
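IV. The virtual machine's execution framework
- The entry point of the Python virtual machine is PyEval_EvalFrame, which ends up calling one huge function, _PyEval_EvalFrameDefault.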
ceval.c
/* Interpreter main loop */
PyObject *
PyEval_EvalFrame(PyFrameObject *f) {
/* This is for backward compatibility with extension modules that
used this API; core interpreter code should call
PyEval_EvalFrameEx() */
return PyEval_EvalFrameEx(f, 0);
}
PyObject *
PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)
{
PyThreadState *tstate = PyThreadState_GET();
return tstate->interp->eval_frame(f, throwflag);
}
pystate.c
PyInterpreterState *
PyInterpreterState_New(void)
{
PyInterpreterState *interp = (PyInterpreterState *)
PyMem_RawMalloc(sizeof(PyInterpreterState));
if (interp == NULL) {
return NULL;
}
... ...
interp->eval_frame = _PyEval_EvalFrameDefault;
... ...
ceval.c
PyObject* _Py_HOT_FUNCTION
_PyEval_EvalFrameDefault(PyFrameObject *f, int throwflag)
{
... ...
co = f->f_code;
names = co->co_names;
consts = co->co_consts;
fastlocals = f->f_localsplus;
freevars = f->f_localsplus + co->co_nlocals;
first_instr = (_Py_CODEUNIT *) PyBytes_AS_STRING(co->co_code);
... ...
next_instr = first_instr;
if (f->f_lasti >= 0) {
next_instr += f->f_lasti / sizeof(_Py_CODEUNIT) + 1;
}
stack_pointer = f->f_stacktop;
f->f_stacktop = NULL; /* remains NULL unless yield suspends frame */
... ...
}
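- _PyEval_EvalFrameDefault begins by initializing a number of variables from important fields of the PyFrameObject and its PyCodeObject.
- It also initializes the stack pointer, stack_pointer, from f->f_stacktop.
- Executing a bytecode sequence means walking through co_code from the beginning. co_code is the field of PyCodeObject that holds the bytecode instructions together with their arguments, so this walk executes the instructions one after another.
- The virtual machine uses three variables for the walk: first_instr points to the start of the instruction sequence, next_instr points to the next instruction to execute, and f_lasti records the offset of the last instruction that was executed. In this version first_instr and next_instr are pointers to _Py_CODEUNIT, while f_lasti is an int stored in the frame.
- The overall architecture for executing bytecode is a for loop wrapped around a huge switch/case statement: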
ceval.c
PyObject* _Py_HOT_FUNCTION
_PyEval_EvalFrameDefault(PyFrameObject *f, int throwflag)
{
why = WHY_NOT;
if (throwflag) /* support for generator.throw() */
goto error;
#ifdef Py_DEBUG
/* PyEval_EvalFrameEx() must not be called with an exception set,
because it can clear it (directly or indirectly) and so the
caller loses its exception */
assert(!PyErr_Occurred());
#endif
for (;;) {
assert(stack_pointer >= f->f_valuestack); /* else underflow */
assert(STACK_LEVEL() <= co->co_stacksize); /* else overflow */
assert(!PyErr_Occurred());
/* Do periodic things. Doing this every time through
the loop would add too much overhead, so we do it
only every Nth instruction. We also do it if
``pendingcalls_to_do'' is set, i.e. when an asynchronous
event needs attention (e.g. a signal handler or
async I/O handler); see Py_AddPendingCall() and
Py_MakePendingCalls() above. */
if (_Py_atomic_load_relaxed(&_PyRuntime.ceval.eval_breaker)) {
opcode = _Py_OPCODE(*next_instr);
if (opcode == SETUP_FINALLY ||
opcode == SETUP_WITH ||
opcode == BEFORE_ASYNC_WITH ||
opcode == YIELD_FROM) {
/* Few cases where we skip running signal handlers and other
pending calls:
- If we're about to enter the 'with:'. It will prevent
emitting a resource warning in the common idiom
'with open(path) as file:'.
- If we're about to enter the 'async with:'.
- If we're about to enter the 'try:' of a try/finally (not
*very* useful, but might help in some cases and it's
traditional)
- If we're resuming a chain of nested 'yield from' or
'await' calls, then each frame is parked with YIELD_FROM
as its next opcode. If the user hit control-C we want to
wait until we've reached the innermost frame before
running the signal handler and raising KeyboardInterrupt
(see bpo-30039).
*/
goto fast_next_opcode;
}
if (_Py_atomic_load_relaxed(
&_PyRuntime.ceval.pending.calls_to_do))
{
if (Py_MakePendingCalls() < 0)
goto error;
}
if (_Py_atomic_load_relaxed(
&_PyRuntime.ceval.gil_drop_request))
{
/* Give another thread a chance */
if (PyThreadState_Swap(NULL) != tstate)
Py_FatalError("ceval: tstate mix-up");
drop_gil(tstate);
/* Other threads may run now */
take_gil(tstate);
/* Check if we should make a quick exit. */
if (_Py_IsFinalizing() &&
!_Py_CURRENTLY_FINALIZING(tstate))
{
drop_gil(tstate);
PyThread_exit_thread();
}
if (PyThreadState_Swap(tstate) != NULL)
Py_FatalError("ceval: orphan tstate");
}
/* Check for asynchronous exceptions. */
if (tstate->async_exc != NULL) {
PyObject *exc = tstate->async_exc;
tstate->async_exc = NULL;
UNSIGNAL_ASYNC_EXC();
PyErr_SetNone(exc);
Py_DECREF(exc);
goto error;
}
}
fast_next_opcode:
f->f_lasti = INSTR_OFFSET();
if (PyDTrace_LINE_ENABLED())
maybe_dtrace_line(f, &instr_lb, &instr_ub, &instr_prev);
/* line-by-line tracing support */
if (_Py_TracingPossible &&
tstate->c_tracefunc != NULL && !tstate->tracing) {
int err;
/* see maybe_call_line_trace
for expository comments */
f->f_stacktop = stack_pointer;
err = maybe_call_line_trace(tstate->c_tracefunc,
tstate->c_traceobj,
tstate, f,
&instr_lb, &instr_ub, &instr_prev);
/* Reload possibly changed frame fields */
JUMPTO(f->f_lasti);
if (f->f_stacktop != NULL) {
stack_pointer = f->f_stacktop;
f->f_stacktop = NULL;
}
if (err)
/* trace function raised an exception */
goto error;
}
/* Extract opcode and argument */
NEXTOPARG();
dispatch_opcode:
#ifdef DYNAMIC_EXECUTION_PROFILE
#ifdef DXPAIRS
dxpairs[lastopcode][opcode]++;
lastopcode = opcode;
#endif
dxp[opcode]++;
#endif
#ifdef LLTRACE
/* Instruction tracing */
if (lltrace) {
if (HAS_ARG(opcode)) {
printf("%d: %d, %d\n",
f->f_lasti, opcode, oparg);
}
else {
printf("%d: %d\n",
f->f_lasti, opcode);
}
}
#endif
switch (opcode) {
... ...
}
... ...
}
- In this architecture, the traversal of the bytecode is implemented by a handful of macros.
ceval.c
/* The integer overflow is checked by an assertion below. */
#define INSTR_OFFSET() \
(sizeof(_Py_CODEUNIT) * (int)(next_instr - first_instr))
#define NEXTOPARG() do { \
_Py_CODEUNIT word = *next_instr; \
opcode = _Py_OPCODE(word); \
oparg = _Py_OPARG(word); \
next_instr++; \
} while (0)
#define JUMPTO(x) (next_instr = first_instr + (x) / sizeof(_Py_CODEUNIT))
#define JUMPBY(x) (next_instr += (x) / sizeof(_Py_CODEUNIT))
- Whether an instruction takes an argument is decided by the HAS_ARG macro.
Include\opcode.h
#define HAS_ARG(op) ((op) >= HAVE_ARGUMENT)
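What NEXTOPARG and HAS_ARG do can be imitated in Python on a real code object. The sketch below assumes the two-byte code unit used since CPython 3.6 (one byte of opcode, one byte of argument); the names iter_code_units and has_arg are made up for illustration. On recent CPython versions some code units are inline cache entries rather than instructions, so the printed listing is a raw view and not a replacement for the dis module.

```python
import dis

CODE_UNIT_SIZE = 2  # assumption: sizeof(_Py_CODEUNIT) on CPython 3.6+


def has_arg(opcode):
    # Same test as the HAS_ARG macro.
    return opcode >= dis.HAVE_ARGUMENT


def iter_code_units(code):
    """Yield (offset, opcode, oparg) for every code unit, like NEXTOPARG."""
    raw = code.co_code
    for offset in range(0, len(raw), CODE_UNIT_SIZE):
        yield offset, raw[offset], raw[offset + 1]


def add(x, y):
    return x + y


units = list(iter_code_units(add.__code__))
for offset, opcode, oparg in units:
    shown = oparg if has_arg(opcode) else ""
    print(offset, dis.opname[opcode], shown)
```

The offset printed for each unit corresponds to what INSTR_OFFSET() would store into f_lasti for that position.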
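- After fetching an instruction and its argument, Python dispatches on the opcode with a switch statement. Every bytecode instruction has its own case, and the body of that case is Python's implementation of the instruction.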
ceval.c
#define TARGET(op) \
case op:
... ...
PyObject* _Py_HOT_FUNCTION
_PyEval_EvalFrameDefault(PyFrameObject *f, int throwflag)
{
... ...
TARGET(STORE_SUBSCR) {
PyObject *sub = TOP();
PyObject *container = SECOND();
PyObject *v = THIRD();
int err;
STACKADJ(-3);
/* container[sub] = v */
err = PyObject_SetItem(container, sub, v);
Py_DECREF(v);
Py_DECREF(container);
Py_DECREF(sub);
if (err != 0)
goto error;
DISPATCH();
}
TARGET(DELETE_SUBSCR) {
PyObject *sub = TOP();
PyObject *container = SECOND();
int err;
STACKADJ(-2);
/* del container[sub] */
err = PyObject_DelItem(container, sub);
Py_DECREF(container);
Py_DECREF(sub);
if (err != 0)
goto error;
DISPATCH();
}
... ...
Include\opcode.h
/* Instruction opcodes for compiled code */
#define POP_TOP 1
#define ROT_TWO 2
#define ROT_THREE 3
#define DUP_TOP 4
#define DUP_TOP_TWO 5
#define NOP 9
#define UNARY_POSITIVE 10
#define UNARY_NEGATIVE 11
#define UNARY_NOT 12
#define UNARY_INVERT 15
#define BINARY_MATRIX_MULTIPLY 16
#define INPLACE_MATRIX_MULTIPLY 17
#define BINARY_POWER 19
#define BINARY_MULTIPLY 20
#define BINARY_MODULO 22
#define BINARY_ADD 23
#define BINARY_SUBTRACT 24
#define BINARY_SUBSCR 25
#define BINARY_FLOOR_DIVIDE 26
#define BINARY_TRUE_DIVIDE 27
#define INPLACE_FLOOR_DIVIDE 28
#define INPLACE_TRUE_DIVIDE 29
#define GET_AITER 50
#define GET_ANEXT 51
#define BEFORE_ASYNC_WITH 52
#define INPLACE_ADD 55
#define INPLACE_SUBTRACT 56
#define INPLACE_MULTIPLY 57
#define INPLACE_MODULO 59
#define STORE_SUBSCR 60
#define DELETE_SUBSCR 61
#define BINARY_LSHIFT 62
#define BINARY_RSHIFT 63
#define BINARY_AND 64
#define BINARY_XOR 65
#define BINARY_OR 66
#define INPLACE_POWER 67
#define GET_ITER 68
#define GET_YIELD_FROM_ITER 69
#define PRINT_EXPR 70
#define LOAD_BUILD_CLASS 71
#define YIELD_FROM 72
#define GET_AWAITABLE 73
#define INPLACE_LSHIFT 75
#define INPLACE_RSHIFT 76
#define INPLACE_AND 77
#define INPLACE_XOR 78
#define INPLACE_OR 79
#define BREAK_LOOP 80
#define WITH_CLEANUP_START 81
#define WITH_CLEANUP_FINISH 82
#define RETURN_VALUE 83
#define IMPORT_STAR 84
#define SETUP_ANNOTATIONS 85
#define YIELD_VALUE 86
#define POP_BLOCK 87
#define END_FINALLY 88
#define POP_EXCEPT 89
#define HAVE_ARGUMENT 90
#define STORE_NAME 90
#define DELETE_NAME 91
#define UNPACK_SEQUENCE 92
#define FOR_ITER 93
#define UNPACK_EX 94
#define STORE_ATTR 95
#define DELETE_ATTR 96
#define STORE_GLOBAL 97
#define DELETE_GLOBAL 98
#define LOAD_CONST 100
#define LOAD_NAME 101
#define BUILD_TUPLE 102
#define BUILD_LIST 103
#define BUILD_SET 104
#define BUILD_MAP 105
#define LOAD_ATTR 106
#define COMPARE_OP 107
#define IMPORT_NAME 108
#define IMPORT_FROM 109
#define JUMP_FORWARD 110
#define JUMP_IF_FALSE_OR_POP 111
#define JUMP_IF_TRUE_OR_POP 112
#define JUMP_ABSOLUTE 113
#define POP_JUMP_IF_FALSE 114
#define POP_JUMP_IF_TRUE 115
#define LOAD_GLOBAL 116
#define CONTINUE_LOOP 119
#define SETUP_LOOP 120
#define SETUP_EXCEPT 121
#define SETUP_FINALLY 122
#define LOAD_FAST 124
#define STORE_FAST 125
#define DELETE_FAST 126
#define RAISE_VARARGS 130
#define CALL_FUNCTION 131
#define MAKE_FUNCTION 132
#define BUILD_SLICE 133
#define LOAD_CLOSURE 135
#define LOAD_DEREF 136
#define STORE_DEREF 137
#define DELETE_DEREF 138
#define CALL_FUNCTION_KW 141
#define CALL_FUNCTION_EX 142
#define SETUP_WITH 143
#define EXTENDED_ARG 144
#define LIST_APPEND 145
#define SET_ADD 146
#define MAP_ADD 147
#define LOAD_CLASSDEREF 148
#define BUILD_LIST_UNPACK 149
#define BUILD_MAP_UNPACK 150
#define BUILD_MAP_UNPACK_WITH_CALL 151
#define BUILD_TUPLE_UNPACK 152
#define BUILD_SET_UNPACK 153
#define SETUP_ASYNC_WITH 154
#define FORMAT_VALUE 155
#define BUILD_CONST_KEY_MAP 156
#define BUILD_STRING 157
#define BUILD_TUPLE_UNPACK_WITH_CALL 158
#define LOAD_METHOD 160
#define CALL_METHOD 161
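The fetch, decode and dispatch structure described above can be sketched as a toy stack machine. This is not how CPython is implemented; it is a simplified illustration that borrows four opcode numbers from the opcode.h listing above and replaces the C switch with an if/elif chain. The function name run and the sample program are made up for the example.

```python
# Opcode numbers taken from the opcode.h listing above.
BINARY_MULTIPLY = 20
BINARY_ADD = 23
RETURN_VALUE = 83
LOAD_CONST = 100


def run(code, consts):
    """Execute a tiny bytecode program made of two-byte code units."""
    stack = []            # the value stack
    next_instr = 0        # index of the next code unit to execute
    while True:           # plays the role of `for (;;)`
        opcode, oparg = code[next_instr], code[next_instr + 1]
        next_instr += 2   # like NEXTOPARG advancing next_instr
        if opcode == LOAD_CONST:
            stack.append(consts[oparg])
        elif opcode == BINARY_ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == BINARY_MULTIPLY:
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif opcode == RETURN_VALUE:
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode {opcode}")


# (1 + 2) * 3
program = bytes([
    LOAD_CONST, 0,
    LOAD_CONST, 1,
    BINARY_ADD, 0,
    LOAD_CONST, 2,
    BINARY_MULTIPLY, 0,
    RETURN_VALUE, 0,
])
result = run(program, consts=(1, 2, 3))
print(result)
```

Every branch ends by falling back to the top of the loop, which is the role DISPATCH() and fast_next_opcode play in the C code.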
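- After an instruction has executed successfully, control jumps either to fast_next_opcode or back to the top of the for loop.
- Either way, the next thing Python does is fetch the next instruction and its argument and execute it.
- In this way the instructions in co_code are processed one by one until the Python program has been executed.
- The why variable in _PyEval_EvalFrameDefault records the state in which the for loop was exited, for example because of an exception.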
ceval.c
/* Status code for main loop (reason for stack unwind) */
enum why_code {
WHY_NOT = 0x0001, /* No error */
WHY_EXCEPTION = 0x0002, /* Exception occurred */
WHY_RETURN = 0x0008, /* 'return' statement */
WHY_BREAK = 0x0010, /* 'break' statement */
WHY_CONTINUE = 0x0020, /* 'continue' statement */
WHY_YIELD = 0x0040, /* 'yield' operator */
WHY_SILENCED = 0x0080 /* Exception silenced by 'with' */
};
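V. Python's runtime environment
1. Processes and threads in the operating system
- A native Win32 executable normally runs inside a process.
- The active entity that corresponds to a sequence of machine instructions, however, is abstracted as a thread; the process is the environment in which threads run.
- For a single-threaded executable, the operating system creates a process when it runs, and inside that process there is one main thread.
- For a multi-threaded executable, the operating system creates one process and several threads. The CPU keeps switching between the threads, and on every switch the context of the outgoing thread has to be saved so that it can be resumed later.
2. Processes and threads in Python
2.1 The thread environment in Python
- Python supports multithreading, and every Python thread corresponds to a native operating-system thread.
- The virtual machine is Python's abstraction of the CPU and does the computation for all threads.
- Task switching in Python is the mechanism by which different threads take turns using the virtual machine.
- Python stores the information about a thread in a PyThreadState. Every thread has its own PyThreadState object, so PyThreadState can be seen as the abstraction of a thread's state.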
Include\pystate.h
typedef struct _ts {
/* See Python/ceval.c for comments explaining most fields */
struct _ts *prev;
struct _ts *next;
PyInterpreterState *interp;
struct _frame *frame;
int recursion_depth;
char overflowed; /* The stack has overflowed. Allow 50 more calls
to handle the runtime error. */
char recursion_critical; /* The current calls must not cause
a stack overflow. */
int stackcheck_counter;
/* 'tracing' keeps track of the execution depth when tracing/profiling.
This is to prevent the actual trace/profile code from being recorded in
the trace/profile. */
int tracing;
int use_tracing;
Py_tracefunc c_profilefunc;
Py_tracefunc c_tracefunc;
PyObject *c_profileobj;
PyObject *c_traceobj;
/* The exception currently being raised */
PyObject *curexc_type;
PyObject *curexc_value;
PyObject *curexc_traceback;
/* The exception currently being handled, if no coroutines/generators
* are present. Always last element on the stack referred to be exc_info.
*/
_PyErr_StackItem exc_state;
/* Pointer to the top of the stack of the exceptions currently
* being handled */
_PyErr_StackItem *exc_info;
PyObject *dict; /* Stores per-thread state */
int gilstate_counter;
PyObject *async_exc; /* Asynchronous exception to raise */
unsigned long thread_id; /* Thread id where this tstate was created */
int trash_delete_nesting;
PyObject *trash_delete_later;
/* Called when a thread state is deleted normally, but not when it
* is destroyed after fork().
* Pain: to prevent rare but fatal shutdown errors (issue 18808),
* Thread.join() must wait for the join'ed thread's tstate to be unlinked
* from the tstate chain. That happens at the end of a thread's life,
* in pystate.c.
* The obvious way doesn't quite work: create a lock which the tstate
* unlinking code releases, and have Thread.join() wait to acquire that
* lock. The problem is that we _are_ at the end of the thread's life:
* if the thread holds the last reference to the lock, decref'ing the
* lock will delete the lock, and that may trigger arbitrary Python code
* if there's a weakref, with a callback, to the lock. But by this time
* _PyThreadState_Current is already NULL, so only the simplest of C code
* can be allowed to run (in particular it must not be possible to
* release the GIL).
* So instead of holding the lock directly, the tstate holds a weakref to
* the lock: that's the value of on_delete_data below. Decref'ing a
* weakref is harmless.
* on_delete points to _threadmodule.c's static release_sentinel() function.
* After the tstate is unlinked, release_sentinel is called with the
* weakref-to-lock (on_delete_data) argument, and release_sentinel releases
* the indirectly held lock.
*/
void (*on_delete)(void *);
void *on_delete_data;
int coroutine_origin_tracking_depth;
PyObject *coroutine_wrapper;
int in_coroutine_wrapper;
PyObject *async_gen_firstiter;
PyObject *async_gen_finalizer;
PyObject *context;
uint64_t context_ver;
/* Unique thread state id. */
uint64_t id;
/* XXX signal handlers should also be here */
} PyThreadState;
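- The struct contains a frame pointer, which means that each PyThreadState keeps track of its own stack of frames (the chain of PyFrameObject objects linked through f_back).
- The virtual machine source also shows the current thread state being fetched whenever a frame is about to be evaluated.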
ceval.c
PyObject *
PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)
{
PyThreadState *tstate = PyThreadState_GET();
return tstate->interp->eval_frame(f, throwflag);
}
pystate.c
PyThreadState *
PyThreadState_Get(void)
{
PyThreadState *tstate = GET_TSTATE();
if (tstate == NULL)
Py_FatalError("PyThreadState_Get: no current thread");
return tstate;
}
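- The current thread state is also used when a PyFrameObject is created: it is passed in as tstate, and tstate->frame becomes the new frame's f_back.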
Objects\frameobject.c
PyFrameObject*
PyFrame_New(PyThreadState *tstate, PyCodeObject *code,
PyObject *globals, PyObject *locals)
{
PyFrameObject *f = _PyFrame_New_NoTrack(tstate, code, globals, locals);
if (f)
_PyObject_GC_TRACK(f);
return f;
}
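That every thread state carries its own frame chain can be checked from Python with sys._current_frames(), which maps each thread identifier to that thread's topmost frame. The following is a sketch assuming CPython; the helper names frame_names and snapshot_thread_stacks are made up for illustration.

```python
import sys
import threading


def frame_names(frame):
    """Follow f_back and collect the code names on one thread's frame chain."""
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return names


def snapshot_thread_stacks():
    started = threading.Event()
    release = threading.Event()

    def worker():
        started.set()      # tell the main thread we are running
        release.wait()     # stay alive until the snapshot is taken

    thread = threading.Thread(target=worker)
    thread.start()
    started.wait()
    # One entry per thread: thread id -> names on that thread's frame chain.
    stacks = {tid: frame_names(frame)
              for tid, frame in sys._current_frames().items()}
    release.set()
    thread.join()
    return thread.ident, stacks


worker_id, stacks = snapshot_thread_stacks()
main_id = threading.get_ident()
print("worker:", stacks[worker_id])
print("main:  ", stacks[main_id])
```

The two lists are disjoint in their innermost entries: worker appears only on the worker thread's chain, and snapshot_thread_stacks only on the calling thread's chain.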
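2.2 The process environment in Python
- The concept of a process is implemented in Python by the PyInterpreterState object.
- Python can have several logical PyInterpreterState objects, which play the role of multiple processes in the operating system.
- In the usual case, however, there is only one interpreter. It maintains several PyThreadState objects, and these threads take turns using a single bytecode execution engine.
- Python uses the Global Interpreter Lock (GIL) to synchronize all of these threads.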
Include\pystate.h
typedef struct _is {
struct _is *next;
struct _ts *tstate_head;
int64_t id;
int64_t id_refcount;
PyThread_type_lock id_mutex;
PyObject *modules;
PyObject *modules_by_index;
PyObject *sysdict;
PyObject *builtins;
PyObject *importlib;
/* Used in Python/sysmodule.c. */
int check_interval;
/* Used in Modules/_threadmodule.c. */
long num_threads;
/* Support for runtime thread stack size tuning.
A value of 0 means using the platform's default stack size
or the size specified by the THREAD_STACK_SIZE macro. */
/* Used in Python/thread.c. */
size_t pythread_stacksize;
PyObject *codec_search_path;
PyObject *codec_search_cache;
PyObject *codec_error_registry;
int codecs_initialized;
int fscodec_initialized;
_PyCoreConfig core_config;
_PyMainInterpreterConfig config;
#ifdef HAVE_DLOPEN
int dlopenflags;
#endif
PyObject *builtins_copy;
PyObject *import_func;
/* Initialized to PyEval_EvalFrameDefault(). */
_PyFrameEvalFunction eval_frame;
Py_ssize_t co_extra_user_count;
freefunc co_extra_freefuncs[MAX_CO_EXTRA_USERS];
#ifdef HAVE_FORK
PyObject *before_forkers;
PyObject *after_forkers_parent;
PyObject *after_forkers_child;
#endif
/* AtExit module */
void (*pyexitfunc)(PyObject *);
PyObject *pyexitmodule;
uint64_t tstate_next_unique_id;
} PyInterpreterState;
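How often a running thread is asked to give up the GIL is tunable from Python. As a small sketch, assuming a CPython build with the GIL enabled, sys.getswitchinterval() reports the interval in seconds after which a waiting thread may request a switch; this is the request the eval loop observes as gil_drop_request.

```python
import sys

# Interval, in seconds, after which a waiting thread may request the GIL.
interval = sys.getswitchinterval()
print(f"thread switch interval: {interval} s")
```

sys.setswitchinterval() changes the value; a smaller interval makes threads alternate more often at the price of more switching overhead.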
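- Putting all of this together, Python's runtime environment can be pictured as follows: a process holds one or more PyInterpreterState objects, each interpreter maintains a list of PyThreadState objects through tstate_head, and each thread state points to its current chain of PyFrameObject objects through frame.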