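Thread Switching in the Python 3 Virtual Machine
Python provides its own multithreading layer on top of native OS threads. To keep the interpreter thread-safe, it introduced the Global Interpreter Lock (GIL): only the thread that holds the GIL may execute, so at any moment just one thread is running Python code, and multithreaded Python cannot exploit the power of a multi-core processor. The book 《python源码剖析》 describes the historical reasons for the GIL in detail. In short, for now the GIL may well be the best available solution for Python's multithreading implementation.
Python 3.13 makes an experimental attempt at removing the GIL; using it requires a specially compiled build of the interpreter. The GIL amounts to mutual exclusion at global scope. Once it is removed, resources have to be protected at a finer granularity, which may leave the GIL-free build slower than the build with the GIL, although Python keeps optimizing this.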
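During multithreaded execution, the thread holding the GIL has to release it after running for a while so that other threads also get a chance to run. How long should each thread "enjoy" the GIL? Python 2 defined the time slice as an instruction count: by default a thread released the GIL after executing 100 bytecode instructions. In Python 3 the slice is no longer a fixed instruction count; it is a time interval that can be set with sys.setswitchinterval (the default is 5 milliseconds). If some thread is running with the GIL, a thread waiting for the GIL waits for up to this interval. Once the wait times out, the waiting thread "nudges" the running thread: it sets a flag bit in the running thread's eval_breaker to signal that the GIL should be released. The bytecode instructions are designed with many points at which eval_breaker is checked. When the running thread finds the drop-request flag in eval_breaker, it releases the GIL and gets back in line, giving other threads a chance to acquire it. (The "line" is only a metaphor: the threads waiting for the GIL compete for it on equal terms.)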
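The switch interval can be inspected and changed from Python code. The following is a minimal sketch; the helper name change_switch_interval is made up for this illustration, while sys.getswitchinterval and sys.setswitchinterval are the standard library calls.

```python
import sys

def change_switch_interval(seconds):
    """Set a new switch interval and return (old, new) as reported by sys.

    Hypothetical helper for illustration only.
    """
    old = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    new = sys.getswitchinterval()
    return old, new

# Ask waiting threads to request the GIL after 1 ms instead of the default.
old_interval, new_interval = change_switch_interval(0.001)
print("old:", old_interval, "new:", new_interval)

# Restore the previous value so the rest of the program is unaffected.
sys.setswitchinterval(old_interval)
```

A shorter interval makes waiting threads ask for the GIL sooner, at the cost of more frequent switches; a longer one does the opposite.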
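Python's multithreading machinery involves system calls and compatibility across platforms, so it is fairly complex. This article focuses on the thread-switching process and attempts to analyze how it works in Python 3.
Thread execution
Python 3's low-level threading module is _thread, implemented in Modules/_threadmodule.c. The entry point for creating a thread is the function thread_PyThread_start_new_thread. It first creates the thread handle object ThreadHandle and the thread state object PyThreadState, then enters the platform-specific function PyThread_start_joinable_thread, which builds the arguments for the system call and calls the system function that creates the thread. What is passed to the system call is a single common entry function, thread_run, which runs once the native thread has been created. thread_run first obtains its own thread id and binds the thread state object to the global runtime object _PyRuntimeState, then calls PyEval_AcquireThread to try to acquire the GIL and execute the thread's code. The core of GIL acquisition for a new thread is inside PyEval_AcquireThread.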
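From the Python side, the low-level entry point described above can be exercised directly through the _thread module. This is only a small sketch: the worker function and the results list are invented for the example, and real programs would normally use the higher-level threading module instead.

```python
import _thread
import threading

results = []

# A raw _thread lock used as a "finished" signal: the main thread holds it,
# and the worker releases it when done (any thread may release a _thread lock).
finished = _thread.allocate_lock()
finished.acquire()

def worker(tag):
    # Runs in the new native thread, after it has acquired the GIL.
    results.append((tag, threading.get_ident()))
    finished.release()

_thread.start_new_thread(worker, ("child",))

finished.acquire()  # blocks until the worker has released the lock
print(results, "main ident:", threading.get_ident())
```

The worker records its own thread identifier, which differs from the main thread's, showing that the function really ran on a separate native thread.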
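Acquiring the GIL
A new thread enters PyEval_AcquireThread to try to acquire the GIL. The call chain is PyEval_AcquireThread->_PyThreadState_Attach->_PyEval_AcquireLock->take_gil. The actual acquisition happens in take_gil, which is located in Python/ceval_gil.c. Its source is as follows: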
/* Take the GIL.

   The function saves errno at entry and restores its value at exit.

   tstate must be non-NULL.

   Returns 1 if the GIL was acquired, or 0 if not. */
static void
take_gil(PyThreadState *tstate)
{
    int err = errno;

    assert(tstate != NULL);
    /* We shouldn't be using a thread state that isn't viable any more. */
    // XXX It may be more correct to check tstate->_status.finalizing.
    // XXX assert(!tstate->_status.cleared);

    if (_PyThreadState_MustExit(tstate)) {
        /* bpo-39877: If Py_Finalize() has been called and tstate is not the
           thread which called Py_Finalize(), exit immediately the thread.

           This code path can be reached by a daemon thread after Py_Finalize()
           completes. In this case, tstate is a dangling pointer: points to
           PyThreadState freed memory. */
        PyThread_exit_thread();
    }

    assert(_PyThreadState_CheckConsistency(tstate));
    PyInterpreterState *interp = tstate->interp;
    struct _gil_runtime_state *gil = interp->ceval.gil;
#ifdef Py_GIL_DISABLED
    if (!_Py_atomic_load_int_relaxed(&gil->enabled)) {
        return;
    }
#endif

    /* Check that _PyEval_InitThreads() was called to create the lock */
    assert(gil_created(gil));

    MUTEX_LOCK(gil->mutex);

    int drop_requested = 0;
    while (_Py_atomic_load_int_relaxed(&gil->locked)) {
        unsigned long saved_switchnum = gil->switch_number;

        unsigned long interval = (gil->interval >= 1 ? gil->interval : 1);
        int timed_out = 0;
        COND_TIMED_WAIT(gil->cond, gil->mutex, interval, timed_out);

        /* If we timed out and no switch occurred in the meantime, it is time
           to ask the GIL-holding thread to drop it. */
        if (timed_out &&
            _Py_atomic_load_int_relaxed(&gil->locked) &&
            gil->switch_number == saved_switchnum)
        {
            PyThreadState *holder_tstate =
                (PyThreadState*)_Py_atomic_load_ptr_relaxed(&gil->last_holder);
            if (_PyThreadState_MustExit(tstate)) {
                MUTEX_UNLOCK(gil->mutex);
                // gh-96387: If the loop requested a drop request in a previous
                // iteration, reset the request. Otherwise, drop_gil() can
                // block forever waiting for the thread which exited. Drop
                // requests made by other threads are also reset: these threads
                // may have to request again a drop request (iterate one more
                // time).
                if (drop_requested) {
                    _Py_unset_eval_breaker_bit(holder_tstate, _PY_GIL_DROP_REQUEST_BIT);
                }
                PyThread_exit_thread();
            }
            assert(_PyThreadState_CheckConsistency(tstate));

            _Py_set_eval_breaker_bit(holder_tstate, _PY_GIL_DROP_REQUEST_BIT);
            drop_requested = 1;
        }
    }

#ifdef Py_GIL_DISABLED
    if (!_Py_atomic_load_int_relaxed(&gil->enabled)) {
        // Another thread disabled the GIL between our check above and
        // now. Don't take the GIL, signal any other waiting threads, and
        // return.
        COND_SIGNAL(gil->cond);
        MUTEX_UNLOCK(gil->mutex);
        return;
    }
#endif

#ifdef FORCE_SWITCHING
    /* This mutex must be taken before modifying gil->last_holder:
       see drop_gil(). */
    MUTEX_LOCK(gil->switch_mutex);
#endif
    /* We now hold the GIL */
    _Py_atomic_store_int_relaxed(&gil->locked, 1);
    _Py_ANNOTATE_RWLOCK_ACQUIRED(&gil->locked, /*is_write=*/1);

    if (tstate != (PyThreadState*)_Py_atomic_load_ptr_relaxed(&gil->last_holder)) {
        _Py_atomic_store_ptr_relaxed(&gil->last_holder, tstate);
        ++gil->switch_number;
    }

#ifdef FORCE_SWITCHING
    COND_SIGNAL(gil->switch_cond);
    MUTEX_UNLOCK(gil->switch_mutex);
#endif

    if (_PyThreadState_MustExit(tstate)) {
        /* bpo-36475: If Py_Finalize() has been called and tstate is not
           the thread which called Py_Finalize(), exit immediately the
           thread.

           This code path can be reached by a daemon thread which was waiting
           in take_gil() while the main thread called
           wait_for_thread_shutdown() from Py_Finalize(). */
        MUTEX_UNLOCK(gil->mutex);
        /* tstate could be a dangling pointer, so don't pass it to
           drop_gil(). */
        drop_gil(interp, NULL, 1);
        PyThread_exit_thread();
    }
    assert(_PyThreadState_CheckConsistency(tstate));

    tstate->_status.holds_gil = 1;
    _Py_unset_eval_breaker_bit(tstate, _PY_GIL_DROP_REQUEST_BIT);
    update_eval_breaker_for_thread(interp, tstate);

    MUTEX_UNLOCK(gil->mutex);

    errno = err;
    return;
}
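_PyThreadState_MustExit checks whether the current thread is in the process of exiting; if it is, the thread no longer takes part in the competition for the GIL. The function then enters the while loop and reads gil->locked. If locked is 1, the GIL is currently held by another thread. Inside the loop, the thread first saves the current switch_number and then calls COND_TIMED_WAIT to wait for up to interval. When the wait ends, it checks three conditions: timed_out is 1, gil->locked is still 1, and gil->switch_number == saved_switchnum. If all three hold, the thread that held the GIL has kept running for the whole interval without releasing it, so the waiter enters the if block and signals the running thread to release the GIL, indicating that someone is waiting. If gil->switch_number != saved_switchnum, the GIL was taken by some other thread during the wait: the wait was in vain, so the thread starts another iteration of the while loop, saves saved_switchnum again, and waits for the GIL once more.
Inside the if block, the waiter fetches the state object of the running thread, holder_tstate, and calls _Py_set_eval_breaker_bit to set the _PY_GIL_DROP_REQUEST_BIT flag in its eval_breaker, asking the running thread to release the GIL at its next checkpoint. Once the running thread has received the signal and released the GIL, the waiting threads can compete for it.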
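The wait loop can be modelled in a few lines of Python. The sketch below is a toy, not CPython's implementation: ToyGIL and everything in it are hypothetical names, a threading.Condition stands in for gil->mutex plus gil->cond, and a set of thread names stands in for the per-thread eval_breaker bit.

```python
import threading
import time

class ToyGIL:
    """Toy model of take_gil()'s wait loop. Not CPython's real implementation."""

    def __init__(self, interval):
        self.cond = threading.Condition()  # stands in for gil->mutex + gil->cond
        self.locked = False
        self.switch_number = 0
        self.last_holder = None
        self.interval = interval           # seconds in this toy model
        self.drop_requests = set()         # stands in for the eval_breaker bit

    def take(self, name):
        with self.cond:
            while self.locked:
                saved_switchnum = self.switch_number
                # Condition.wait returns False when the timeout expired.
                timed_out = not self.cond.wait(timeout=self.interval)
                if timed_out and self.locked and self.switch_number == saved_switchnum:
                    # Same holder for a whole interval: ask it to drop the lock.
                    self.drop_requests.add(self.last_holder)
            self.locked = True
            if self.last_holder != name:
                self.last_holder = name
                self.switch_number += 1
            self.drop_requests.discard(name)  # clear our own pending request

    def drop(self):
        with self.cond:
            self.locked = False
            self.cond.notify()

    def drop_requested(self, name):
        with self.cond:
            return name in self.drop_requests


gil = ToyGIL(interval=0.01)
log = []
a_holds = threading.Event()

def holder():
    gil.take("A")
    a_holds.set()
    while not gil.drop_requested("A"):  # hypothetical checkpoint
        time.sleep(0.001)               # pretend to execute bytecode
    log.append("A drops on request")
    gil.drop()

def waiter():
    a_holds.wait()                      # make sure A holds the lock first
    gil.take("B")
    log.append("B acquired")
    gil.drop()

threads = [threading.Thread(target=holder), threading.Thread(target=waiter)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(log, gil.switch_number)
```

Thread A only gives up the toy lock after B has waited a full interval and posted a drop request, which mirrors the timeout-then-request order in take_gil.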
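Releasing the GIL
How does the running thread receive the request to release the GIL? The Python 3 interpreter checks the current thread's eval_breaker at many points while executing bytecode, using the CHECK_EVAL_BREAKER macro. For example, the RESUME instruction at the start of the bytecode contains CHECK_EVAL_BREAKER, and so do jump instructions. These checkpoints guarantee that the running thread will see the drop request, so the GIL cannot stay stuck with one thread forever.
When eval_breaker is set on the current thread state, the CHECK_EVAL_BREAKER macro enters _Py_HandlePending to process the flag bits. _Py_HandlePending is also located in Python/ceval_gil.c. Its source is as follows: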
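To see where such checkpoints can appear in real bytecode, one can disassemble a small function with the standard dis module. This is a rough sketch: count_up is an invented example function, and the instruction names are version dependent (RESUME exists from Python 3.11 on; the name of the loop's backward jump is assumed here to start with JUMP_BACKWARD).

```python
import dis
import sys

def count_up(n):
    total = 0
    for i in range(n):
        total += i
    return total

# Collect the opcode names of the compiled function.
opnames = [ins.opname for ins in dis.get_instructions(count_up)]

# Keep only the instructions the article names as eval_breaker checkpoints.
checkpoints = [
    name for name in opnames
    if name == "RESUME" or name.startswith("JUMP_BACKWARD")
]
print(sys.version_info[:2], checkpoints)
```

Because the loop body jumps backwards on every iteration, even a tight loop without function calls passes a checkpoint regularly.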
int
_Py_HandlePending(PyThreadState *tstate)
{
    uintptr_t breaker = _Py_atomic_load_uintptr_relaxed(&tstate->eval_breaker);

    /* Stop-the-world */
    if ((breaker & _PY_EVAL_PLEASE_STOP_BIT) != 0) {
        _Py_unset_eval_breaker_bit(tstate, _PY_EVAL_PLEASE_STOP_BIT);
        _PyThreadState_Suspend(tstate);

        /* The attach blocks until the stop-the-world event is complete. */
        _PyThreadState_Attach(tstate);
    }

    /* Pending signals */
    if ((breaker & _PY_SIGNALS_PENDING_BIT) != 0) {
        if (handle_signals(tstate) != 0) {
            return -1;
        }
    }

    /* Pending calls */
    if ((breaker & _PY_CALLS_TO_DO_BIT) != 0) {
        if (make_pending_calls(tstate) != 0) {
            return -1;
        }
    }

#ifdef Py_GIL_DISABLED
    /* Objects with refcounts to merge */
    if ((breaker & _PY_EVAL_EXPLICIT_MERGE_BIT) != 0) {
        _Py_unset_eval_breaker_bit(tstate, _PY_EVAL_EXPLICIT_MERGE_BIT);
        _Py_brc_merge_refcounts(tstate);
    }
#endif

    /* GC scheduled to run */
    if ((breaker & _PY_GC_SCHEDULED_BIT) != 0) {
        _Py_unset_eval_breaker_bit(tstate, _PY_GC_SCHEDULED_BIT);
        _Py_RunGC(tstate);
    }

    /* GIL drop request */
    if ((breaker & _PY_GIL_DROP_REQUEST_BIT) != 0) {
        /* Give another thread a chance */
        _PyThreadState_Detach(tstate);

        /* Other threads may run now */

        _PyThreadState_Attach(tstate);
    }

    /* Check for asynchronous exception. */
    if ((breaker & _PY_ASYNC_EXCEPTION_BIT) != 0) {
        _Py_unset_eval_breaker_bit(tstate, _PY_ASYNC_EXCEPTION_BIT);
        PyObject *exc = _Py_atomic_exchange_ptr(&tstate->async_exc, NULL);
        if (exc != NULL) {
            _PyErr_SetNone(tstate, exc);
            Py_DECREF(exc);
            return -1;
        }
    }
    return 0;
}
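_Py_HandlePending takes a different branch for each flag bit set in eval_breaker. If _PY_GIL_DROP_REQUEST_BIT is set, it calls _PyThreadState_Detach to release the GIL. The call chain _PyThreadState_Detach->detach_thread->_PyEval_ReleaseLock->drop_gil->drop_gil_impl finally releases it, which really just means setting gil->locked to 0. At its core, the GIL is simply a boolean variable, guarded by gil->mutex and gil->cond.
Right after releasing the GIL, the thread calls _PyThreadState_Attach and re-enters the competition for it. In the gap between releasing and reacquiring, another thread may already have grabbed the GIL and started running, so the current thread competes for the GIL again together with the other waiting threads. Through this notify-and-release mechanism, Python's threads take turns executing.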
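The bit-flag dispatch can be illustrated with a small Python model. Everything here is hypothetical: the flag values, ToyThreadState, and the trace list are invented for the sketch and do not match CPython's actual bit positions or data structures. Only two of the branches are modelled.

```python
# Illustrative flag values only; CPython's real bit positions may differ.
GIL_DROP_REQUEST_BIT = 1 << 0
GC_SCHEDULED_BIT = 1 << 1

class ToyThreadState:
    """Hypothetical stand-in for PyThreadState with just an eval_breaker word."""

    def __init__(self):
        self.eval_breaker = 0
        self.trace = []

def detach(ts):
    # Stands in for _PyThreadState_Detach: the GIL would be released here.
    ts.trace.append("detach")

def attach(ts):
    # Stands in for _PyThreadState_Attach: after reacquiring the GIL,
    # take_gil clears the thread's own drop-request bit.
    ts.eval_breaker &= ~GIL_DROP_REQUEST_BIT
    ts.trace.append("attach")

def handle_pending(ts):
    breaker = ts.eval_breaker  # snapshot, as in the C function
    if breaker & GC_SCHEDULED_BIT:
        ts.eval_breaker &= ~GC_SCHEDULED_BIT
        ts.trace.append("run_gc")
    if breaker & GIL_DROP_REQUEST_BIT:
        detach(ts)   # give another thread a chance
        attach(ts)   # then compete for the GIL again
    return 0

ts = ToyThreadState()
ts.eval_breaker = GC_SCHEDULED_BIT | GIL_DROP_REQUEST_BIT
status = handle_pending(ts)
print(status, ts.trace, ts.eval_breaker)
```

The model makes one detail visible: the drop-request bit is not cleared by the dispatcher itself but when the thread reacquires the lock, matching the _Py_unset_eval_breaker_bit call at the end of take_gil.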
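The turn-taking can be observed, roughly, from pure Python. In this sketch two CPU-bound threads append their tag to a shared list, and the number of places where the tag changes gives a rough count of how often execution moved from one thread to the other. The function name and parameters are made up for the example, and the switch count varies from run to run and between builds, so no particular value should be expected.

```python
import sys
import threading

def run_interleaving_demo(iterations=100_000, interval=0.0005):
    """Run two busy threads and return (count_a, count_b, observed_switches)."""
    old = sys.getswitchinterval()
    sys.setswitchinterval(interval)  # shorter interval, more frequent requests
    order = []

    def spin(tag):
        for _ in range(iterations):
            order.append(tag)        # list.append is safe to call from threads

    threads = [threading.Thread(target=spin, args=(tag,)) for tag in "AB"]
    try:
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        sys.setswitchinterval(old)   # always restore the previous interval

    # Count adjacent pairs where the running thread changed.
    switches = sum(1 for a, b in zip(order, order[1:]) if a != b)
    return order.count("A"), order.count("B"), switches

count_a, count_b, switches = run_interleaving_demo()
print(count_a, count_b, "observed switches:", switches)
```

Both threads always finish all of their iterations; only the interleaving pattern between them changes with the switch interval and the scheduler.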