Boost.Python is an open source C++ library which provides a concise IDL-like interface for binding C++ classes and functions to Python. Leveraging the full power of C++ compile-time introspection and of recently developed metaprogramming techniques, this is achieved entirely in pure C++, without introducing a new syntax. Boost.Python's rich set of features and high-level interface make it possible to engineer packages from the ground up as hybrid systems, giving programmers easy and coherent access to both the efficient compile-time polymorphism of C++ and the extremely convenient run-time polymorphism of Python.
Boost.Python是一個開源C++庫,它提供了一個簡明的IDL式的接口,用于把C++類和函數綁定到Python。借助C++強大的編譯時內省能力和最近發展的元編程技術,綁定工作完全用純C++實現,而沒有引入新的語法。Boost.Python豐富的特性和高級接口,使得完全按混合系統設計軟件包成為可能,并讓程序員以輕松連貫的方式,同時使用C++高效的編譯時多態,和Python極端便利的運行時多態。
Python and C++ are in many ways as different as two languages could be: while C++ is usually compiled to machine-code, Python is interpreted. Python's dynamic type system is often cited as the foundation of its flexibility, while in C++ static typing is the cornerstone of its efficiency. C++ has an intricate and difficult compile-time meta-language, while in Python, practically everything happens at runtime.
作為兩種語言,Python和C++存在很多差異。C++一般被編譯為機器碼,而Python是解釋執行的。Python的動態類型系統通常被認為是它靈活性的基礎,而C++的靜態類型系統是C++效率的基石。C++有一種復雜艱深的編譯時元語言,而在Python中,幾乎一切都發生在運行時。
Yet for many programmers, these very differences mean that Python and C++ complement one another perfectly. Performance bottlenecks in Python programs can be rewritten in C++ for maximal speed, and authors of powerful C++ libraries choose Python as a middleware language for its flexible system integration capabilities. Furthermore, the surface differences mask some strong similarities:
然而對很多程序員來說,這些差異也意味著Python和C++可以完美互補。為了提高運行速度,Python程序的性能瓶頸可以用C++重寫,而大型C++庫的作者們,為了獲得靈活的系統集成能力,選擇Python作為中間件語言。此外,在表面差異掩蓋之下,二者有一些非常相似之處:
Given Python's rich 'C' interoperability API, it should in principle be possible to expose C++ type and function interfaces to Python with an analogous interface to their C++ counterparts. However, the facilities provided by Python alone for integration with C++ are relatively meager. Compared to C++ and Python, 'C' has only very rudimentary abstraction facilities, and support for exception-handling is completely missing. 'C' extension module writers are required to manually manage Python reference counts, which is both annoyingly tedious and extremely error-prone. Traditional extension modules also tend to contain a great deal of boilerplate code repetition which makes them difficult to maintain, especially when wrapping an evolving API.
因為Python有著豐富的'C'語言集成API,原則上,向Python導出C++類型和函數接口應該是可行的,并且導出的接口與對應C++的接口應該是相似的。然而,Python本身提供的C++集成功能相對比較弱。和C++,Python相比,'C'只有非常基本的抽象能力,而且完全不支持異常處理。'C'擴展模塊的作者必須手工管理Python的引用計數,這不僅單調乏味,令人惱火,而且還極易出錯。傳統的擴展模塊往往包含大量重復的樣板代碼,使它們難以維護,尤其是當要封裝的API尚處于發展之中。
These limitations have led to the development of a variety of wrapping systems. SWIG is probably the most popular package for the integration of C/C++ and Python. A more recent development is SIP, which was specifically designed for interfacing Python with the Qt graphical user interface library. Both SWIG and SIP introduce their own specialized languages for customizing inter-language bindings. This has certain advantages, but having to deal with three different languages (Python, C/C++ and the interface language) also introduces practical and mental difficulties. The CXX package demonstrates an interesting alternative. It shows that at least some parts of Python's 'C' API can be wrapped and presented through a much more user-friendly C++ interface. However, unlike SWIG and SIP, CXX does not include support for wrapping C++ classes as new Python types.
這些限制導致了多種封裝系統的發展。SWIG可能是最流行的C/C++和Python集成系統。還有最近發展的SIP,它是專門為Qt圖形用戶界面庫設計的,用于提供Qt的Python接口。為了定制語言間的綁定,SWIG和SIP都引入了它們自己的專用語言。這有一定的好處,但是你不得不去應付三種不同語言(Python、C/C++和接口語言),所以也帶來了事實上和心理上的困難。
The features and goals of Boost.Python overlap significantly with many of these other systems. That said, Boost.Python attempts to maximize convenience and flexibility without introducing a separate wrapping language. Instead, it presents the user with a high-level C++ interface for wrapping C++ classes and functions, managing much of the complexity behind-the-scenes with static metaprogramming. Boost.Python also goes beyond the scope of earlier systems by providing:
Boost.Python的特性和目標與這些系統有很多重疊。Boost.Python努力提高封裝的便利性和靈活性,但不引入單獨的封裝語言。相反,它通過靜態元編程,在幕后管理大量的復雜性,呈現給用戶一個高級C++接口來封裝C++類和函數。Boost.Python也在如下領域超越了早期的系統:
- Support for C++ virtual functions that can be overridden in Python.
- Comprehensive lifetime management facilities for low-level C++ pointers and references.
- Support for organizing extensions as Python packages, with a central registry for inter-language type conversions.
- A safe and convenient mechanism for tying into Python's powerful serialization engine (pickle).
- Coherence with the rules for handling C++ lvalues and rvalues that can only come from a deep understanding of both the Python and C++ type systems.
- 支持C++虛函數,并能在Python中覆蓋。
- 對于低級的C++指針和引用,提供全面的生命期管理機制。
- 支持按Python包組織擴展模塊,通過中心注冊表進行語言間類型轉換。
- 通過一種安全方便的機制,引入Python強大的序列化引擎(pickle)。
- 與C++處理左值和右值的規則相一致,該一致性只能來自于對Python和C++類型系統的深入理解。
The key insight that sparked the development of Boost.Python is that much of the boilerplate code in traditional extension modules could be eliminated using C++ compile-time introspection. Each argument of a wrapped C++ function must be extracted from a Python object using a procedure that depends on the argument type. Similarly the function's return type determines how the return value will be converted from C++ to Python. Of course argument and return types are part of each function's type, and this is exactly the source from which Boost.Python deduces most of the information required.
一個關鍵性的發現啟動了Boost.Python的開發,即利用C++的編譯時內省,可以消除傳統擴展模塊中的大量樣板代碼。如每個封裝的C++函數的參數都是從Python對象提取的,提取時必須根據參數類型調用相應的過程。類似地,函數返回值從C++轉換成Python時,返回值的類型決定了如何轉換。因為參數和返回值的類型是每個函數類型的一部分,所以Boost.Python可以從函數類型推導出大部分所需的信息。
This approach leads to user guided wrapping: as much information is extracted directly from the source code to be wrapped as is possible within the framework of pure C++, and some additional information is supplied explicitly by the user. Mostly the guidance is mechanical and little real intervention is required. Because the interface specification is written in the same full-featured language as the code being exposed, the user has unprecedented power available when she does need to take control.
這種方法導致了“用戶指導的封裝(user guided wrapping)”:在純C++的框架內,從待封裝的源代碼中直接提取盡可能多的信息,而一些額外的信息由用戶顯式提供。通常這種指導是自動的,很少需要真正的干涉。因為接口規范和導出代碼是用同一門全功能的語言寫的,當用戶確實需要取得控制時,他所擁有的權力是空前強大的。
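The idea of deriving argument conversions from a function's own type can be sketched in Python. This is only a run-time analogy: Boost.Python performs the deduction at compile time with templates, whereas the hypothetical `wrap` helper below reads type annotations through `inspect.signature`. The names `wrap` and `scale` are illustrative assumptions and are not part of Boost.Python.

```python
import inspect


def wrap(func):
    """Build a wrapper that converts each argument using the annotated parameter type."""
    params = list(inspect.signature(func).parameters.values())

    def wrapper(*raw_args):
        if len(raw_args) != len(params):
            raise TypeError("expected %d arguments" % len(params))
        # The conversion procedure depends only on the declared parameter type.
        converted = [p.annotation(arg) for p, arg in zip(params, raw_args)]
        return func(*converted)

    return wrapper


def scale(factor: float, label: str) -> str:
    """Hypothetical function to be wrapped."""
    return "%s x%.1f" % (label, factor)


wrapped = wrap(scale)
print(wrapped("2", "zoom"))  # zoom x2.0
```

The user supplies no conversion code at all here; everything the wrapper needs is taken from the signature of the function being wrapped.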
The primary goal of Boost.Python is to allow users to expose C++ classes and functions to Python using nothing more than a C++ compiler. In broad strokes, the user experience should be one of directly manipulating C++ objects from Python.
Boost.Python的首要目標是,讓用戶只用C++編譯器就能向Python導出C++類和函數。大體來講,用戶的體驗應該是,能夠從Python直接操作C++對象。
However, it's also important not to translate all interfaces too literally: the idioms of each language must be respected. For example, though C++ and Python both have an iterator concept, they are expressed very differently. Boost.Python has to be able to bridge the interface gap.
然而,有一點也很重要,那就是不要過于按字面翻譯所有接口:必須考慮每種語言的慣用法。例如,雖然C++和Python都有迭代器的概念,表達方式卻很不一樣。Boost.Python必須能夠消除這種接口的差異。
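To make the iterator gap concrete, here is a minimal sketch in Python of how a C++-style pair of positions (begin, end) might be adapted to Python's iterator protocol. The class `RangeIterator` is a hypothetical illustration and is not how Boost.Python implements its own iterator support.

```python
class RangeIterator:
    """Adapts a C++-style [begin, end) position pair to Python's iterator protocol."""

    def __init__(self, container, begin, end):
        self._container = container
        self._pos = begin
        self._end = end

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos == self._end:          # C++: it == end
            raise StopIteration
        value = self._container[self._pos]  # C++: *it
        self._pos += 1                      # C++: ++it
        return value


msgs = ["hello", "Boost.Python", "world!"]
print(list(RangeIterator(msgs, 0, len(msgs))))
```

A C++ loop compares two iterators and dereferences explicitly, while a Python loop simply calls `__next__` until `StopIteration` is raised; the adapter hides that difference from the Python user.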
It must be possible to insulate Python users from crashes resulting from trivial misuses of C++ interfaces, such as accessing already-deleted objects. By the same token the library should insulate C++ users from the low-level Python 'C' API, replacing error-prone 'C' interfaces like manual reference-count management and raw PyObject pointers with more-robust alternatives.
Python用戶可能會誤用C++接口,因此,Boost.Python必須能夠隔離因輕微的誤用而造成的崩潰,例如訪問已刪除的對象。同樣的,Boost.Python庫應該把C++用戶從低級的Python 'C' API中解放出來,將容易出錯的'C'接口,如手工引用計數管理、原始的PyObject指針,替換為更健壯的接口。
Support for component-based development is crucial, so that C++ types exposed in one extension module can be passed to functions exposed in another without loss of crucial information like C++ inheritance relationships.
支持基于組件的開發是至關重要的,這樣,一個擴展模塊導出的C++類型,可以傳遞給另一個模塊導出的函數,而不丟失重要的信息,比如C++的繼承關系。
Finally, all wrapping must be non-intrusive, without modifying or even seeing the original C++ source code. Existing C++ libraries have to be wrappable by third parties who only have access to header files and binaries.
最后,所有的封裝必須是非侵入性的(non-intrusive),不能修改最初的C++源碼,甚至不必看到源碼。第三方必須能夠封裝現有的C++庫,即使他只有頭文件和二進制庫。
And now for a preview of Boost.Python, and how it improves on the raw facilities offered by Python. Here's a function we might want to expose:
現在來預覽一下Boost.Python,看看它是如何改進Python原有的封裝機制的。下面是我們想導出的函數:
    #include <stdexcept>  // std::range_error

    char const* greet(unsigned x)
    {
        static char const* const msgs[] = { "hello", "Boost.Python", "world!" };

        if (x > 2)
            throw std::range_error("greet: index out of range");

        return msgs[x];
    }
To wrap this function in standard C++ using the Python 'C' API, we'd need something like this:
在標準C++中,用Python 'C' API來封裝這個函數,我們需要像這樣做:
    #include <Python.h>

    extern "C" // all Python interactions use 'C' linkage and calling convention
    {
        // Wrapper to handle argument/result conversion and checking
        PyObject* greet_wrap(PyObject* self, PyObject* args)
        {
            int x;
            if (PyArg_ParseTuple(args, "i", &x))    // extract/check arguments
            {
                char const* result = greet(x);      // invoke wrapped function
                return PyString_FromString(result); // convert result to Python
            }
            return 0;                               // error occurred
        }

        // Table of wrapped functions to be exposed by the module
        static PyMethodDef methods[] = {
            { "greet", greet_wrap, METH_VARARGS, "return one of 3 parts of a greeting" }
            , { NULL, NULL, 0, NULL } // sentinel
        };

        // module initialization function
        DL_EXPORT(void) inithello()
        {
            (void) Py_InitModule("hello", methods); // add the methods to the module
        }
    }
Now here's the wrapping code we'd use to expose it with Boost.Python:
而這是用Boost.Python來導出函數的封裝代碼:
    #include <boost/python.hpp>
    using namespace boost::python;

    BOOST_PYTHON_MODULE(hello)
    {
        def("greet", greet, "return one of 3 parts of a greeting");
    }
and here it is in action:
這是運行結果:
    >>> import hello
    >>> for x in range(3):
    ...     print hello.greet(x)
    ...
    hello
    Boost.Python
    world!
Aside from the fact that the 'C' API version is much more verbose, it's worth noting a few things that it doesn't handle correctly:
使用'C' API的版本要冗長的多,此外,還需要注意,有些東西它沒有正確處理:
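- The original function accepts an unsigned integer, but the Python 'C' API only gives us a way of extracting signed integers. If we try to pass a negative number to hello.greet, the Boost.Python version raises a Python exception, whereas the other version carries on: it does whatever the C++ implementation does when converting a negative integer to unsigned (usually wrapping to some very large number) and passes the incorrect result on to the wrapped function.
- That brings us to the second problem: if the C++ greet() function is called with a number greater than 2, it throws an exception. Typically, if a C++ exception propagates across the boundary with code generated by a 'C' compiler, it causes a crash. As you can see in the first version, there is no C++ scaffolding there to prevent this from happening. Functions wrapped by Boost.Python automatically include an exception-handling layer which protects Python users by translating unhandled C++ exceptions into a corresponding Python exception.
- A more subtle limitation is that the argument conversion used in the Python 'C' API case can only get that integer x in one way. PyArg_ParseTuple can't convert a Python long object (an arbitrary-precision integer) whose value happens to fit in an unsigned int but not in a signed long, nor will it ever handle a wrapped C++ class with a user-defined implicit operator unsigned int() conversion. Boost.Python's dynamic type conversion registry allows users to add arbitrary conversion methods.
- 原來的函數接受一個無符號整數,然而Python 'C' API只能提取有符號整數。如果我們試圖向hello.greet傳遞一個負數,Boost.Python版會引發Python異常,而另一個則會繼續:執行C++代碼,將負數轉換為無符號數(通常會變成一個很大的數),然后把不正確的轉換結果傳遞給被封裝的函數。
- 這引起了第二個問題:如果輸入一個大于2的參數,C++ greet()函數會拋出異常。典型的,如果C++異常傳播時,跨越了'C'編譯器生成的代碼的邊界,就會導致崩潰。正如你在第一個版本中所見,那兒沒有防止崩潰的C++機制。而Boost.Python封裝的函數自動包含了異常處理層,它把未處理的C++異常翻譯成相應的Python異常,從而保護了Python用戶。
- 一個更微妙的限制是,Python 'C' API的參數轉換只能以“一種”方式取得整數x。如果有一個Python long對象(任意精度整數),它的大小正好屬于unsigned int,但不屬于signed long,PyArg_ParseTuple就不能對其進行轉換。對于一個定義了operator unsigned int(),即用戶自定義隱式轉換的C++封裝類,它同樣無法處理。而Boost.Python的動態類型轉換注冊表允許用戶添加任意的轉換方法。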
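The first problem, a negative number silently turning into a huge unsigned one, can be reproduced from Python with the standard `ctypes` module. This is a small sketch of the conversion itself, not of the hand-written wrapper; the helper name `as_c_unsigned` is an assumption made for illustration.

```python
import ctypes


def as_c_unsigned(value):
    """Return what a C `unsigned int` holds after an integer is stored in it."""
    # ctypes integer types do no overflow checking, so the value wraps around.
    return ctypes.c_uint(value).value


BITS = 8 * ctypes.sizeof(ctypes.c_uint)  # width of unsigned int on this platform

wrapped = as_c_unsigned(-1)
print(wrapped == 2**BITS - 1)  # True: -1 became the largest unsigned value
print(wrapped > 2)             # True: greet() would see an out-of-range index
```

With the hand-written wrapper, a call such as `hello.greet(-1)` would therefore reach `greet` with a very large index rather than being rejected at the language boundary.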
This section outlines some of the library's major features. Except as necessary to avoid confusion, details of library implementation are omitted.
本節簡述了庫的一些主要特性。在不影響理解的情況下,省略了庫的實現細節。
C++ classes and structs are exposed with a similarly-terse interface. Given:
C++類和結構是用同樣簡潔的接口導出的。如有:
    struct World
    {
        void set(std::string msg) { this->msg = msg; }
        std::string greet() { return msg; }
        std::string msg;
    };
The following code will expose it in our extension module:
以下代碼會將它導出到擴展模塊:
    #include <boost/python.hpp>
    using namespace boost::python;

    BOOST_PYTHON_MODULE(hello)
    {
        class_<World>("World")
            .def("greet", &World::greet)
            .def("set", &World::set)
        ;
    }
Although this code has a certain pythonic familiarity, people sometimes find the syntax a bit confusing because it doesn't look like most of the C++ code they're used to. All the same, this is just standard C++. Because of their flexible syntax and operator overloading, C++ and Python are great for defining domain-specific (sub)languages (DSLs), and that's what we've done in Boost.Python. To break it down:
盡管上述代碼具有某種熟悉的Python風格,但語法還是有點令人迷惑,因為它看起來不像通常的C++代碼。但是,這仍然是正確的標準C++。因為C++和Python具有靈活的語法和運算符重載,它們都很善于定義特定領域(子)語言(DSLs, domain-specific (sub)languages)。我們在Boost.Python里面就是定義了一個DSL。把代碼拆開來看:
class_<World>("World")
constructs an unnamed object of type class_<World> and passes "World" to its constructor. This creates a new-style Python class called World in the extension module, and associates it with the C++ type World in the Boost.Python type conversion registry. We might have also written:
構造了一個匿名對象,類型為class_<World>,并把"World"傳遞給它的構造函數。這將在擴展模塊里創建一個新型Python類World,并在Boost.Python的類型轉換注冊表里,把它和C++類型World關聯起來。我們也可以這么寫:
class_<World> w("World");
but that would've been more verbose, since we'd have to name w again to invoke its def() member function:
但是那會顯得更冗長,因為我們不得不再次通過w去調用它的def()成員函數:
w.def("greet", &World::greet)
There's nothing special about the location of the dot for member access in the original example: C++ allows any amount of whitespace on either side of a token, and placing the dot at the beginning of each line allows us to chain as many successive calls to member functions as we like with a uniform syntax. The other key fact that allows chaining is that class_<> member functions all return a reference to *this.
原來的例子里的點表示成員訪問,它的位置沒有什么特別的:因為C++允許標記(token)的兩邊可以有任意數量的空白符。把點放在每行的開始,允許我們以一致的句法,鏈式串接連續的成員函數調用,想串多少都行。允許鏈式調用的另一關鍵是,class_<>的成員函數都返回對*this的引用。
So the example is equivalent to:
因此本例等同于:
    class_<World> w("World");
    w.def("greet", &World::greet);
    w.def("set", &World::set);
It's occasionally useful to be able to break down the components of a Boost.Python class wrapper in this way, but the rest of this article will stick to the terse syntax.
這種方式將Boost.Python類包裝的部件都拆分開來了,能這樣拆分有時候是有用的。但本文下面仍將堅持使用簡潔格式。
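The chaining mechanism itself is not specific to C++. As a rough analogy, the following Python sketch uses a hypothetical `ClassBuilder` whose `def_` method returns `self`, just as the class_<> member functions return a reference to *this. None of these names belong to Boost.Python, and `def_` carries a trailing underscore only because `def` is a Python keyword.

```python
class ClassBuilder:
    """Hypothetical stand-in for class_<>: collects methods, then builds a class."""

    def __init__(self, name):
        self.name = name
        self.methods = {}

    def def_(self, name, func):
        self.methods[name] = func
        return self  # returning self is what makes chaining possible

    def build(self):
        return type(self.name, (object,), dict(self.methods))


def set_msg(self, msg):
    self.msg = msg


def greet(self):
    return self.msg


World = (
    ClassBuilder("World")
    .def_("greet", greet)
    .def_("set", set_msg)
    .build()
)

planet = World()
planet.set("howdy")
print(planet.greet())  # howdy
```

If `def_` returned nothing, each call would have to name the builder again, which is exactly the more verbose form with the named object `w` discussed above.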
For completeness, here's the wrapped class in use:
最后來看封裝類的使用:
    >>> import hello
    >>> planet = hello.World()
    >>> planet.set('howdy')
    >>> planet.greet()
    'howdy'
Since our World class is just a plain struct, it has an implicit no-argument (nullary) constructor. Boost.Python exposes the nullary constructor by default, which is why we were able to write:
由于我們的World類只是一個簡單的struct,它有一個隱式的無參數的構造函數。Boost.Python默認會導出這個無參數的構造函數,所以我們可以這樣寫:
>>> planet = hello.World()
However, well-designed classes in any language may require constructor arguments in order to establish their invariants. Unlike Python, where __init__ is just a specially-named method, in C++ constructors cannot be handled like ordinary member functions. In particular, we can't take their address: &World::World is an error. The library provides a different interface for specifying constructors. Given:
然而,在任何語言里,對于設計良好的類,構造函數可能需要參數,以建立類的不變式(invariant)。Python的__init__只是一個特殊命名的方法,而C++的構造函數與Python不同,它不能像普通成員函數那樣處理。特別是,我們不能取它的地址:&World::World是一個錯誤。Boost.Python庫提供了一個不同的接口來指定構造函數。假設有:
    struct World
    {
        World(std::string msg); // added constructor
        ...
we can modify our wrapping code as follows:
我們可以如下修改封裝代碼:
    class_<World>("World", init<std::string>())
        ...
Of course, a C++ class may have additional constructors, and we can expose those as well by passing more instances of init<...> to def():
當然,C++類可能還有其他的構造函數,我們也可以導出它們,只需要向def()傳入更多的init<...>實例:
    class_<World>("World", init<std::string>())
        .def(init<double, double>())
        ...
Boost.Python allows wrapped functions, member functions, and constructors to be overloaded to mirror C++ overloading.
Boost.Python封裝的函數、成員函數,以及構造函數都可以重載,以映射C++中的重載。
Any publicly-accessible data members in a C++ class can be easily exposed as either readonly or readwrite attributes:
C++中任何可公有訪問的數據成員,都能輕易地封裝成readonly或者readwrite屬性:
    class_<World>("World", init<std::string>())
        .def_readonly("msg", &World::msg)
        ...
and can be used directly in Python:
并直接在Python中使用:
    >>> planet = hello.World('howdy')
    >>> planet.msg
    'howdy'
This does not result in adding attributes to the World instance __dict__, which can result in substantial memory savings when wrapping large data structures. In fact, no instance __dict__ will be created at all unless attributes are explicitly added from Python. Boost.Python owes this capability to the new Python 2.2 type system, in particular the descriptor interface and property type.
這不會在World實例__dict__中添加屬性,從而在封裝大型數據結構時節省大量的內存。實際上,根本不會創建實例__dict__,除非從Python顯式添加屬性。Boost.Python的這種能力歸功于Python 2.2新的類型系統,尤其是描述符(descriptor)接口和property類型。
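The effect of serving attributes through descriptors on the type, rather than through a per-instance dictionary, can be imitated in pure Python. The sketch below uses `__slots__` together with `property`; this is an analogy chosen for illustration and is not a description of how Boost.Python lays out its instances. Unlike a Boost.Python class, a `__slots__` class refuses new attributes instead of creating a `__dict__` on demand.

```python
class World:
    __slots__ = ("_msg",)  # assumption for this sketch: suppress the instance __dict__

    def __init__(self, msg):
        self._msg = msg

    @property
    def msg(self):
        # The property object lives on the class; instances carry no entry for it.
        return self._msg


planet = World("howdy")
print(planet.msg)                    # howdy
print(hasattr(planet, "__dict__"))   # False
```

Because `msg` is looked up on the type, every instance shares one descriptor object, which is where the memory saving for large collections of small objects comes from.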
In C++, publicly-accessible data members are considered a sign of poor design because they break encapsulation, and style guides usually dictate the use of "getter" and "setter" functions instead. In Python, however, __getattr__, __setattr__, and since 2.2, property mean that attribute access is just one more well-encapsulated syntactic tool at the programmer's disposal. Boost.Python bridges this idiomatic gap by making Python property creation directly available to users. If msg were private, we could still expose it as an attribute in Python as follows:
在C++中,人們認為,可公有訪問的數據成員是設計糟糕的標志,因為它們破壞了封裝性,并且風格指南通常指示使用“getter”和“setter”函數作為替代。然而在Python里,__getattr__、__setattr__,和2.2版出現的property意味著,屬性訪問僅僅是一種任由程序員選用的、封裝性更好的語法工具。Boost.Python讓用戶可直接創建Python property,從而消除了二者語言習慣上的差異。即使msg是私有的,我們仍可把它導出為Python中的屬性,如下:
    class_<World>("World", init<std::string>())
        .add_property("msg", &World::greet, &World::set)
        ...
The example above mirrors the familiar usage of properties in Python 2.2+:
上例等同于Python 2.2+里面熟悉的屬性的用法:
    >>> class World(object):
    ...     def __init__(self, msg):
    ...         self.__msg = msg
    ...     def greet(self):
    ...         return self.__msg
    ...     def set(self, msg):
    ...         self.__msg = msg
    ...     msg = property(greet, set)
The ability to write arithmetic operators for user-defined types has been a major factor in the success of both languages for numerical computation, and the success of packages like NumPy attests to the power of exposing operators in extension modules. Boost.Python provides a concise mechanism for wrapping operator overloads. The example below shows a fragment from a wrapper for the Boost rational number library:
兩種語言都能夠為用戶自定義類型編寫算術運算符,這是它們在數值計算上獲得成功的主要因素,并且,像NumPy這樣的軟件包的成功證明了在擴展模塊中導出運算符的威力。Boost.Python為封裝運算符重載提供了簡潔的機制。下面是Boost有理數庫封裝代碼的片斷:
    class_<rational<int> >("rational_int")
        .def(init<int, int>()) // constructor, e.g. rational_int(3,4)
        .def("numerator", &rational<int>::numerator)
        .def("denominator", &rational<int>::denominator)
        .def(-self)        // __neg__ (unary minus)
        .def(self + self)  // __add__ (homogeneous)
        .def(self * self)  // __mul__
        .def(self + int()) // __add__ (heterogeneous)
        .def(int() + self) // __radd__
        ...
The magic is performed using a simplified application of "expression templates" [VELD1995], a technique originally developed for optimization of high-performance matrix algebra expressions. The essence is that instead of performing the computation immediately, operators are overloaded to construct a type representing the computation. In matrix algebra, dramatic optimizations are often available when the structure of an entire expression can be taken into account, rather than evaluating each operation "greedily". Boost.Python uses the same technique to build an appropriate Python method object based on expressions involving self.
魔法的施展只是簡單應用了“表達式模板(expression templates)”[VELD1995],一種最初為高性能矩陣代數表達式優化而開發的技術。其精髓是,不是立即進行計算,而是利用運算符重載,來構造一個代表計算的類型。在矩陣代數里,當考慮整個表達式的結構,而不是“貪婪地”對每步運算求值時,經常可以獲得顯著的優化。Boost.Python使用了同樣的技術,它用包含self的表達式,構建了一個適當的Python成員方法對象。
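The core trick, an operator that records what was asked for instead of computing a result, can be sketched in a few lines of Python. This is a run-time analogy only: Boost.Python encodes the expression in a C++ type at compile time. The names `SelfPlaceholder`, `Operation`, `define` and `Meters` are hypothetical, and only homogeneous operands are handled.

```python
import operator


class Operation:
    """A recorded operator expression; nothing has been computed yet."""

    def __init__(self, method_name, func):
        self.method_name = method_name
        self.func = func


class SelfPlaceholder:
    """Operators on the placeholder describe an operation instead of performing it."""

    def __add__(self, other):
        return Operation("__add__", operator.add)

    def __mul__(self, other):
        return Operation("__mul__", operator.mul)

    def __neg__(self):
        return Operation("__neg__", operator.neg)


self_ = SelfPlaceholder()


class Meters:
    def __init__(self, value):
        self.value = value


def define(cls, op):
    """Install the recorded operation on cls as the matching special method."""
    def method(*operands):
        return cls(op.func(*(o.value for o in operands)))
    setattr(cls, op.method_name, method)


define(Meters, self_ + self_)  # records "__add__"; no addition happens here
define(Meters, -self_)         # records "__neg__"

print((Meters(2) + Meters(3)).value)  # 5
print((-Meters(2)).value)             # -2
```

Evaluating `self_ + self_` yields a description of the operation, which `define` then turns into a real method, mirroring the way `.def(self + self)` produces `__add__` in the wrapper above.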
C++ inheritance relationships can be represented to Boost.Python by adding an optional bases<...> argument to the class_<...> template parameter list as follows:
要在Boost.Python里描述C++繼承關系,可以在class_<...>模板參數列表里添加一個可選的bases<...>,如下:
    class_<Derived, bases<Base1,Base2> >("Derived")
        ...
This has two effects:
這有兩種作用:
Of course it's possible to derive new Python classes from wrapped C++ class instances. Because Boost.Python uses the new-style class system, that works very much as for the Python built-in types. There is one significant detail in which it differs: the built-in types generally establish their invariants in their __new__ function, so that derived classes do not need to call __init__ on the base class before invoking its methods:
當然,也可以從封裝的C++類實例派生新的Python類。因為Boost.Python使用了新型類系統,從封裝類派生就像是從Python內置類型派生一樣。但有一個重大區別:內置類型一般在__new__函數里建立不變式,因此其派生類不需要調用基類的__init__:
    >>> class L(list):
    ...     def __init__(self):
    ...         pass
    ...
    >>> L().reverse()
    >>>
Because C++ object construction is a one-step operation, C++ instance data cannot be constructed until the arguments are available, in the __init__ function:
因為C++的對象構造是一個單步操作,在__init__函數里,只有參數齊全,才能構造C++實例數據:
    >>> class D(SomeBoostPythonClass):
    ...     def __init__(self):
    ...         pass
    ...
    >>> D().some_boost_python_method()
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: bad argument type for built-in operation
This happened because Boost.Python couldn't find instance data of type SomeBoostPythonClass within the D instance; D's __init__ function masked construction of the base class. It could be corrected by either removing D's __init__ function or having it call SomeBoostPythonClass.__init__(...) explicitly.
發生錯誤的原因是,Boost.Python在實例D中,找不到類型SomeBoostPythonClass的實例數據;D的__init__函數遮蓋了基類的構造函數。糾正方法為,刪除D的__init__函數,或者讓它顯式調用SomeBoostPythonClass.__init__(...)。
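The same failure mode and its fix can be reproduced with ordinary Python classes. In this sketch `Base` stands in for a wrapped class whose instance data is created in `__init__`; the class names and the `_cpp_instance` attribute are assumptions for illustration, and the exception raised here is an AttributeError rather than the TypeError that Boost.Python reports.

```python
class Base:
    """Stand-in for a wrapped class whose instance data is created in __init__."""

    def __init__(self):
        self._cpp_instance = {"answer": 42}  # hypothetical instance data

    def method(self):
        return self._cpp_instance["answer"]


class Broken(Base):
    def __init__(self):
        pass  # masks Base.__init__, so the instance data is never created


class Fixed(Base):
    def __init__(self):
        Base.__init__(self)  # explicitly construct the base part first


try:
    Broken().method()
except AttributeError as error:
    print("Broken:", type(error).__name__)

print("Fixed:", Fixed().method())
```

Removing `__init__` from the derived class altogether would work equally well, since the base class initializer would then be inherited.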
Deriving new types in Python from extension classes is not very interesting unless they can be used polymorphically from C++. In other words, Python method implementations should appear to override the implementation of C++ virtual functions when called through base class pointers/references from C++. Since the only way to alter the behavior of a virtual function is to override it in a derived class, the user must build a special derived class to dispatch a polymorphic class' virtual functions:
用Python從擴展類派生新的類型沒有太大意思,除非可以在C++里面多態地使用派生類。換句話說,在C++里,通過基類指針或引用調用C++虛函數時,Python實現的方法應該看起來像是覆蓋了C++虛函數的實現。因為改變虛函數行為的唯一方法是,在派生類里覆蓋它,所以用戶必須構建一個特殊的派生類,來分派多態類的虛函數:
    //
    // interface to wrap:
    //
    class Base
    {
     public:
        virtual int f(std::string x) { return 42; }
        virtual ~Base();
    };

    // f is a non-const member function, so b is taken by non-const reference
    int calls_f(Base& b, std::string x) { return b.f(x); }

    //
    // Wrapping Code
    //

    // Dispatcher class
    struct BaseWrap : Base
    {
        // Store a pointer to the Python object
        BaseWrap(PyObject* self_) : self(self_) {}
        PyObject* self;

        // Default implementation, for when f is not overridden
        int f_default(std::string x) { return this->Base::f(x); }
        // Dispatch implementation
        int f(std::string x) { return call_method<int>(self, "f", x); }
    };

    ...
        def("calls_f", calls_f);
        class_<Base, BaseWrap>("Base")
            .def("f", &Base::f, &BaseWrap::f_default)
            ;
Now here's some Python code which demonstrates:
這是Python演示代碼:
    >>> class Derived(Base):
    ...     def f(self, s):
    ...         return len(s)
    ...
    >>> calls_f(Base(), 'foo')
    42
    >>> calls_f(Derived(), 'forty-two')
    9
Things to notice about the dispatcher class:
關于分派類需要注意:
Admittedly, this formula is tedious to repeat, especially on a project with many polymorphic classes. That it is necessary reflects some limitations in C++'s compile-time introspection capabilities: there's no way to enumerate the members of a class and find out which are virtual functions. At least one very promising project has been started to write a front-end which can generate these dispatchers (and other wrapping code) automatically from C++ headers.
無可否認,重復這種公式化動作是冗長乏味的,尤其是項目里有大量多態類的時候。這里有必要反映一些C++編譯時內省能力的限制:C++無法列舉類的成員并找出虛函數。不過,至少有一個項目已經啟動,有希望編寫出一個前端程序,可以從C++頭文件自動生成這些分派類(和其他封裝代碼),
Pyste is being developed by Bruno da Silva de Oliveira. It builds on GCC_XML, which generates an XML version of GCC's internal program representation. Since GCC is a highly-conformant C++ compiler, this ensures correct handling of the most-sophisticated template code and full access to the underlying type system. In keeping with the Boost.Python philosophy, a Pyste interface description is neither intrusive on the code being wrapped, nor expressed in some unfamiliar language: instead it is a 100% pure Python script. If Pyste is successful it will mark a move away from wrapping everything directly in C++ for many of our users. It will also allow us the choice to shift some of the metaprogram code from C++ to Python. We expect that soon, not only our users but the Boost.Python developers themselves will be "thinking hybrid" about their own code.
Bruno da Silva de Oliveira正在開發Pyste。Pyste基于GCC_XML構建,而GCC_XML可以生成XML版本的GCC內部程序描述。因為GCC是一種高度兼容標準的C++編譯器,從而確保了對最復雜的模板代碼的正確處理,和對底層類型系統的完全訪問。和Boost.Python的哲學一致,Pyste接口描述既不侵入待封裝的代碼,也不使用某種不熟悉的語言來表達,相反,它是100%的純Python腳本。如果Pyste成功的話,它將標志,我們的許多用戶不必直接用C++封裝所有東西。Pyste也將允許我們選擇性地把一些元編程代碼從C++轉移到Python。我們期待不久以后,不僅用戶,而且Boost.Python開發者也能,“混合地思考”他們自己的代碼。(譯注:Pyste已不再維護,更新的是Py++。)
Serialization is the process of converting objects in memory to a form that can be stored on disk or sent over a network connection. The serialized object (most often a plain string) can be retrieved and converted back to the original object. A good serialization system will automatically convert entire object hierarchies. Python's standard pickle module is just such a system. It leverages the language's strong runtime introspection facilities for serializing practically arbitrary user-defined objects. With a few simple and unintrusive provisions this powerful machinery can be extended to also work for wrapped C++ objects. Here is an example:
序列化(serialization)是指,把內存中的對象轉換成可保存格式,使之可以保存到磁盤上,或通過網絡傳送。序列化后的對象(最常見的是普通字符串),可以恢復并轉換回原來的對象。好的序列化系統會自動轉換整個對象層次結構。Python的標準模塊pickle正是這樣的系統。它利用了語言強大的運行時內省機制,可以序列化幾乎任意的用戶自定義對象。只需加入一些簡單的、非侵入的處理,就可以擴展這個威力巨大的系統,使它也能用于封裝的C++對象。下面是一個例子:
    #include <string>

    struct World
    {
        World(std::string a_msg) : msg(a_msg) {}
        std::string greet() const { return msg; }
        std::string msg;
    };

    #include <boost/python.hpp>
    using namespace boost::python;

    struct World_picklers : pickle_suite
    {
        static tuple
        getinitargs(World const& w) { return make_tuple(w.greet()); }
    };

    BOOST_PYTHON_MODULE(hello)
    {
        class_<World>("World", init<std::string>())
            .def("greet", &World::greet)
            .def_pickle(World_picklers())
        ;
    }
Now let's create a World object and put it to rest on disk:
現在,讓我們創建一個World對象并把它保存到磁盤:
    >>> import hello
    >>> import pickle
    >>> a_world = hello.World("howdy")
    >>> pickle.dump(a_world, open("my_world", "w"))
In a potentially different script on a potentially different computer with a potentially different operating system:
然后,可能是在不同的計算機、不同的操作系統上,一個腳本可能這樣恢復對象:
    >>> import pickle
    >>> resurrected_world = pickle.load(open("my_world", "r"))
    >>> resurrected_world.greet()
    'howdy'
Of course the cPickle module can also be used for faster processing.
當然,使用cPickle模塊可以更快速地處理。
Boost.Python's pickle_suite fully supports the pickle protocol defined in the standard Python documentation. Like a __getinitargs__ function in Python, the pickle_suite's getinitargs() is responsible for creating the argument tuple that will be used to reconstruct the pickled object. The other elements of the Python pickling protocol, __getstate__ and __setstate__ can be optionally provided via C++ getstate and setstate functions. C++'s static type system allows the library to ensure at compile-time that nonsensical combinations of functions (e.g. getstate without setstate) are not used.
Boost.Python的pickle_suite完全支持標準Python文檔定義的pickle協議。類似Python里的__getinitargs__函數,pickle_suite的getinitargs()負責創建參數元組,以重建pickle的對象。 Python pickle協議中的其他元素,__getstate__和__setstate__,可以通過C++ getstate和setstate函數來提供,也可以不提供。利用C++的靜態類型系統,Boost.Python庫在編譯時保證,不會使用沒有意義的函數組合(例如,有getstate無setstate)。
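For comparison, here is a pure Python class that supplies its constructor arguments to pickle, which is the role getinitargs() plays for the wrapped C++ class. The sketch uses the `__reduce__` hook as a stand-in so that it runs on current Python versions; that substitution, and the assumption that the code is run as a script so that pickle can locate the class by name, are choices made for this illustration.

```python
import pickle


class World:
    def __init__(self, msg):
        self.msg = msg

    def greet(self):
        return self.msg

    def __reduce__(self):
        # (callable, args): unpickling calls World(msg), much as the tuple
        # returned by getinitargs() is passed to the wrapped constructor.
        return (World, (self.msg,))


data = pickle.dumps(World("howdy"))
resurrected_world = pickle.loads(data)
print(resurrected_world.greet())  # howdy
```

Only the constructor argument is stored. An object with additional state that the constructor does not restore would also need the equivalent of the getstate and setstate pair.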
Enabling serialization of more complex C++ objects requires a little more work than is shown in the example above. Fortunately the object interface (see next section) greatly helps in keeping the code manageable.
要想序列化更復雜的C++對象,就需要做更多的工作。幸運的是,object接口(見下一節)幫了大忙,它保持了代碼的可管理性。
Experienced 'C' language extension module authors will be familiar with the ubiquitous PyObject*, manual reference-counting, and the need to remember which API calls return "new" (owned) references or "borrowed" (raw) references. These constraints are not just cumbersome but also a major source of errors, especially in the presence of exceptions.
對于有經驗的'C'語言擴展模塊的作者,他們應該熟悉無所不在的PyObject*,手工引用計數,而且需要記住哪個API調用返回“新的”(擁有的)引用,哪個返回“借來的”(原始的)引用。這些約束不僅麻煩,而且是主要的錯誤源,尤其是面臨異常的時候。
Boost.Python provides a class object which automates reference counting and provides conversion to Python from C++ objects of arbitrary type. This significantly reduces the learning effort for prospective extension module writers.
Boost.Python提供了一個object類,它能夠自動進行引用計數,并且能把任意類型的C++對象轉換到Python。對于未來的擴展模塊的編寫者來說,這極大地減輕了學習的負擔。
Creating an object from any other type is extremely simple:
從任何其他類型創建object極其簡單:
object s("hello, world"); // s manages a Python string
object has templated interactions with all other types, with automatic to-python conversions. It happens so naturally that it's easily overlooked:
object和所有其他類型的交互,以及到Python的自動轉換,都已經模板化了。這一切進行得如此自然,以至于可以輕松地忽略掉它:
object ten_Os = 10 * s[4]; // -> "oooooooooo"
In the example above, 4 and 10 are converted to Python objects before the indexing and multiplication operations are invoked.
上例中,在調用索引和乘法操作之前,4和10被轉換成了Python對象。
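For reference, the C++ expression `10 * s[4]` is meant to behave like the same expression written directly in Python, with `s` holding the string from the earlier example:

```python
s = "hello, world"

# Index 4 selects the fifth character, 'o'; multiplying repeats it ten times.
ten_os = 10 * s[4]
print(ten_os)  # oooooooooo
```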
The extract<T> class template can be used to convert Python objects to C++ types:
用類模板extract<T>可以把Python對象轉換成C++類型:
double x = extract<double>(o);
If a conversion in either direction cannot be performed, an appropriate exception is thrown at runtime.
如果有一個方向的轉換不能進行,則將在運行時拋出一個適當的異常。
The object type is accompanied by a set of derived types that mirror the Python built-in types such as list, dict, tuple, etc. as much as possible. This enables convenient manipulation of these high-level types from C++:
除了object類型,還有一組派生類型,它們盡可能地對應于Python內置類型,如list、dict、tuple等等。這樣就能方便地從C++操作這些高級類型了:
    dict d;
    d["some"] = "thing";
    d["lucky_number"] = 13;
    list l = d.keys();
This almost looks and works like regular Python code, but it is pure C++. Of course we can wrap C++ functions which accept or return object instances.
這看起來幾乎就像是正規的Python代碼,運行起來也像,但它是純的C++。當然我們也能封裝接受或返回object實例的C++函數。
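To see how close the resemblance is, this is the equivalent code written in Python itself. The call to `list()` is only needed on Python 3, where `keys()` returns a view rather than a list.

```python
d = {}
d["some"] = "thing"
d["lucky_number"] = 13
l = list(d.keys())

print(sorted(l))
```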
Because of the practical and mental difficulties of combining programming languages, it is common to settle on a single language at the outset of any development effort. For many applications, performance considerations dictate the use of a compiled language for the core algorithms. Unfortunately, due to the complexity of the static type system, the price we pay for runtime performance is often a significant increase in development time. Experience shows that writing maintainable C++ code usually takes longer and requires far more hard-earned working experience than developing comparable Python code. Even when developers are comfortable working exclusively in compiled languages, they often augment their systems by some type of ad hoc scripting layer for the benefit of their users without ever availing themselves of the same advantages.
因為混合語言編程具有事實上和心理上的困難,所以普通的做法是,在任何開發活動開始時,先確定一種單一語言。對很多應用來說,性能上的考慮決定了核心算法要用編譯性語言實現。不幸的是,由于靜態類型系統的復雜性,為了運行時的性能,我們所付出的代價常常是,開發時間大大增加。經驗表明,和開發同等的Python代碼相比,編寫可維護的C++代碼通常需要更長的時間,并且要求多得多的來之不易的工作經驗。即使開發者覺得只用一門編譯性語言挺好,為了用戶的利益,他們也經常給他們的系統增加某種專門的腳本層,但是他們自己卻從沒利用這種好處。
Boost.Python enables us to think hybrid. Python can be used for rapidly prototyping a new application; its ease of use and the large pool of standard libraries give us a head start on the way to a working system. If necessary, the working code can be used to discover rate-limiting hotspots. To maximize performance these can be reimplemented in C++, together with the Boost.Python bindings needed to tie them back into the existing higher-level procedure.
Boost.Python讓我們可以混合地思考(think hybrid)。Python可以為一個新應用快速搭建原型;在建立一個可運行的系統時,它的易用性和一大堆標準庫讓我們處于領先。如果有必要,可以用運行的代碼來揭示限制速度的熱點。為了提高性能,這些熱點可以用C++來重新實現,然后用Boost.Python綁定,并提供給現有的高級過程調用。
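One common way to find such hotspots in a working Python prototype is the standard-library profiler. The sketch below is not from the paper: the functions slow_inner and prototype are hypothetical placeholders for real application code, and the workload size is arbitrary.

```python
import cProfile
import pstats


def slow_inner(n):
    """Hypothetical compute-heavy routine; a candidate for a C++ rewrite."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def prototype(n):
    """Hypothetical high-level driver that would stay in Python."""
    return [slow_inner(n) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
result = prototype(20000)
profiler.disable()

# Sort by cumulative time and show the top entries; functions that dominate
# this list are the ones worth reimplementing in C++.
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(5)
```

In a run like this, slow_inner should account for nearly all of the time. That makes it the natural piece to move to C++, while prototype stays in Python and calls the wrapped version.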
Of course, this top-down approach is less attractive if it is clear from the start that many algorithms will eventually have to be implemented in C++. Fortunately Boost.Python also enables us to pursue a bottom-up approach. We have used this approach very successfully in the development of a toolbox for scientific applications. The toolbox started out mainly as a library of C++ classes with Boost.Python bindings, and for a while the growth was mainly concentrated on the C++ parts. However, as the toolbox is becoming more complete, more and more newly added functionality can be implemented in Python.
當然,如果從一開始就清楚,有許多算法將最終不得不用C++實現,這個自上而下(top-down)的方法就不是那么吸引人了。幸運的是,Boost.Python讓我們也可以采用自下而上(bottom-up)的方法。我們曾經非常成功地應用這種方法,開發一個科學軟件工具箱。開始的時候,這個工具箱主要是一個C++類庫,并帶有Boost.Python綁定,并且有一段時間,其成長主要集中在C++的部分。然而,當工具箱越來越完善,越來越多的新增功能可以用Python實現。
This figure shows the estimated ratio of newly added C++ and Python code over time as new algorithms are implemented. We expect this ratio to level out near 70% Python. Being able to solve new problems mostly in Python rather than a more difficult statically typed language is the return on our investment in Boost.Python. The ability to access all of our code from Python allows a broader group of developers to use it in the rapid development of new applications.
該圖顯示,實現新的算法時,估計新增C++和Python代碼的比率隨時間的變化。我們預計這個比率會在接近70%的Python處變平。能夠主要地用Python來解決新問題,而不是用更困難的靜態類型語言,這是我們在Boost.Python上投入的回報。我們的所有代碼都能從Python訪問,這使得更多的開發者可以用它來快速開發新的應用。
The first version of Boost.Python was developed in 2000 by Dave Abrahams at Dragon Systems, where he was privileged to have Tim Peters as a guide to "The Zen of Python". One of Dave's jobs was to develop a Python-based natural language processing system. Since it was eventually going to be targeting embedded hardware, it was always assumed that the compute-intensive core would be rewritten in C++ to optimize speed and memory footprint [1]. The project also wanted to test all of its C++ code using Python test scripts [2]. The only tool we knew of for binding C++ and Python was SWIG, and at the time its handling of C++ was weak. It would be false to claim any deep insight into the possible advantages of Boost.Python's approach at this point. Dave's interest and expertise in fancy C++ template tricks had just reached the point where he could do some real damage, and Boost.Python emerged as it did because it filled a need and because it seemed like a cool thing to try.
Boost.Python的第一版是由Dragon Systems的Dave Abrahams在2000年開發的,在Dragon Systems,Dave有幸由Tim Peters引導,接受了“Python之禪(The Zen of Python)”。Dave的工作之一是,開發基于Python的自然語言處理系統(NLP,natural language processing)。由于最終要用于嵌入式硬件,所以總是假設,計算密集的內核將會用C++來重寫,以優化速度和內存占用1。這個項目也想用Python測試腳本來測試所有的C++代碼2。當時,我們所知的綁定C++和Python的唯一工具是SWIG,但那時它處理C++的能力比較弱。如果說在那時就有什么深知卓見,說Boost.Python的方法會有何等優越性,那是騙人的。那時,Dave正好對花俏的C++模板技巧感興趣,并且嫻熟到剛好能真正做點東西,Boost.Python就那樣出現了,因為它滿足了需求,因為它看起來挺酷,值得一試。
This early version was aimed at many of the same basic goals we've described in this paper, differing most noticeably by having a slightly more cumbersome syntax and by lacking special support for operator overloading, pickling, and component-based development. These last three features were quickly added by Ullrich Koethe and Ralf Grosse-Kunstleve [3], and other enthusiastic contributors arrived on the scene to contribute enhancements like support for nested modules and static member functions.
這個早期版本針對的目標,與我們在本文所述的許多基本目標相同,最顯著的區別在于,語法要稍微麻煩一點,并且,對運算符重載、pickling,和基于組件的開發缺乏專門的支持。后面這三個特性很快就由Ullrich Koethe和Ralf Grosse-Kunstleve加上了3,并且,其他熱心的貢獻者也出現了,并作了一些改進,如對嵌套模塊和靜態成員函數的支持等。
By early 2001 development had stabilized and few new features were being added; however, a disturbing new fact came to light: Ralf had begun testing Boost.Python on pre-release versions of a compiler using the EDG front-end, and the mechanism at the core of Boost.Python responsible for handling conversions between Python and C++ types was failing to compile. As it turned out, we had been exploiting a very common bug in the implementation of all the C++ compilers we had tested. We knew that as C++ compilers rapidly became more standards-compliant, the library would begin failing on more platforms. Unfortunately, because the mechanism was so central to the functioning of the library, fixing the problem looked very difficult.
到2001年初,開發已經穩定下來了,很少有新增特性了,然而,這時出現了一件新的麻煩事:Ralf在一個使用EDG前端的編譯器的預發布版上測試Boost.Python,他發現,Boost.Python內核中,Python和C++類型轉換機制無法通過編譯。結果表明,我們一直是在利用一個錯誤,這是一個非常普遍的錯誤,存在于所有我們已經測試過的C++編譯器的實現中。我們知道,隨著C++編譯器變得更加符合標準,很快,庫將開始在更多的平臺上失敗。很不幸,因為這套機制是Boost.Python庫功能的中樞,解決問題看起來非常困難。
Fortunately, later that year Lawrence Berkeley and later Lawrence Livermore National Labs contracted with Boost Consulting for support and development of Boost.Python, and there was a new opportunity to address fundamental issues and ensure a future for the library. A redesign effort began with the low-level type conversion architecture, building in standards-compliance and support for component-based development (in contrast to version 1, where conversions had to be explicitly imported and exported across module boundaries). A new analysis of the relationship between the Python and C++ objects was done, resulting in more intuitive handling for C++ lvalues and rvalues.
幸運的是,那一年末,Lawrence Berkeley,后來建立了Lawrence Livermore National labs,與Boost Consulting簽訂了合同,來支持和發展Boost.Python,這樣就有了一個新的機會來處理庫的基本問題,從而確保了庫未來的發展。庫進行了重新設計,開始于底層的類型轉換架構,使它內置具有標準兼容性,并支持基于組件的開發(第1版中,轉換必須顯式地在模塊間導入和導出)。對Python和C++對象的關系進行了新的分析,從而能更直觀地處理C++左值和右值。
The emergence of a powerful new type system in Python 2.2 made the choice of whether to maintain compatibility with Python 1.5.2 easy: the opportunity to throw away a great deal of elaborate code for emulating classic Python classes alone was too good to pass up. In addition, Python iterators and descriptors provided crucial and elegant tools for representing similar C++ constructs. The development of the generalized object interface allowed us to further shield C++ programmers from the dangers and syntactic burdens of the Python 'C' API. A great number of other features including C++ exception translation, improved support for overloaded functions, and most significantly, CallPolicies for handling pointers and references, were added during this period.
關于是否維護對Python 1.5.2的兼容性,因為Python 2.2里出現了一個強大的新的類型系統,選擇變得容易了:這個機會好的令人無法拒絕,籍此可以拋棄大量復雜精細的代碼,而這些代碼僅僅是用來模擬傳統的Python類。另外,Python的迭代器(iterator)和描述符(descriptor)提供了重要且優雅的工具,用來表示類似的C++構造。通用的object接口的開發進一步方便了C++程序員,免除了Python 'C' API的危險性和語法負擔。這一階段,還添加了大量其他特性,包括C++異常翻譯,對函數重載的更好的支持,還有最重要的,用來處理指針和引用的CallPolicies。
In October 2002, version 2 of Boost.Python was released. Development since then has concentrated on improved support for C++ runtime polymorphism and smart pointers. Peter Dimov's ingenious boost::shared_ptr design in particular has allowed us to give the hybrid developer a consistent interface for moving objects back and forth across the language barrier without loss of information. At first, we were concerned that the sophistication and complexity of the Boost.Python v2 implementation might discourage contributors, but the emergence of Pyste and several other significant feature contributions have laid those fears to rest. Daily questions on the Python C++-sig and a backlog of desired improvements show that the library is getting used. To us, the future looks bright.
2002年十月,Boost.Python第2版發布了。從那以后,開發集中于更好地支持C++運行時多態性和智能指針。特別是Peter Dimov巧妙的boost::shared_ptr 的設計,使我們能給混和系統開發者提供一個一致的接口,用于跨越語言屏障來回移動對象而不丟失信息。剛開始,我們擔心Boost.Python v2實現的詭秘與復雜會阻礙貢獻者,但Pyste的出現,和其他幾個重要特性的貢獻,證明那些擔心是多余的。在Python C++-sig上每天的提問,和積壓的改進請求表明了庫正在被使用。對我們來說,未來是光明的。
Boost.Python achieves seamless interoperability between two rich and complementary language environments. Because it leverages template metaprogramming to introspect about types and functions, the user never has to learn a third syntax: the interface definitions are written in concise and maintainable C++. Also, the wrapping system doesn't have to parse C++ headers or represent the type system: the compiler does that work for us.
Boost.Python在兩種功能豐富并且互補的語言環境間實現了無縫協作。因為它利用模板元編程對類型和函數進行內省,用戶不必去學習第三種語言:接口定義是用簡潔和可維護的C++寫的。同時,封裝系統不必解析C++頭文件或者描述類型系統:編譯器都給我們做了。
Computationally intensive tasks play to the strengths of C++ and are often impossible to implement efficiently in pure Python, while jobs like serialization that are trivial in Python can be very difficult in pure C++. Given the luxury of building a hybrid software system from the ground up, we can approach design with new confidence and power.
計算密集型任務是C++的強項,一般不可能用純Python高效實現,然而像序列化這樣的工作,用Python很簡單,用純C++就非常困難。如果我們能構建完全的混合軟件系統,我們就能以新的信心和力量來進行設計。
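[1] In retrospect, "thinking hybrid" from the ground up might have been better for the NLP system: the component boundaries defined by the pure Python prototype, natural as they were in Python, turned out to be inappropriate for getting the desired performance and memory footprint out of the C++ core. This eventually caused some redesign overhead on the Python side when the core was moved to C++.
[2] We also have some reservations about driving all C++ testing through a Python interface, unless that is the only way the code will ultimately be used. Because the object models of the two languages differ so much, any transition across the language boundary can inevitably mask bugs.
[3] These features were expressed very differently in v1 of Boost.Python.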