Programming Python, 3rd Edition (translation). The latest version is on the wiki: http://wiki.woodpecker.org.cn/moin/PP3eD. Contributions to the translation and revisions are welcome.
One of the more common tasks in the shell utilities domain is applying an operation to a set of files in a directory (a "folder" in Windows-speak). By running a script on a batch of files, we can automate (that is, script) tasks we might otherwise have to run repeatedly by hand.

For instance, suppose you need to search all of your Python files in a development directory for a global variable name (perhaps you've forgotten where it is used). There are many platform-specific ways to do this (e.g., the grep command in Unix), but Python scripts that accomplish such tasks will work on every platform where Python works: Windows, Unix, Linux, Macintosh, and just about any other platform commonly used today. If you simply copy your script to any machine you wish to use it on, it will work regardless of which other tools are available there.
The most common way to go about writing such tools is to first grab a list of the names of the files you wish to process, and then step through that list with a Python for loop, processing each file in turn. The trick we need to learn here, then, is how to get such a directory list within our scripts. There are at least three options: running shell listing commands with os.popen, matching filename patterns with glob.glob, and getting directory listings with os.listdir. They vary in interface, result format, and portability.
Quick: how did you go about getting directory file listings before you heard of Python? If you're new to shell tools programming, the answer may be "Well, I started a Windows file explorer and clicked on stuff," but I'm thinking here in terms of less GUI-oriented command-line mechanisms (and answers submitted in Perl and Tcl get only partial credit).

On Unix, directory listings are usually obtained by typing ls in a shell; on Windows, they can be generated with a dir command typed in an MS-DOS console box. Because Python scripts may use os.popen to run any command line that we can type in a shell, spawning these listing commands is the most general way to grab a directory listing inside a Python program. We met os.popen in the prior chapter; it runs a shell command string and gives us a file object from which we can read the command's output. To illustrate, let's first assume the following directory structures (yes, I have both dir and ls commands on my Windows laptop; old habits die hard):

    C:\temp>dir /B
    about-pp.html
    python1.5.tar.gz
    about-pp2e.html
    about-ppr2e.html
    newdir

    C:\temp>ls
    about-pp.html     about-ppr2e.html  python1.5.tar.gz
    about-pp2e.html   newdir

    C:\temp>ls newdir
    more   temp1  temp2  temp3

The newdir name is a nested subdirectory in C:\temp here.
Now, scripts can grab a listing of file and directory names at this level by simply spawning the appropriate platform-specific command line and reading its output (the text normally printed to the console window):

    C:\temp>python
    >>> import os
    >>> os.popen('dir /B').readlines()
    ['about-pp.html\n', 'python1.5.tar.gz\n', 'about-pp2e.html\n',
     'about-ppr2e.html\n', 'newdir\n']

Lines read from a shell command come back with a trailing end-of-line character, but it's easy enough to slice off with a for loop or list comprehension expression, as in the following code:

    >>> for line in os.popen('dir /B').readlines():
    ...     print line[:-1]
    ...
    about-pp.html
    python1.5.tar.gz
    about-pp2e.html
    about-ppr2e.html
    newdir

    >>> lines = [line[:-1] for line in os.popen('dir /B')]
    >>> lines
    ['about-pp.html', 'python1.5.tar.gz', 'about-pp2e.html',
     'about-ppr2e.html', 'newdir']

One subtle thing: notice that the object returned by os.popen has an iterator that reads one line per request (i.e., per next() method call), just like normal files, so calling the readlines method is optional here unless you really need to extract the result list all at once (see the discussion of file iterators earlier in this chapter). For pipe objects, the effect of iterators is even more useful than simply avoiding loading the entire result into memory all at once: readlines will block the caller until the spawned program is completely finished, whereas the iterator might not.
The dir and ls commands let us be specific about filename patterns to be matched and directory names to be listed; again, we're just running shell commands here, so anything you can type at a shell prompt goes:

    >>> os.popen('dir *.html /B').readlines()
    ['about-pp.html\n', 'about-pp2e.html\n', 'about-ppr2e.html\n']

    >>> os.popen('ls *.html').readlines()
    ['about-pp.html\n', 'about-pp2e.html\n', 'about-ppr2e.html\n']

    >>> os.popen('dir newdir /B').readlines()
    ['temp1\n', 'temp2\n', 'temp3\n', 'more\n']

    >>> os.popen('ls newdir').readlines()
    ['more\n', 'temp1\n', 'temp2\n', 'temp3\n']

These calls use general tools and work as advertised. As I noted earlier, though, the downsides of os.popen are that it requires using a platform-specific shell command and it incurs a performance hit to start up an independent program. The following two alternative techniques do better on both counts.
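The os.popen technique above can be wrapped in a small function. The following is only a sketch, written for Python 3 (an assumption; the listings in this section use Python 2 syntax). The name shell_listing is hypothetical, and the choice of dir /B on Windows versus ls elsewhere is likewise an assumption about which commands are installed.

```python
import os
import tempfile

def shell_listing(dirname="."):
    """Return entry names in dirname by spawning a platform listing command."""
    # Assumption: 'dir /B' exists on Windows and 'ls' everywhere else.
    if os.name == "nt":
        command = 'dir /B "%s"' % dirname
    else:
        command = 'ls "%s"' % dirname
    with os.popen(command) as pipe:
        # Iterating the pipe reads one line per step; strip the end-of-line.
        return [line.rstrip("\n") for line in pipe]

# Demo on a throwaway directory so no real files are touched.
demo_dir = tempfile.mkdtemp()
for name in ("about-pp.html", "notes.txt"):
    open(os.path.join(demo_dir, name), "w").close()
print(sorted(shell_listing(demo_dir)))
```

The platform test inside the function is exactly the portability cost the text describes: the script has to know which command the host shell provides.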
The term globbing comes from the * wildcard character in filename patterns; per computing folklore, a * matches a "glob" of characters. In less poetic terms, globbing simply means collecting the names of all entries in a directory (files and subdirectories) whose names match a given filename pattern. In Unix shells, globbing expands filename patterns within a command line into all matching filenames before the command is ever run. In Python, we can do something similar by calling the standard library's glob.glob function with a pattern to expand:

    >>> import glob
    >>> glob.glob('*')
    ['about-pp.html', 'python1.5.tar.gz', 'about-pp2e.html',
     'about-ppr2e.html', 'newdir']

    >>> glob.glob('*.html')
    ['about-pp.html', 'about-pp2e.html', 'about-ppr2e.html']

    >>> glob.glob('newdir/*')
    ['newdir\\temp1', 'newdir\\temp2', 'newdir\\temp3', 'newdir\\more']

The glob call accepts the usual filename pattern syntax used in shells (e.g., ? means any one character, * means any number of characters, and [] is a character selection set).[*] The pattern should include a directory path if you wish to glob in something other than the current working directory, and on Windows the module accepts either Unix or DOS-style directory separators (/ or \). Also, this call is implemented without spawning a shell command and so is likely to be faster and more portable across all Python platforms than the os.popen schemes shown earlier.

[*] In fact, glob just uses the standard fnmatch module to match name patterns; see the fnmatch description later in this chapter for more details.
Technically speaking, glob is a bit more powerful than described so far. In fact, using it to list files in one directory is just one use of its pattern-matching skills. For instance, it can also be used to collect matching names across multiple directories, simply because each level in a passed-in directory path can be a pattern too:

    C:\temp>python
    >>> import glob
    >>> for name in glob.glob('*examples/L*.py'): print name
    ...
    cpexamples\Launcher.py
    cpexamples\Launch_PyGadgets.py
    cpexamples\LaunchBrowser.py
    cpexamples\launchmodes.py
    examples\Launcher.py
    examples\Launch_PyGadgets.py
    examples\LaunchBrowser.py
    examples\launchmodes.py

    >>> for name in glob.glob(r'*\*\visitor_find*.py'): print name
    ...
    cpexamples\PyTools\visitor_find.py
    cpexamples\PyTools\visitor_find_quiet2.py
    cpexamples\PyTools\visitor_find_quiet1.py
    examples\PyTools\visitor_find.py
    examples\PyTools\visitor_find_quiet2.py
    examples\PyTools\visitor_find_quiet1.py

In the first call here, we get back filenames from two different directories that match the *examples pattern; in the second, both of the first directory levels are wildcards, so Python collects all possible ways to reach the base filenames. Using os.popen to spawn shell commands achieves the same effect only if the underlying shell or listing command does too.
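To try multi-directory globbing without the book's example tree, the following sketch builds a small throwaway tree and applies a pattern with a wildcard at the directory level. It is written for Python 3 (an assumption), and the directory and file names are made up for illustration.

```python
import glob
import os
import tempfile

# Build a tiny tree: three subdirectories, each holding the same two files.
root = tempfile.mkdtemp()
for sub in ("examples", "cpexamples", "docs"):
    os.mkdir(os.path.join(root, sub))
    for fname in ("Launcher.py", "notes.txt"):
        open(os.path.join(root, sub, fname), "w").close()

# '*examples' matches two of the three subdirectories;
# 'L*.py' is then matched inside each of them.
pattern = os.path.join(root, "*examples", "L*.py")
matches = sorted(glob.glob(pattern))    # glob promises no order, so sort
for path in matches:
    print(os.path.relpath(path, root))
```

Only the files under examples and cpexamples are reported; docs is filtered out by the directory-level pattern before any filenames are compared.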
The os module's listdir call provides yet another way to collect filenames in a Python list. It takes a simple directory name string, not a filename pattern, and returns a list containing the names of all entries in that directory (both simple files and nested directories) for use in the calling script:

    >>> os.listdir('.')
    ['about-pp.html', 'python1.5.tar.gz', 'about-pp2e.html',
     'about-ppr2e.html', 'newdir']

    >>> os.listdir(os.curdir)
    ['about-pp.html', 'python1.5.tar.gz', 'about-pp2e.html',
     'about-ppr2e.html', 'newdir']

    >>> os.listdir('newdir')
    ['temp1', 'temp2', 'temp3', 'more']

This too is done without resorting to shell commands and so is portable to all major Python platforms. The result is not in any particular order (but can be sorted with the list sort method), contains base filenames without their directory path prefixes, and includes names of both files and directories at the listed level.

To compare all three listing techniques, let's run them here side by side on an explicit directory.
They differ in some ways but are mostly just variations on a theme: the ls command run by os.popen sorts names, and its lines come back with end-of-line characters; glob.glob accepts a pattern and returns filenames with directory prefixes; and os.listdir takes a simple directory name and returns names without directory prefixes:

    >>> os.popen('ls C:\PP3rdEd').readlines()
    ['README.txt\n', 'cdrom\n', 'chapters\n', 'etc\n', 'examples\n',
     'examples.tar.gz\n', 'figures\n', 'shots\n']

    >>> glob.glob('C:\PP3rdEd\*')
    ['C:\\PP3rdEd\\examples.tar.gz', 'C:\\PP3rdEd\\README.txt',
     'C:\\PP3rdEd\\shots', 'C:\\PP3rdEd\\figures', 'C:\\PP3rdEd\\examples',
     'C:\\PP3rdEd\\etc', 'C:\\PP3rdEd\\chapters', 'C:\\PP3rdEd\\cdrom']

    >>> os.listdir('C:\PP3rdEd')
    ['examples.tar.gz', 'README.txt', 'shots', 'figures', 'examples',
     'etc', 'chapters', 'cdrom']

Of these three, glob and listdir are generally better options if you care about script portability, and listdir seems fastest in recent Python releases (but gauge its performance yourself; implementations may change over time).
In the last example, I pointed out that glob returns names with directory paths, whereas listdir gives raw base filenames. For convenient processing, scripts often need to split glob results into base files or expand listdir results into full paths. Such translations are easy if we let the os.path module do all the work for us. For example, a script that intends to copy all files elsewhere will typically need to first split off the base filenames from glob results so that it can add different directory names on the front:

    >>> dirname = r'C:\PP3rdEd'
    >>> for file in glob.glob(dirname + '/*'):
    ...     head, tail = os.path.split(file)
    ...     print head, tail, '=>', ('C:\\Other\\' + tail)
    ...
    C:\PP3rdEd examples.tar.gz => C:\Other\examples.tar.gz
    C:\PP3rdEd README.txt => C:\Other\README.txt
    C:\PP3rdEd shots => C:\Other\shots
    C:\PP3rdEd figures => C:\Other\figures
    C:\PP3rdEd examples => C:\Other\examples
    C:\PP3rdEd etc => C:\Other\etc
    C:\PP3rdEd chapters => C:\Other\chapters
    C:\PP3rdEd cdrom => C:\Other\cdrom

Here, the names after the => represent names that files might be moved to. Conversely, a script that means to process all files in a different directory than the one it runs in will probably need to prepend the target directory name to listdir results before passing filenames on to other tools:

    >>> for file in os.listdir(dirname):
    ...     print os.path.join(dirname, file)
    ...
    C:\PP3rdEd\examples.tar.gz
    C:\PP3rdEd\README.txt
    C:\PP3rdEd\shots
    C:\PP3rdEd\figures
    C:\PP3rdEd\examples
    C:\PP3rdEd\etc
    C:\PP3rdEd\chapters
    C:\PP3rdEd\cdrom
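The two translations above can be packaged as helpers. This is a sketch only, written for Python 3 (an assumption); copy_plan and full_paths are hypothetical names, and nothing is actually copied: the first function just pairs each source path with the path it would get under another directory.

```python
import glob
import os
import tempfile

def copy_plan(srcdir, dstdir):
    """Pair each entry in srcdir with the path it would have under dstdir."""
    plan = []
    for path in glob.glob(os.path.join(srcdir, "*")):
        head, tail = os.path.split(path)      # head: directory, tail: base name
        plan.append((path, os.path.join(dstdir, tail)))
    return sorted(plan)

def full_paths(dirname):
    """Expand os.listdir's base names into paths that include dirname."""
    return sorted(os.path.join(dirname, name) for name in os.listdir(dirname))

# Demo on a throwaway directory with two empty files.
src = tempfile.mkdtemp()
for name in ("README.txt", "shots"):
    open(os.path.join(src, name), "w").close()
for source, target in copy_plan(src, "Other"):
    print(source, "=>", target)
```

One caveat worth knowing: a glob pattern of * skips names that begin with a dot, whereas os.listdir returns them, so the two helpers agree only for directories without such entries.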
As you read the prior section, you may have noticed that all of the preceding techniques return the names of files in only a single directory. What if you want to apply an operation to every file in every directory and subdirectory in an entire directory tree?

For instance, suppose again that we need to find every occurrence of a global name in our Python scripts. This time, though, our scripts are arranged into a module package: a directory with nested subdirectories, which may have subdirectories of their own. We could rerun our hypothetical single-directory searcher manually in every directory in the tree, but that's tedious, error prone, and just plain not fun.

Luckily, in Python it's almost as easy to process a directory tree as it is to inspect a single directory. We can either write a recursive routine to traverse the tree, or use one of two tree-walker utilities built into the os module. Such tools can be used to search, copy, compare, and otherwise process arbitrary directory trees on any platform that Python runs on (and that's just about everywhere).
To make it easy to apply an operation to all files in a tree hierarchy, Python comes with a utility that scans trees for us and runs a provided function at every directory along the way. The os.path.walk function is called with a directory root, function object, and optional data item, and walks the tree at the directory root and below. At each directory, the function object passed in is called with the optional data item, the name of the current directory, and a list of filenames in that directory (obtained from os.listdir). Typically, the function we provide (often referred to as a callback function) scans the filenames list to process files at each directory level in the tree.

That description might sound horribly complex the first time you hear it, but os.path.walk is fairly straightforward once you get the hang of it. In the following code, for example, the lister function is called from os.path.walk at each directory in the tree rooted at . (the current directory). Along the way, lister simply prints the directory name and all the files at the current level (after prepending the directory name). It's simpler in Python than in English:

    >>> import os
    >>> def lister(dummy, dirname, filesindir):
    ...     print '[' + dirname + ']'
    ...     for fname in filesindir:
    ...         print os.path.join(dirname, fname)         # handle one file
    ...
    >>> os.path.walk('.', lister, None)
    [.]
    .\about-pp.html
    .\python1.5.tar.gz
    .\about-pp2e.html
    .\about-ppr2e.html
    .\newdir
    [.\newdir]
    .\newdir\temp1
    .\newdir\temp2
    .\newdir\temp3
    .\newdir\more
    [.\newdir\more]
    .\newdir\more\xxx.txt
    .\newdir\more\yyy.txt

In other words, we've coded our own custom (and easily changed) recursive directory listing tool in Python. Because this may be something we would like to tweak and reuse elsewhere, let's make it permanently available in a module file, as shown in Example 4-4, now that we've worked out the details interactively.

Example 4-4. PP3E\System\Filetools\lister_walk.py

    # list file tree with os.path.walk
    import sys, os

    def lister(dummy, dirName, filesInDir):              # called at each dir
        print '[' + dirName + ']'
        for fname in filesInDir:                         # includes subdir names
            path = os.path.join(dirName, fname)          # add dir name prefix
            if not os.path.isdir(path):                  # print simple files only
                print path

    if __name__ == '__main__':
        os.path.walk(sys.argv[1], lister, None)          # dir name in cmdline

This is the same code except that directory names are filtered out of the filenames list by consulting the os.path.isdir test in order to avoid listing them twice (see, it's been tweaked already). When packaged this way, the code can also be run from a shell command line.
Here it is being launched from a different directory, with the directory to be listed passed in as a command-line argument:

    C:\...\PP3E\System\Filetools>python lister_walk.py C:\Temp
    [C:\Temp]
    C:\Temp\about-pp.html
    C:\Temp\python1.5.tar.gz
    C:\Temp\about-pp2e.html
    C:\Temp\about-ppr2e.html
    [C:\Temp\newdir]
    C:\Temp\newdir\temp1
    C:\Temp\newdir\temp2
    C:\Temp\newdir\temp3
    [C:\Temp\newdir\more]
    C:\Temp\newdir\more\xxx.txt
    C:\Temp\newdir\more\yyy.txt

The walk paradigm also allows functions to tailor the set of directories visited by changing the file list argument in place. The library manual documents this further, but it's probably more instructive to simply know what walk truly looks like. Here is its actual Python-coded implementation for Windows platforms (at the time of this writing), with comments added to help demystify its operation:

    def walk(top, func, arg):                  # top is the current dirname
        try:
            names = os.listdir(top)            # get all file/dir names here
        except os.error:                       # they have no path prefix
            return
        func(arg, top, names)                  # run func with names list here
        exceptions = ('.', '..')
        for name in names:                     # step over the very same list
            if name not in exceptions:         # but skip self/parent names
                name = join(top, name)         # add path prefix to name
                if isdir(name):
                    walk(name, func, arg)      # descend into subdirs here

Notice that walk generates filename lists at each level with os.listdir, a call that collects both file and directory names in no particular order and returns them without their directory paths. Also note that walk uses the very same list (the variable names) that os.listdir returned and that was passed to the function you provide in order to later descend into subdirectories.
Because lists are mutable objects that can be changed in place, if your function modifies the passed-in filenames list, it will impact what walk does next. For example, deleting directory names will prune traversal branches, and sorting the list will order the walk.
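The in-place trick can be seen in a short experiment. Because os.path.walk is not available in Python 3, this sketch defines a local stand-in, callback_walk, modeled on the walk listing above; both it and the visit callback are hypothetical helpers for illustration, not library code.

```python
import os
import tempfile

def callback_walk(top, func, arg):
    """Local stand-in modeled on the walk listing above (not a library call)."""
    try:
        names = os.listdir(top)
    except OSError:
        return
    func(arg, top, names)                  # callback may edit names in place
    for name in names:                     # the same list drives the descent
        path = os.path.join(top, name)
        if os.path.isdir(path):
            callback_walk(path, func, arg)

def visit(visited, dirname, names):
    visited.append(dirname)
    names.sort()                           # sorting orders the walk
    if "skipme" in names:
        names.remove("skipme")             # deleting a name prunes that branch

# Demo: three subdirectories, one of which the callback removes.
root = tempfile.mkdtemp()
for sub in ("b", "a", "skipme"):
    os.mkdir(os.path.join(root, sub))
visited = []
callback_walk(root, visit, visited)
print([os.path.relpath(d, root) for d in visited])
```

The callback never returns anything; it influences the traversal purely by mutating the list it was handed, which works only because the walker loops over that same list object.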
In recent Python releases, a new directory tree walker has been added which does not require a callback function to be coded. This new call, os.walk, is instead a generator function; when used within a for loop, each time through it yields a tuple containing the current directory name, a list of subdirectories in that directory, and a list of nondirectory files in that directory.

Recall that generators have a .next() method implicitly invoked by for loops and other iteration contexts; each call forces the walker to the next directory in the tree. Essentially, os.walk replaces the os.path.walk callback function with a loop body, and so it may be easier to use (though you'll have to judge that for yourself).

For example, suppose you have a directory tree of files and you want to find all Python source files within it that reference the Tkinter GUI module.
The traditional way to accomplish this with os.path.walk requires a callback function run at each level of the tree:

    >>> import os
    >>> def atEachDir(matchlist, dirname, fileshere):
            for filename in fileshere:
                if filename.endswith('.py'):
                    pathname = os.path.join(dirname, filename)
                    if 'Tkinter' in open(pathname).read():
                        matchlist.append(pathname)

    >>> matches = []
    >>> os.path.walk(r'D:\PP3E', atEachDir, matches)
    >>> matches
    ['D:\\PP3E\\dev\\examples\\PP3E\\Preview\\peoplegui.py',
     'D:\\PP3E\\dev\\examples\\PP3E\\Preview\\tkinter101.py',
     'D:\\PP3E\\dev\\examples\\PP3E\\Preview\\tkinter001.py',
     'D:\\PP3E\\dev\\examples\\PP3E\\Preview\\peoplegui_class.py',
     'D:\\PP3E\\dev\\examples\\PP3E\\Preview\\tkinter102.py',
     'D:\\PP3E\\NewExamples\\clock.py',
     'D:\\PP3E\\NewExamples\\calculator.py']

This code loops through all the files at each level, looking for files whose names end in .py and which contain the search string. When a match is found, its full name is appended to the results list object, which is passed in as an argument (we could also just build a list of .py files and search each in a for loop after the walk).
The equivalent os.walk code is similar, but the callback function's code becomes the body of a for loop, and directory names are filtered out for us:

    >>> import os
    >>> matches = []
    >>> for (dirname, dirshere, fileshere) in os.walk(r'D:\PP3E'):
            for filename in fileshere:
                if filename.endswith('.py'):
                    pathname = os.path.join(dirname, filename)
                    if 'Tkinter' in open(pathname).read():
                        matches.append(pathname)

    >>> matches
    ['D:\\PP3E\\dev\\examples\\PP3E\\Preview\\peoplegui.py',
     'D:\\PP3E\\dev\\examples\\PP3E\\Preview\\tkinter101.py',
     'D:\\PP3E\\dev\\examples\\PP3E\\Preview\\tkinter001.py',
     'D:\\PP3E\\dev\\examples\\PP3E\\Preview\\peoplegui_class.py',
     'D:\\PP3E\\dev\\examples\\PP3E\\Preview\\tkinter102.py',
     'D:\\PP3E\\NewExamples\\clock.py',
     'D:\\PP3E\\NewExamples\\calculator.py']

If you want to see what's really going on in the os.walk generator, call its next() method manually a few times as the for loop does automatically; each time, you advance to the next subdirectory in the tree:

    >>> gen = os.walk('D:\PP3E')
    >>> gen.next()
    ('D:\\PP3E', ['proposal', 'dev', 'NewExamples', 'bkp'], ['prg-python-2.zip'])
    >>> gen.next()
    ('D:\\PP3E\\proposal', [], ['proposal-programming-python-3e.doc'])
    >>> gen.next()
    ('D:\\PP3E\\dev', ['examples'], ['ch05.doc', 'ch06.doc', 'ch07.doc', 'ch08.doc',
    'ch09.doc', 'ch10.doc', 'ch11.doc', 'ch12.doc', 'ch13.doc', 'ch14.doc', ...more...

The os.walk generator has more features than I will demonstrate here. For instance, additional arguments allow you to specify a top-down or bottom-up traversal of the directory tree, and the list of subdirectories in the yielded tuple can be modified in-place to change the traversal in top-down mode, much as for os.path.walk.
See the Python library manual for more details.

So why the new call? Is the new os.walk easier to use than the traditional os.path.walk? Perhaps, if you need to distinguish between subdirectories and files in each directory (os.walk gives us two lists rather than one) or can make use of a bottom-up traversal or other features. Otherwise, it's mostly just the trade of a function for a for loop header. You'll have to judge for yourself whether this is more natural or not; we'll use both forms in this book.
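As a small illustration of the in-place modification mentioned above, the following sketch prunes and orders an os.walk traversal by editing the subdirectory list. It is written for Python 3 (an assumption); walk_pruned and the skipped directory name bkp are hypothetical choices for the demo.

```python
import os
import tempfile

def walk_pruned(root, skip=("bkp",)):
    """Yield file paths under root, never descending into directories in skip."""
    for dirname, subdirs, files in os.walk(root):      # top-down by default
        # Slice assignment edits the very list os.walk holds, so removed
        # names are not visited and the rest are visited in sorted order.
        subdirs[:] = sorted(d for d in subdirs if d not in skip)
        for fname in sorted(files):
            yield os.path.join(dirname, fname)

# Demo tree: one file at the top, one in dev, one in the skipped bkp.
root = tempfile.mkdtemp()
for sub in ("dev", "bkp"):
    os.mkdir(os.path.join(root, sub))
for rel in ("top.txt", os.path.join("dev", "a.py"), os.path.join("bkp", "old.py")):
    open(os.path.join(root, rel), "w").close()
for path in walk_pruned(root):
    print(os.path.relpath(path, root))
```

Rebinding the name (subdirs = [...]) would have no effect on the traversal; it is the slice assignment that changes the list the generator consults. The technique applies only in top-down mode, since a bottom-up walk has already visited the subdirectories by the time their parent is yielded.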
The os.path.walk and os.walk tools do tree traversals for us, but it's sometimes more flexible and hardly any more work to do it ourselves. The following script recodes the directory listing script with a manual recursive traversal function (a function that calls itself to repeat its actions). The mylister function in Example 4-5 is almost the same as lister in Example 4-4 but calls os.listdir to generate file paths manually and calls itself recursively to descend into subdirectories.

Example 4-5. PP3E\System\Filetools\lister_recur.py

    # list files in dir tree by recursion
    import sys, os

    def mylister(currdir):
        print '[' + currdir + ']'
        for file in os.listdir(currdir):              # list files here
            path = os.path.join(currdir, file)        # add dir path back
            if not os.path.isdir(path):
                print path
            else:
                mylister(path)                        # recur into subdirs

    if __name__ == '__main__':
        mylister(sys.argv[1])                         # dir name in cmdline

This version is packaged as a script too (this is definitely too much code to type at the interactive prompt); its output is identical when run as a script:

    C:\...\PP3E\System\Filetools>python lister_recur.py C:\Temp
    [C:\Temp]
    C:\Temp\about-pp.html
    C:\Temp\python1.5.tar.gz
    C:\Temp\about-pp2e.html
    C:\Temp\about-ppr2e.html
    [C:\Temp\newdir]
    C:\Temp\newdir\temp1
    C:\Temp\newdir\temp2
    C:\Temp\newdir\temp3
    [C:\Temp\newdir\more]
    C:\Temp\newdir\more\xxx.txt
    C:\Temp\newdir\more\yyy.txt

But this file is just as useful when imported and called elsewhere:

    C:\temp>python
    >>> from PP3E.System.Filetools.lister_recur import mylister
    >>> mylister('.')
    [.]
    .\about-pp.html
    .\python1.5.tar.gz
    .\about-pp2e.html
    .\about-ppr2e.html
    [.\newdir]
    .\newdir\temp1
    .\newdir\temp2
    .\newdir\temp3
    [.\newdir\more]
    .\newdir\more\xxx.txt
    .\newdir\more\yyy.txt

We will make better use of most of this section's techniques in later examples in Chapter 7 and in this book at large. For example, scripts for copying and comparing directory trees use the tree-walker techniques listed previously. Watch for these tools in action along the way. If you are interested in directory processing, also see the discussion of Python's old grep module in Chapter 7; it searches files and can be applied to all files in a directory when combined with the glob module, but it simply prints results and does not traverse directory trees by itself.
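A manual recursion is also easy to adapt so that it returns its results instead of printing them, which makes it simpler to reuse from other code. The following is a sketch under that assumption, written for Python 3; find_files is a hypothetical name and is not part of the book's examples.

```python
import os
import tempfile

def find_files(currdir):
    """Return the paths of all nondirectory files at and below currdir."""
    found = []
    for name in sorted(os.listdir(currdir)):      # sorted for stable output
        path = os.path.join(currdir, name)        # add dir path back
        if os.path.isdir(path):
            found.extend(find_files(path))        # recur into subdirs
        else:
            found.append(path)
    return found

# Demo tree loosely shaped like the C:\temp example above.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "newdir", "more"))
for rel in ("about-pp.html",
            os.path.join("newdir", "temp1"),
            os.path.join("newdir", "more", "xxx.txt")):
    open(os.path.join(root, rel), "w").close()
for path in find_files(root):
    print(os.path.relpath(path, root))
```

Like mylister, this follows whatever os.path.isdir reports as a directory, so a tree containing symbolic links that loop back on themselves would need an extra check.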
Copyright © 金慶