Programming Python, 3rd Edition: translation. The latest version is on the wiki: http://wiki.woodpecker.org.cn/moin/PP3eD . Contributions to the translation and its revision are welcome.
One of the more common tasks in the shell utilities domain is applying an operation to a set of files in a directory (a "folder" in Windows-speak). By running a script on a batch of files, we can automate (that is, script) tasks we might otherwise have to run repeatedly by hand.

For instance, suppose you need to search all of your Python files in a development directory for a global variable name (perhaps you've forgotten where it is used). There are many platform-specific ways to do this (e.g., the grep command in Unix), but Python scripts that accomplish such tasks will work on every platform where Python works: Windows, Unix, Linux, Macintosh, and just about any other platform commonly used today. If you simply copy your script to any machine you wish to use it on, it will work regardless of which other tools are available there.
The most common way to go about writing such tools is to first grab a list of the names of the files you wish to process, and then step through that list with a Python for loop, processing each file in turn. The trick we need to learn here, then, is how to get such a directory list within our scripts. There are at least three options: running shell listing commands with os.popen, matching filename patterns with glob.glob, and getting directory listings with os.listdir. They vary in interface, result format, and portability.
Quick: how did you go about getting directory file listings before you heard of Python? If you're new to shell tools programming, the answer may be "Well, I started a Windows file explorer and clicked on stuff," but I'm thinking here in terms of less GUI-oriented command-line mechanisms (and answers submitted in Perl and Tcl get only partial credit).

On Unix, directory listings are usually obtained by typing ls in a shell; on Windows, they can be generated with a dir command typed in an MS-DOS console box. Because Python scripts may use os.popen to run any command line that we can type in a shell, os.popen is the most general way to grab a directory listing inside a Python program. We met os.popen in the prior chapter; it runs a shell command string and gives us a file object from which we can read the command's output. To illustrate, let's first assume the following directory structures (yes, I have both dir and ls commands on my Windows laptop; old habits die hard):

    C:\temp>dir /B
    about-pp.html
    python1.5.tar.gz
    about-pp2e.html
    about-ppr2e.html
    newdir

    C:\temp>ls
    about-pp.html     about-ppr2e.html  python1.5.tar.gz
    about-pp2e.html   newdir

    C:\temp>ls newdir
    more   temp1  temp2  temp3

The newdir name is a nested subdirectory in C:\temp here.
Now, scripts can grab a listing of file and directory names at this level by simply spawning the appropriate platform-specific command line and reading its output (the text normally printed to the console window):

    C:\temp>python
    >>> import os
    >>> os.popen('dir /B').readlines()
    ['about-pp.html\n', 'python1.5.tar.gz\n', 'about-pp2e.html\n',
     'about-ppr2e.html\n', 'newdir\n']

Lines read from a shell command come back with a trailing end-of-line character, but it's easy enough to slice off with a for loop or list comprehension expression, as in the following code:

    >>> for line in os.popen('dir /B').readlines():
    ...     print line[:-1]
    ...
    about-pp.html
    python1.5.tar.gz
    about-pp2e.html
    about-ppr2e.html
    newdir

    >>> lines = [line[:-1] for line in os.popen('dir /B')]
    >>> lines
    ['about-pp.html', 'python1.5.tar.gz', 'about-pp2e.html',
     'about-ppr2e.html', 'newdir']

One subtle thing: notice that the object returned by os.popen has an iterator that reads one line per request (i.e., per next() method call), just like normal files, so calling the readlines method is optional here unless you really need to extract the result list all at once (see the discussion of file iterators earlier in this chapter). For pipe objects, the effect of iterators is even more useful than simply avoiding loading the entire result into memory all at once: readlines will block the caller until the spawned program is completely finished, whereas the iterator might not.
The dir and ls commands let us be specific about filename patterns to be matched and directory names to be listed; again, we're just running shell commands here, so anything you can type at a shell prompt goes:

    >>> os.popen('dir *.html /B').readlines()
    ['about-pp.html\n', 'about-pp2e.html\n', 'about-ppr2e.html\n']

    >>> os.popen('ls *.html').readlines()
    ['about-pp.html\n', 'about-pp2e.html\n', 'about-ppr2e.html\n']

    >>> os.popen('dir newdir /B').readlines()
    ['temp1\n', 'temp2\n', 'temp3\n', 'more\n']

    >>> os.popen('ls newdir').readlines()
    ['more\n', 'temp1\n', 'temp2\n', 'temp3\n']

These calls use general tools and work as advertised. As I noted earlier, though, the downsides of os.popen are that it requires using a platform-specific shell command and it incurs a performance hit to start up an independent program. The following two alternative techniques do better on both counts.
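The line-at-a-time reading of a command pipe described above can be sketched as a small helper. This is only a minimal sketch, written in Python 3 syntax (the interactive listings in this section use Python 2 print statements); the name command_lines is hypothetical, and the command string is whatever listing command your platform provides.

```python
import os

def command_lines(command):
    """Run a shell command and return its output lines without line ends."""
    pipe = os.popen(command)      # file-like object reading the command's output
    try:
        # Iterating the pipe fetches one line per step, as with ordinary files;
        # rstrip('\n') removes only a trailing end-of-line character.
        return [line.rstrip('\n') for line in pipe]
    finally:
        pipe.close()              # wait for the command and release the pipe

if __name__ == '__main__':
    # The command itself stays platform-specific: 'dir /B' on Windows, 'ls' elsewhere
    listing = 'dir /B' if os.name == 'nt' else 'ls'
    for name in command_lines(listing):
        print(name)
```

The sketch uses rstrip('\n') rather than the line[:-1] slice so that a final line lacking an end-of-line character is not shortened by mistake.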
The term globbing comes from the * wildcard character in filename patterns; per computing folklore, a * matches a "glob" of characters. In less poetic terms, globbing simply means collecting the names of all entries in a directory (files and subdirectories) whose names match a given filename pattern. In Unix shells, globbing expands filename patterns within a command line into all matching filenames before the command is ever run. In Python, we can do something similar by calling the standard library's glob.glob function with a pattern to expand:

    >>> import glob
    >>> glob.glob('*')
    ['about-pp.html', 'python1.5.tar.gz', 'about-pp2e.html',
     'about-ppr2e.html', 'newdir']

    >>> glob.glob('*.html')
    ['about-pp.html', 'about-pp2e.html', 'about-ppr2e.html']

    >>> glob.glob('newdir/*')
    ['newdir\\temp1', 'newdir\\temp2', 'newdir\\temp3', 'newdir\\more']

The glob call accepts the usual filename pattern syntax used in shells (e.g., ? means any one character, * means any number of characters, and [] is a character selection set).[*] The pattern should include a directory path if you wish to glob in something other than the current working directory, and the module accepts either Unix or DOS-style directory separators (/ or \). Also, this call is implemented without spawning a shell command and so is likely to be faster and more portable across all Python platforms than the os.popen schemes shown earlier.
[*] In fact, glob just uses the standard fnmatch module to match name patterns; see the fnmatch description later in this chapter for more details.

Technically speaking, glob is a bit more powerful than described so far. In fact, using it to list files in one directory is just one use of its pattern-matching skills. For instance, it can also be used to collect matching names across multiple directories, simply because each level in a passed-in directory path can be a pattern too:

    C:\temp>python
    >>> import glob
    >>> for name in glob.glob('*examples/L*.py'): print name
    ...
    cpexamples\Launcher.py
    cpexamples\Launch_PyGadgets.py
    cpexamples\LaunchBrowser.py
    cpexamples\launchmodes.py
    examples\Launcher.py
    examples\Launch_PyGadgets.py
    examples\LaunchBrowser.py
    examples\launchmodes.py

    >>> for name in glob.glob(r'*\*\visitor_find*.py'): print name
    ...
    cpexamples\PyTools\visitor_find.py
    cpexamples\PyTools\visitor_find_quiet2.py
    cpexamples\PyTools\visitor_find_quiet1.py
    examples\PyTools\visitor_find.py
    examples\PyTools\visitor_find_quiet2.py
    examples\PyTools\visitor_find_quiet1.py

In the first call here, we get back filenames from two different directories that match the *examples pattern; in the second, both of the first directory levels are wildcards, so Python collects all possible ways to reach the base filenames. Using os.popen to spawn shell commands achieves the same effect only if the underlying shell or listing command does too.
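To make the relationship between glob and fnmatch concrete, here is a minimal sketch in Python 3 syntax. The helper names match_here and match_with_fnmatch are hypothetical, and the second function only approximates what glob does for a single directory; it is not glob's actual implementation.

```python
import fnmatch
import glob
import os

def match_here(dirname, pattern):
    """Return sorted base names in dirname that match a shell-style pattern."""
    # glob returns each name with the directory prefix used in the pattern
    found = glob.glob(os.path.join(dirname, pattern))
    return sorted(os.path.basename(path) for path in found)

def match_with_fnmatch(dirname, pattern):
    """Similar result built from os.listdir plus fnmatch pattern matching."""
    # Unlike glob, this also matches names that start with a dot,
    # which glob skips unless the pattern itself starts with a dot.
    return sorted(fnmatch.filter(os.listdir(dirname), pattern))
```

Both helpers sort their results because neither glob.glob nor os.listdir promises any particular order.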
The os module's listdir call provides yet another way to collect filenames in a Python list. It takes a simple directory name string, not a filename pattern, and returns a list containing the names of all entries in that directory (both simple files and nested directories) for use in the calling script:

    >>> os.listdir('.')
    ['about-pp.html', 'python1.5.tar.gz', 'about-pp2e.html',
     'about-ppr2e.html', 'newdir']

    >>> os.listdir(os.curdir)
    ['about-pp.html', 'python1.5.tar.gz', 'about-pp2e.html',
     'about-ppr2e.html', 'newdir']

    >>> os.listdir('newdir')
    ['temp1', 'temp2', 'temp3', 'more']

This too is done without resorting to shell commands and so is portable to all major Python platforms. The result is not in any particular order (but can be sorted with the list sort method), contains base filenames without their directory path prefixes, and includes names of both files and directories at the listed level.

To compare all three listing techniques, let's run them here side by side on an explicit directory.
They differ in some ways but are mostly just variations on a theme: the ls command run through os.popen sorts names and its lines come back with end-of-lines, glob.glob accepts a pattern and returns filenames with directory prefixes, and os.listdir takes a simple directory name and returns names without directory prefixes:

    >>> os.popen('ls C:\PP3rdEd').readlines()
    ['README.txt\n', 'cdrom\n', 'chapters\n', 'etc\n', 'examples\n',
     'examples.tar.gz\n', 'figures\n', 'shots\n']

    >>> glob.glob('C:\PP3rdEd\*')
    ['C:\\PP3rdEd\\examples.tar.gz', 'C:\\PP3rdEd\\README.txt',
     'C:\\PP3rdEd\\shots', 'C:\\PP3rdEd\\figures', 'C:\\PP3rdEd\\examples',
     'C:\\PP3rdEd\\etc', 'C:\\PP3rdEd\\chapters', 'C:\\PP3rdEd\\cdrom']

    >>> os.listdir('C:\PP3rdEd')
    ['examples.tar.gz', 'README.txt', 'shots', 'figures', 'examples', 'etc',
     'chapters', 'cdrom']

Of these three, glob and listdir are generally better options if you care about script portability, and listdir seems fastest in recent Python releases (but gauge its performance yourself; implementations may change over time).
In the last example, I pointed out that glob returns names with directory paths, whereas listdir gives raw base filenames. For convenient processing, scripts often need to split glob results into base files or expand listdir results into full paths. Such translations are easy if we let the os.path module do all the work for us. For example, a script that intends to copy all files elsewhere will typically need to first split off the base filenames from glob results so that it can add different directory names on the front:

    >>> dirname = r'C:\PP3rdEd'
    >>> for file in glob.glob(dirname + '/*'):
    ...     head, tail = os.path.split(file)
    ...     print head, tail, '=>', ('C:\\Other\\' + tail)
    ...
    C:\PP3rdEd examples.tar.gz => C:\Other\examples.tar.gz
    C:\PP3rdEd README.txt => C:\Other\README.txt
    C:\PP3rdEd shots => C:\Other\shots
    C:\PP3rdEd figures => C:\Other\figures
    C:\PP3rdEd examples => C:\Other\examples
    C:\PP3rdEd etc => C:\Other\etc
    C:\PP3rdEd chapters => C:\Other\chapters
    C:\PP3rdEd cdrom => C:\Other\cdrom

Here, the names after the => represent names that files might be moved to. Conversely, a script that means to process all files in a different directory than the one it runs in will probably need to prepend the target directory name to listdir results before passing filenames on to other tools:

    >>> for file in os.listdir(dirname):
    ...     print os.path.join(dirname, file)
    ...
    C:\PP3rdEd\examples.tar.gz
    C:\PP3rdEd\README.txt
    C:\PP3rdEd\shots
    C:\PP3rdEd\figures
    C:\PP3rdEd\examples
    C:\PP3rdEd\etc
    C:\PP3rdEd\chapters
    C:\PP3rdEd\cdrom
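The two conversions just described can be packaged as small reusable functions. The following is a minimal sketch in Python 3 syntax; the names retarget and full_paths are hypothetical, and nothing is actually copied or moved, only names are computed.

```python
import os

def retarget(paths, newdir):
    """Pair each path with the same base name placed under newdir."""
    result = []
    for path in paths:
        head, tail = os.path.split(path)     # directory part, base filename
        result.append((path, os.path.join(newdir, tail)))
    return result

def full_paths(dirname):
    """Expand os.listdir's base names into paths usable from any directory."""
    return [os.path.join(dirname, name) for name in sorted(os.listdir(dirname))]
```

Using os.path.join instead of string concatenation with a hardcoded separator keeps the result correct on both Windows and Unix-like platforms.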
As you read the prior section, you may have noticed that all of the preceding techniques return the names of files in only a single directory. What if you want to apply an operation to every file in every directory and subdirectory in an entire directory tree?

For instance, suppose again that we need to find every occurrence of a global name in our Python scripts. This time, though, our scripts are arranged into a module package: a directory with nested subdirectories, which may have subdirectories of their own. We could rerun our hypothetical single-directory searcher manually in every directory in the tree, but that's tedious, error prone, and just plain not fun.

Luckily, in Python it's almost as easy to process a directory tree as it is to inspect a single directory. We can either write a recursive routine to traverse the tree, or use one of two tree-walker utilities built into the os module. Such tools can be used to search, copy, compare, and otherwise process arbitrary directory trees on any platform that Python runs on (and that's just about everywhere).
To make it easy to apply an operation to all files in a tree hierarchy, Python comes with a utility that scans trees for us and runs a provided function at every directory along the way. The os.path.walk function is called with a directory root, function object, and optional data item, and walks the tree at the directory root and below. At each directory, the function object passed in is called with the optional data item, the name of the current directory, and a list of filenames in that directory (obtained from os.listdir). Typically, the function we provide (often referred to as a callback function) scans the filenames list to process files at each directory level in the tree.

That description might sound horribly complex the first time you hear it, but os.path.walk is fairly straightforward once you get the hang of it. In the following code, for example, the lister function is called from os.path.walk at each directory in the tree rooted at "." (the current directory). Along the way, lister simply prints the directory name and all the files at the current level (after prepending the directory name). It's simpler in Python than in English:

    >>> import os
    >>> def lister(dummy, dirname, filesindir):
    ...     print '[' + dirname + ']'
    ...     for fname in filesindir:
    ...         print os.path.join(dirname, fname)         # handle one file
    ...
    >>> os.path.walk('.', lister, None)
    [.]
    .\about-pp.html
    .\python1.5.tar.gz
    .\about-pp2e.html
    .\about-ppr2e.html
    .\newdir
    [.\newdir]
    .\newdir\temp1
    .\newdir\temp2
    .\newdir\temp3
    .\newdir\more
    [.\newdir\more]
    .\newdir\more\xxx.txt
    .\newdir\more\yyy.txt

In other words, we've coded our own custom (and easily changed) recursive directory listing tool in Python. Because this may be something we would like to tweak and reuse elsewhere, let's make it permanently available in a module file, as shown in Example 4-4, now that we've worked out the details interactively.

Example 4-4. PP3E\System\Filetools\lister_walk.py

    # list file tree with os.path.walk
    import sys, os

    def lister(dummy, dirName, filesInDir):              # called at each dir
        print '[' + dirName + ']'
        for fname in filesInDir:                         # includes subdir names
            path = os.path.join(dirName, fname)          # add dir name prefix
            if not os.path.isdir(path):                  # print simple files only
                print path

    if __name__ == '__main__':
        os.path.walk(sys.argv[1], lister, None)          # dir name in cmdline

This is the same code except that directory names are filtered out of the filenames list by consulting the os.path.isdir test in order to avoid listing them twice (see, it's been tweaked already). When packaged this way, the code can also be run from a shell command line.
Here it is being launched from a different directory, with the directory to be listed passed in as a command-line argument:

    C:\...\PP3E\System\Filetools>python lister_walk.py C:\Temp
    [C:\Temp]
    C:\Temp\about-pp.html
    C:\Temp\python1.5.tar.gz
    C:\Temp\about-pp2e.html
    C:\Temp\about-ppr2e.html
    [C:\Temp\newdir]
    C:\Temp\newdir\temp1
    C:\Temp\newdir\temp2
    C:\Temp\newdir\temp3
    [C:\Temp\newdir\more]
    C:\Temp\newdir\more\xxx.txt
    C:\Temp\newdir\more\yyy.txt

The walk paradigm also allows functions to tailor the set of directories visited by changing the file list argument in place. The library manual documents this further, but it's probably more instructive to simply know what walk truly looks like. Here is its actual Python-coded implementation for Windows platforms (at the time of this writing), with comments added to help demystify its operation:

    def walk(top, func, arg):                 # top is the current dirname
        try:
            names = os.listdir(top)           # get all file/dir names here
        except os.error:                      # they have no path prefix
            return
        func(arg, top, names)                 # run func with names list here
        exceptions = ('.', '..')
        for name in names:                    # step over the very same list
            if name not in exceptions:        # but skip self/parent names
                name = join(top, name)        # add path prefix to name
                if isdir(name):
                    walk(name, func, arg)     # descend into subdirs here

Notice that walk generates filename lists at each level with os.listdir, a call that collects both file and directory names in no particular order and returns them without their directory paths.
Also note that walk uses the very same list returned by os.listdir and passed to the function you provide in order to later descend into subdirectories (the variable names). Because lists are mutable objects that can be changed in place, if your function modifies the passed-in filenames list, it will impact what walk does next. For example, deleting directory names will prune traversal branches, and sorting the list will order the walk.
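The effect of changing the names list in place can be seen with a callback that both sorts and prunes. The sketch below is written in Python 3 syntax, where os.path.walk is no longer available, so callback_walk is a hypothetical stand-in patterned on the implementation listed above rather than the library's own code; visit is likewise a made-up callback that skips any entry named more.

```python
import os

def callback_walk(top, func, arg):
    """Call func(arg, dirname, names) at top and at each directory below it."""
    try:
        names = os.listdir(top)
    except OSError:                    # unreadable directory: skip it
        return
    func(arg, top, names)              # func may sort or prune names in place
    for name in names:                 # the same list object func just received
        path = os.path.join(top, name)
        if os.path.isdir(path):
            callback_walk(path, func, arg)

def visit(visited, dirname, names):
    """Record each directory; order the walk and skip entries named 'more'."""
    visited.append(dirname)
    names.sort()                       # in-place sort orders the traversal
    if 'more' in names:
        names.remove('more')           # in-place removal prunes that branch
```

Rebinding the parameter inside the callback (for example, names = sorted(names)) would have no effect on the traversal, because the walker keeps its reference to the original list; only in-place changes are seen.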
In recent Python releases, a new directory tree walker has been added which does not require a callback function to be coded. This new call, os.walk, is instead a generator function; when used within a for loop, each time through it yields a tuple containing the current directory name, a list of subdirectories in that directory, and a list of nondirectory files in that directory.

Recall that generators have a .next() method implicitly invoked by for loops and other iteration contexts; each call forces the walker to the next directory in the tree. Essentially, os.walk replaces the os.path.walk callback function with a loop body, and so it may be easier to use (though you'll have to judge that for yourself).

For example, suppose you have a directory tree of files and you want to find all Python source files within it that reference the Tkinter GUI module.
The traditional way to accomplish this with os.path.walk requires a callback function run at each level of the tree:

    >>> import os
    >>> def atEachDir(matchlist, dirname, fileshere):
            for filename in fileshere:
                if filename.endswith('.py'):
                    pathname = os.path.join(dirname, filename)
                    if 'Tkinter' in open(pathname).read():
                        matchlist.append(pathname)

    >>> matches = []
    >>> os.path.walk(r'D:\PP3E', atEachDir, matches)
    >>> matches
    ['D:\\PP3E\\dev\\examples\\PP3E\\Preview\\peoplegui.py',
     'D:\\PP3E\\dev\\examples\\PP3E\\Preview\\tkinter101.py',
     'D:\\PP3E\\dev\\examples\\PP3E\\Preview\\tkinter001.py',
     'D:\\PP3E\\dev\\examples\\PP3E\\Preview\\peoplegui_class.py',
     'D:\\PP3E\\dev\\examples\\PP3E\\Preview\\tkinter102.py',
     'D:\\PP3E\\NewExamples\\clock.py',
     'D:\\PP3E\\NewExamples\\calculator.py']

This code loops through all the files at each level, looking for files with .py at the end of their names and which contain the search string. When a match is found, its full name is appended to the results list object, which is passed in as an argument (we could also just build a list of .py files and search each in a for loop after the walk).
The equivalent os.walk code is similar, but the callback function's code becomes the body of a for loop, and directory names are filtered out for us:

    >>> import os
    >>> matches = []
    >>> for (dirname, dirshere, fileshere) in os.walk(r'D:\PP3E'):
            for filename in fileshere:
                if filename.endswith('.py'):
                    pathname = os.path.join(dirname, filename)
                    if 'Tkinter' in open(pathname).read():
                        matches.append(pathname)

    >>> matches
    ['D:\\PP3E\\dev\\examples\\PP3E\\Preview\\peoplegui.py',
     'D:\\PP3E\\dev\\examples\\PP3E\\Preview\\tkinter101.py',
     'D:\\PP3E\\dev\\examples\\PP3E\\Preview\\tkinter001.py',
     'D:\\PP3E\\dev\\examples\\PP3E\\Preview\\peoplegui_class.py',
     'D:\\PP3E\\dev\\examples\\PP3E\\Preview\\tkinter102.py',
     'D:\\PP3E\\NewExamples\\clock.py',
     'D:\\PP3E\\NewExamples\\calculator.py']

If you want to see what's really going on in the os.walk generator, call its next() method manually a few times as the for loop does automatically; each time, you advance to the next subdirectory in the tree:

    >>> gen = os.walk('D:\PP3E')
    >>> gen.next()
    ('D:\\PP3E', ['proposal', 'dev', 'NewExamples', 'bkp'], ['prg-python-2.zip'])
    >>> gen.next()
    ('D:\\PP3E\\proposal', [], ['proposal-programming-python-3e.doc'])
    >>> gen.next()
    ('D:\\PP3E\\dev', ['examples'], ['ch05.doc', 'ch06.doc', 'ch07.doc', 'ch08.doc',
    'ch09.doc', 'ch10.doc', 'ch11.doc', 'ch12.doc', 'ch13.doc', 'ch14.doc', ...more...

The os.walk generator has more features than I will demonstrate here.
For instance, additional arguments allow you to specify a top-down or bottom-up traversal of the directory tree, and the list of subdirectories in the yielded tuple can be modified in place to change the traversal in top-down mode, much as for os.path.walk. See the Python library manual for more details.

So why the new call? Is the new os.walk easier to use than the traditional os.path.walk? Perhaps, if you need to distinguish between subdirectories and files in each directory (os.walk gives us two lists rather than one) or can make use of a bottom-up traversal or other features. Otherwise, it's mostly just the trade of a function for a for loop header. You'll have to judge for yourself whether this is more natural or not; we'll use both forms in this book.
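The in-place pruning and bottom-up options mentioned above can be sketched as follows, in Python 3 syntax. The function names find_files and directories_bottom_up and the skip_dirs parameter are hypothetical; topdown is a documented keyword argument of os.walk.

```python
import os

def find_files(root, suffix, skip_dirs=()):
    """Return sorted paths under root whose names end with suffix."""
    matches = []
    for dirname, dirshere, fileshere in os.walk(root):    # top-down by default
        # Slice assignment changes the very list os.walk will descend into,
        # so skipped directories are never visited at all.
        dirshere[:] = sorted(d for d in dirshere if d not in skip_dirs)
        for filename in fileshere:
            if filename.endswith(suffix):
                matches.append(os.path.join(dirname, filename))
    return sorted(matches)

def directories_bottom_up(root):
    """List directory paths with every subdirectory before its parent."""
    return [dirname for dirname, dirs, files in os.walk(root, topdown=False)]
```

Pruning works only in top-down mode: with topdown=False the subdirectories have already been visited by the time their parent's tuple is yielded, so editing the list has no effect on the traversal.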
The os.path.walk and os.walk tools do tree traversals for us, but it's sometimes more flexible and hardly any more work to do it ourselves. The following script recodes the directory listing script with a manual recursive traversal function (a function that calls itself to repeat its actions). The mylister function in Example 4-5 is almost the same as lister in Example 4-4 but calls os.listdir to generate file paths manually and calls itself recursively to descend into subdirectories.

Example 4-5. PP3E\System\Filetools\lister_recur.py

    # list files in dir tree by recursion
    import sys, os

    def mylister(currdir):
        print '[' + currdir + ']'
        for file in os.listdir(currdir):              # list files here
            path = os.path.join(currdir, file)        # add dir path back
            if not os.path.isdir(path):
                print path
            else:
                mylister(path)                        # recur into subdirs

    if __name__ == '__main__':
        mylister(sys.argv[1])                         # dir name in cmdline

This version is packaged as a script too (this is definitely too much code to type at the interactive prompt); its output is identical when run as a script:

    C:\...\PP3E\System\Filetools>python lister_recur.py C:\Temp
    [C:\Temp]
    C:\Temp\about-pp.html
    C:\Temp\python1.5.tar.gz
    C:\Temp\about-pp2e.html
    C:\Temp\about-ppr2e.html
    [C:\Temp\newdir]
    C:\Temp\newdir\temp1
    C:\Temp\newdir\temp2
    C:\Temp\newdir\temp3
    [C:\Temp\newdir\more]
    C:\Temp\newdir\more\xxx.txt
    C:\Temp\newdir\more\yyy.txt

But this file is just as useful when imported and called elsewhere:

    C:\temp>python
    >>> from PP3E.System.Filetools.lister_recur import mylister
    >>> mylister('.')
    [.]
    .\about-pp.html
    .\python1.5.tar.gz
    .\about-pp2e.html
    .\about-ppr2e.html
    [.\newdir]
    .\newdir\temp1
    .\newdir\temp2
    .\newdir\temp3
    [.\newdir\more]
    .\newdir\more\xxx.txt
    .\newdir\more\yyy.txt

We will make better use of most of this section's techniques in later examples in Chapter 7 and in this book at large. For example, scripts for copying and comparing directory trees use the tree-walker techniques listed previously. Watch for these tools in action along the way. If you are interested in directory processing, also see the discussion of Python's old grep module in Chapter 7; it searches files and can be applied to all files in a directory when combined with the glob module, but it simply prints results and does not traverse directory trees by itself.
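Returning to the manual recursion of Example 4-5, one variation worth sketching is a traversal that hands paths back to its caller instead of printing them, so the same walker can feed searches, copies, or comparisons. This is a minimal sketch in Python 3 syntax (yield from needs Python 3.3 or later); the name tree_files is hypothetical.

```python
import os

def tree_files(currdir):
    """Yield the path of every non-directory file at and below currdir."""
    for name in sorted(os.listdir(currdir)):      # sorted for a repeatable order
        path = os.path.join(currdir, name)        # add dir path back
        if os.path.isdir(path):
            yield from tree_files(path)           # recur into subdirectory
        else:
            yield path

if __name__ == '__main__':
    import sys
    # Directory name on the command line, defaulting to the current directory
    for path in tree_files(sys.argv[1] if len(sys.argv) > 1 else os.curdir):
        print(path)
```

Because it is a generator, the caller can stop early (for instance, at the first match) without the rest of the tree being scanned.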
Powered by: C++博客 Copyright © 金慶