Programming Python, 3rd Edition (translation)
Latest version on the wiki: http://wiki.woodpecker.org.cn/moin/PP3eD
Contributions to the translation and revisions are welcome.
4.3. Directory Tools
One of the more common tasks in the shell utilities domain is applying
an operation to a set of files in a directory (a "folder" in
Windows-speak). By running a script on a batch of files, we can automate
(that is, script) tasks we might otherwise have to run repeatedly by
hand.
For instance, suppose you need to search all of your Python files in a
development directory for a global variable name (perhaps you've
forgotten where it is used). There are many platform-specific ways to
do this (e.g., the grep command in Unix), but Python scripts that
accomplish such tasks will work on every platform where Python
works: Windows, Unix, Linux, Macintosh, and just about any other platform
commonly used today. If you simply copy your script to any machine you
wish to use it on, it will work regardless of which other tools are
available there.
4.3.1. Walking One Directory
The most common way to go about writing such tools is to first grab a
list of the names of the files you wish to process, and then step
through that list with a Python for loop, processing each file in turn.
The trick we need to learn here, then, is how to get such a directory
list within our scripts. There are at least three options: running
shell listing commands with os.popen, matching filename patterns with
glob.glob, and getting directory listings with os.listdir. They vary in
interface, result format, and portability.
4.3.1.1. Running shell listing commands with os.popen
Quick: how did you go about getting directory file listings before you
heard of Python? If you're new to shell tools programming, the answer
may be "Well, I started a Windows file explorer and clicked on stuff,"
but I'm thinking here in terms of less GUI-oriented command-line
mechanisms (and answers submitted in Perl and Tcl get only partial
credit).
On Unix, directory listings are usually obtained by typing ls in a
shell; on Windows, they can be generated with a dir command typed in an
MS-DOS console box. Because Python scripts may use os.popen to run any
command line that we can type in a shell, they are the most general way
to grab a directory listing inside a Python program. We met os.popen in
the prior chapter; it runs a shell command string and gives us a file
object from which we can read the command's output. To illustrate,
let's first assume the following directory structures (yes, I have both
dir and ls commands on my Windows laptop; old habits die hard):
C:\temp>dir /B
about-pp.html
python1.5.tar.gz
about-pp2e.html
about-ppr2e.html
newdir
C:\temp>ls
about-pp.html about-ppr2e.html python1.5.tar.gz
about-pp2e.html newdir
C:\temp>ls newdir
more temp1 temp2 temp3
The newdir name is a nested subdirectory in C:\temp here. Now, scripts
can grab a listing of file and directory names at this level by simply
spawning the appropriate platform-specific command line and reading its
output (the text normally thrown up on the console window):
C:\temp>python
>>> import os
>>> os.popen('dir /B').readlines( )
['about-pp.html\n', 'python1.5.tar.gz\n', 'about-pp2e.html\n',
'about-ppr2e.html\n', 'newdir\n']
Lines read from a shell command come back with a trailing end-of-line
character, but it's easy enough to slice off with a for loop or list
comprehension expression as in the following code:
>>> for line in os.popen('dir /B').readlines( ):
...     print line[:-1]
...
about-pp.html
python1.5.tar.gz
about-pp2e.html
about-ppr2e.html
newdir
>>> lines = [line[:-1] for line in os.popen('dir /B')]
>>> lines
['about-pp.html', 'python1.5.tar.gz', 'about-pp2e.html',
'about-ppr2e.html', 'newdir']
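For readers on Python 3, where print is a function, the same idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name list_with_shell is hypothetical, and the code assumes an ls command on Unix-like systems and the dir /B command on Windows. It uses rstrip rather than a [:-1] slice so that a final line without an end-of-line character is not truncated.

```python
import os
import shlex
import sys

def list_with_shell(dirname='.'):
    """Return the entry names in dirname by running a shell listing command."""
    if sys.platform.startswith('win'):
        command = 'dir /B "%s"' % dirname           # assumes the Windows dir command
    else:
        command = 'ls ' + shlex.quote(dirname)      # assumes a Unix-like ls command
    with os.popen(command) as pipe:                 # the pipe can be read line by line
        return [line.rstrip('\n') for line in pipe]
```

Calling list_with_shell('.') returns a list of names without end-of-line characters; which names appear, and in what order, is up to the listing command that was run.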
One subtle thing: notice that the object returned by os.popen has an
iterator that reads one line per request (i.e., per next( ) method
call), just like normal files, so calling the readlines method is
optional here unless you really need to extract the result list all at
once (see the discussion of file iterators earlier in this chapter).
For pipe objects, the effect of iterators is even more useful than
simply avoiding loading the entire result into memory all at once:
readlines will block the caller until the spawned program is completely
finished, whereas the iterator might not.
The dir and ls commands let us be specific about filename patterns to
be matched and directory names to be listed; again, we're just running
shell commands here, so anything you can type at a shell prompt goes:
>>> os.popen('dir *.html /B').readlines( )
['about-pp.html\n', 'about-pp2e.html\n', 'about-ppr2e.html\n']
>>> os.popen('ls *.html').readlines( )
['about-pp.html\n', 'about-pp2e.html\n', 'about-ppr2e.html\n']
>>> os.popen('dir newdir /B').readlines( )
['temp1\n', 'temp2\n', 'temp3\n', 'more\n']
>>> os.popen('ls newdir').readlines( )
['more\n', 'temp1\n', 'temp2\n', 'temp3\n']
These calls use general tools and work as advertised. As I noted
earlier, though, the downsides of os.popen are that it requires using a
platform-specific shell command and it incurs a performance hit to
start up an independent program. The following two alternative
techniques do better on both counts.
4.3.1.2. The glob module
The term globbing comes from the * wildcard character in filename
patterns; per computing folklore, a * matches a "glob" of characters.
In less poetic terms, globbing simply means collecting the names of all
entries in a directory (files and subdirectories) whose names match a
given filename pattern. In Unix shells, globbing expands filename patterns
within a command line into all matching filenames before the command is
ever run. In Python, we can do something similar by calling the
glob.glob function with a pattern to expand:
>>> import glob
>>> glob.glob('*')
['about-pp.html', 'python1.5.tar.gz', 'about-pp2e.html', 'about-ppr2e.html',
'newdir']
>>> glob.glob('*.html')
['about-pp.html', 'about-pp2e.html', 'about-ppr2e.html']
>>> glob.glob('newdir/*')
['newdir\\temp1', 'newdir\\temp2', 'newdir\\temp3', 'newdir\\more']
The glob call accepts the usual filename pattern syntax used in shells
(e.g., ? means any one character, * means any number of characters, and
[] is a character selection set).[*] The pattern should include a
directory path if you wish to glob in something other than the current
working directory, and the module accepts either Unix or DOS-style
directory separators (/ or \). Also, this call is implemented without
spawning a shell command and so is likely to be faster and more
portable across all Python platforms than the os.popen schemes shown
earlier.
[*] In fact, glob just uses the standard fnmatch module to match name
patterns; see the fnmatch description later in this chapter for more
details.
Technically speaking, glob is a bit more powerful than described so
far. In fact, using it to list files in one directory is just one use
of its pattern-matching skills. For instance, it can also be used to
collect matching names across multiple directories, simply because each
level in a passed-in directory path can be a pattern too:
C:\temp>python
>>> import glob
>>> for name in glob.glob('*examples/L*.py'): print name
...
cpexamples\Launcher.py
cpexamples\Launch_PyGadgets.py
cpexamples\LaunchBrowser.py
cpexamples\launchmodes.py
examples\Launcher.py
examples\Launch_PyGadgets.py
examples\LaunchBrowser.py
examples\launchmodes.py
>>> for name in glob.glob(r'*\*\visitor_find*.py'): print name
...
cpexamples\PyTools\visitor_find.py
cpexamples\PyTools\visitor_find_quiet2.py
cpexamples\PyTools\visitor_find_quiet1.py
examples\PyTools\visitor_find.py
examples\PyTools\visitor_find_quiet2.py
examples\PyTools\visitor_find_quiet1.py
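To try the directory-level wildcard without the book's example tree at hand, here is a small Python 3 sketch. It builds a throwaway tree in a temporary directory; the helper build_demo_tree and the file names in it are made up for the illustration.

```python
import glob
import os
import tempfile

def build_demo_tree(root):
    """Create two sibling directories, each holding one script and one text file."""
    for dirname in ('examples', 'cpexamples'):
        os.mkdir(os.path.join(root, dirname))
        for fname in ('Launcher.py', 'notes.txt'):
            open(os.path.join(root, dirname, fname), 'w').close()

with tempfile.TemporaryDirectory() as root:
    build_demo_tree(root)
    # the directory level is itself a pattern, so both siblings are searched
    pattern = os.path.join(root, '*examples', 'L*.py')
    matches = sorted(glob.glob(pattern))
    relative = [os.path.relpath(path, root) for path in matches]

print(relative)
```

The result holds one Launcher.py path per matching directory; the notes.txt files are skipped because they do not match the L*.py part of the pattern.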
In the first call here, we get back filenames from two different
directories that match the *examples pattern; in the second, both of
the first directory levels are wildcards, so Python collects all
possible ways to reach the base filenames. Using os.popen to spawn
shell commands achieves the same effect only if the underlying shell or
listing command does too.
4.3.1.3. The os.listdir call
The os module's listdir call provides yet another way to collect
filenames in a Python list. It takes a simple directory name string,
not a filename pattern, and returns a list containing the names of all
entries in that directory (both simple files and nested directories) for
use in the calling script:
>>> os.listdir('.')
['about-pp.html', 'python1.5.tar.gz', 'about-pp2e.html', 'about-ppr2e.html',
'newdir']
>>> os.listdir(os.curdir)
['about-pp.html', 'python1.5.tar.gz', 'about-pp2e.html', 'about-ppr2e.html',
'newdir']
>>> os.listdir('newdir')
['temp1', 'temp2', 'temp3', 'more']
This too is done without resorting to shell commands and so is portable
to all major Python platforms. The result is in no particular order
(but can be sorted with the list sort method), holds base filenames
without their directory path prefixes, and includes names of both
files and directories at the listed level.
To compare all three listing techniques, let's run them here side by
side on an explicit directory. They differ in some ways but are mostly
just variations on a theme: os.popen returns the listing command's
output lines (sorted by ls here) with end-of-line characters, glob.glob
accepts a pattern and returns filenames with directory prefixes, and
os.listdir takes a simple directory name and returns names without
directory prefixes:
>>> os.popen('ls C:\PP3rdEd').readlines( )
['README.txt\n', 'cdrom\n', 'chapters\n', 'etc\n', 'examples\n',
'examples.tar.gz\n', 'figures\n', 'shots\n']
>>> glob.glob('C:\PP3rdEd\*')
['C:\\PP3rdEd\\examples.tar.gz', 'C:\\PP3rdEd\\README.txt',
'C:\\PP3rdEd\\shots', 'C:\\PP3rdEd\\figures', 'C:\\PP3rdEd\\examples',
'C:\\PP3rdEd\\etc', 'C:\\PP3rdEd\\chapters', 'C:\\PP3rdEd\\cdrom']
>>> os.listdir('C:\PP3rdEd')
['examples.tar.gz', 'README.txt', 'shots', 'figures', 'examples', 'etc',
'chapters', 'cdrom']
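The difference between the glob and listdir results can be checked on a scratch directory with a short Python 3 sketch like the one below. The file names are invented for the demonstration. It also shows one difference not visible in the session above: a bare * pattern in glob does not match names that begin with a dot, while os.listdir returns them.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as dirname:
    for fname in ('README.txt', 'examples.tar.gz', '.hidden'):
        open(os.path.join(dirname, fname), 'w').close()
    os.mkdir(os.path.join(dirname, 'figures'))

    from_listdir = sorted(os.listdir(dirname))                  # bare names
    from_glob = sorted(glob.glob(os.path.join(dirname, '*')))   # full paths
    glob_basenames = [os.path.basename(path) for path in from_glob]

print(from_listdir)      # includes '.hidden'
print(glob_basenames)    # no '.hidden': '*' skips names starting with a dot
```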
Of these three, glob and listdir are generally better options if you
care about script portability, and listdir seems fastest in recent
Python releases (but gauge its performance yourself; implementations may
change over time).
4.3.1.4. Splitting and joining listing results
In the last example, I pointed out that glob returns names with
directory paths, whereas listdir gives raw base filenames. For
convenient processing, scripts often need to split glob results into
base files or expand listdir results into full paths. Such translations
are easy if we let the os.path module do all the work for us. For
example, a script that intends to copy all files elsewhere will
typically need to first split off the base filenames from glob results
so that it can add different directory names on the front:
>>> dirname = r'C:\PP3rdEd'
>>> for file in glob.glob(dirname + '/*'):
...     head, tail = os.path.split(file)
...     print head, tail, '=>', ('C:\\Other\\' + tail)
...
C:\PP3rdEd examples.tar.gz => C:\Other\examples.tar.gz
C:\PP3rdEd README.txt => C:\Other\README.txt
C:\PP3rdEd shots => C:\Other\shots
C:\PP3rdEd figures => C:\Other\figures
C:\PP3rdEd examples => C:\Other\examples
C:\PP3rdEd etc => C:\Other\etc
C:\PP3rdEd chapters => C:\Other\chapters
C:\PP3rdEd cdrom => C:\Other\cdrom
Here, the names after the => represent names that files might be
moved to. Conversely, a script that means to process all files in a
different directory than the one it runs in will probably need to
prepend listdir results with the target directory name before passing
filenames on to other tools:
>>> for file in os.listdir(dirname):
...     print os.path.join(dirname, file)
...
C:\PP3rdEd\examples.tar.gz
C:\PP3rdEd\README.txt
C:\PP3rdEd\shots
C:\PP3rdEd\figures
C:\PP3rdEd\examples
C:\PP3rdEd\etc
C:\PP3rdEd\chapters
C:\PP3rdEd\cdrom
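Both translations fit in one-line helpers. The Python 3 sketch below uses two hypothetical function names, retarget and with_prefix; neither touches the filesystem, they only compute names. os.path.basename(path) returns the same string as the second item of os.path.split(path).

```python
import os

def retarget(paths, newdir):
    """Map each path to a same-named entry under newdir (no files are touched)."""
    return [os.path.join(newdir, os.path.basename(path)) for path in paths]

def with_prefix(dirname, names):
    """Turn bare names, as os.listdir returns them, into paths under dirname."""
    return [os.path.join(dirname, name) for name in names]

print(retarget([os.path.join('PP3rdEd', 'README.txt')], 'Other'))
print(with_prefix('PP3rdEd', ['README.txt', 'shots']))
```

Because os.path.join picks the separator for the platform it runs on, the same code yields backslash paths on Windows and forward-slash paths elsewhere.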
4.3.2. Walking Directory Trees
As you read the prior section, you may have noticed that all of the
preceding techniques return the names of files in only a single
directory. What if you want to apply an operation to every file in
every directory and subdirectory in an entire directory tree?
For instance, suppose again that we need to find every occurrence of a
global name in our Python scripts. This time, though, our scripts are
arranged into a module package: a directory with nested subdirectories,
which may have subdirectories of their own. We could rerun our
hypothetical single-directory searcher manually in every directory in
the tree, but that's tedious, error prone, and just plain not fun.
Luckily, in Python it's almost as easy to process a directory tree as
it is to inspect a single directory. We can either write a recursive
routine to traverse the tree, or use one of two tree-walker utilities
built into the os module. Such tools can be used to search, copy,
compare, and otherwise process arbitrary directory trees on any
platform that Python runs on (and that's just about everywhere).
4.3.2.1. The os.path.walk visitor
To make it easy to apply an operation to all files in a tree hierarchy,
Python comes with a utility that scans trees for us and runs a provided
function at every directory along the way. The os.path.walk function is
called with a directory root, function object, and optional data item,
and walks the tree at the directory root and below. At each directory,
the function object passed in is called with the optional data item,
the name of the current directory, and a list of filenames in that
directory (obtained from os.listdir). Typically, the function we
provide (often referred to as a callback function) scans the filenames
list to process files at each directory level in the tree.
That description might sound horribly complex the first time you hear
it, but os.path.walk is fairly straightforward once you get the hang of
it. In the following code, for example, the lister function is called
from os.path.walk at each directory in the tree rooted at . (the
current directory). Along the
way, lister simply prints the directory name and all the files at the
current level (after prepending the directory name). It's simpler in
Python than in English:
>>> import os
>>> def lister(dummy, dirname, filesindir):
...     print '[' + dirname + ']'
...     for fname in filesindir:
...         print os.path.join(dirname, fname)         # handle one file
...
>>> os.path.walk('.', lister, None)
[.]
.\about-pp.html
.\python1.5.tar.gz
.\about-pp2e.html
.\about-ppr2e.html
.\newdir
[.\newdir]
.\newdir\temp1
.\newdir\temp2
.\newdir\temp3
.\newdir\more
[.\newdir\more]
.\newdir\more\xxx.txt
.\newdir\more\yyy.txt
In other words, we've coded our own custom (and easily changed)
recursive directory listing tool in Python. Because this may be
something we would like to tweak and reuse elsewhere, let's make it
permanently available in a module file, as shown in Example 4-4, now
that we've worked out the details interactively.
Example 4-4. PP3E\System\Filetools\lister_walk.py
# list file tree with os.path.walk
import sys, os

def lister(dummy, dirName, filesInDir):              # called at each dir
    print '[' + dirName + ']'
    for fname in filesInDir:                         # includes subdir names
        path = os.path.join(dirName, fname)          # add dir name prefix
        if not os.path.isdir(path):                  # print simple files only
            print path

if __name__ == '__main__':
    os.path.walk(sys.argv[1], lister, None)          # dir name in cmdline
This is the same code except that directory names are filtered out of
the filenames list by consulting the os.path.isdir test in order to
avoid listing them twice (see, it's been tweaked already). When
packaged this way, the code can also be run from a shell command line.
Here it is being launched from a different directory, with the
directory to be listed passed in as a command-line argument:
C:\...\PP3E\System\Filetools>python lister_walk.py C:\Temp
[C:\Temp]
C:\Temp\about-pp.html
C:\Temp\python1.5.tar.gz
C:\Temp\about-pp2e.html
C:\Temp\about-ppr2e.html
[C:\Temp\newdir]
C:\Temp\newdir\temp1
C:\Temp\newdir\temp2
C:\Temp\newdir\temp3
[C:\Temp\newdir\more]
C:\Temp\newdir\more\xxx.txt
C:\Temp\newdir\more\yyy.txt
The walk paradigm also allows functions to tailor the set of
directories visited by changing the file list argument in place. The
library manual documents this further, but it's probably more
instructive to simply know what walk truly looks like. Here is its
actual Python-coded implementation for Windows platforms (at the time
of this writing), with comments added to help demystify its operation:
def walk(top, func, arg):                      # top is the current dirname
    try:
        names = os.listdir(top)                # get all file/dir names here
    except os.error:                           # they have no path prefix
        return
    func(arg, top, names)                      # run func with names list here
    exceptions = ('.', '..')
    for name in names:                         # step over the very same list
        if name not in exceptions:             # but skip self/parent names
            name = join(top, name)             # add path prefix to name
            if isdir(name):
                walk(name, func, arg)          # descend into subdirs here
Notice that walk generates filename lists at each level with
os.listdir, a call that collects both file and directory names in no
particular order and returns them without their directory paths. Also
note that walk uses the very same list returned by os.listdir and
passed to the function you provide in order to later descend into
subdirectories (variable names). Because lists are mutable objects that
can be changed in place, if your function modifies the passed-in
filenames list, it will impact what walk does next. For example,
deleting directory names will prune traversal branches, and sorting the
list will order the walk.
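The in-place effect is easy to see in a small experiment. Note that os.path.walk exists only in Python 2 (it was removed in Python 3), so the Python 3 sketch below first defines a stand-in named walk_with_callback, modeled on the listing above; that name, the visit callback, and the directory names are all hypothetical.

```python
import os
import tempfile

def walk_with_callback(top, func, arg):
    """Python 3 stand-in for the old os.path.walk, modeled on the listing above."""
    try:
        names = os.listdir(top)
    except OSError:
        return
    func(arg, top, names)                  # func may change names in place
    for name in names:                     # the same list object drives the descent
        path = os.path.join(top, name)
        if os.path.isdir(path):
            walk_with_callback(path, func, arg)

def visit(visited, dirname, names):
    visited.append(dirname)
    names.sort()                           # orders the rest of the walk
    if 'skipme' in names:
        names.remove('skipme')             # prunes that branch

with tempfile.TemporaryDirectory() as root:
    for sub in ('b', 'a', 'skipme'):
        os.mkdir(os.path.join(root, sub))
    visited = []
    walk_with_callback(root, visit, visited)
    order = [os.path.relpath(d, root) for d in visited]

print(order)
```

The skipme directory never shows up in the result, and a is visited before b, because the callback removed one name and sorted the rest before the walker moved on.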
4.3.2.2. The os.walk generator
In recent Python releases, a new directory tree walker has been added
which does not require a callback function to be coded. This new call,
os.walk, is instead a generator function; when used within a for loop,
each time through it yields a tuple containing the current directory
name, a list of subdirectories in that directory, and a list of
nondirectory files in that directory.
Recall that generators have a .next( ) method implicitly invoked by for
loops and other iteration contexts; each call forces the walker to the
next directory in the tree. Essentially, os.walk replaces the
os.path.walk callback function with a loop body, and so it may be
easier to use (though you'll have to judge that for yourself).
For example, suppose you have a directory tree of files and you want to
find all Python source files within it that reference the Tkinter GUI
module. The traditional way to accomplish this with os.path.walk
requires a callback function run at each level of the tree:
>>> import os
>>> def atEachDir(matchlist, dirname, fileshere):
        for filename in fileshere:
            if filename.endswith('.py'):
                pathname = os.path.join(dirname, filename)
                if 'Tkinter' in open(pathname).read( ):
                    matchlist.append(pathname)
>>> matches = []
>>> os.path.walk(r'D:\PP3E', atEachDir, matches)
>>> matches
['D:\\PP3E\\dev\\examples\\PP3E\\Preview\\peoplegui.py', 'D:\\PP3E\\dev\\
examples\\PP3E\\Preview\\tkinter101.py', 'D:\\PP3E\\dev\\examples\\PP3E\\
Preview\\tkinter001.py', 'D:\\PP3E\\dev\\examples\\PP3E\\Preview\\
peoplegui_class.py', 'D:\\PP3E\\dev\\examples\\PP3E\\Preview\\
tkinter102.py', 'D:\\PP3E\\NewExamples\\clock.py', 'D:\\PP3E\\NewExamples
\\calculator.py']
This code loops through all the files at each level, looking for files
with .py at the end of their names and which contain the search string.
When a match is found, its full name is appended to the results list
object, which is passed in as an argument (we could also just build a
list of .py files and search each in a for loop after the walk). The
equivalent os.walk code is similar, but the callback function's code
becomes the body of a for loop, and directory names are filtered out
for us:
>>> import os
>>> matches = []
>>> for (dirname, dirshere, fileshere) in os.walk(r'D:\PP3E'):
        for filename in fileshere:
            if filename.endswith('.py'):
                pathname = os.path.join(dirname, filename)
                if 'Tkinter' in open(pathname).read( ):
                    matches.append(pathname)
>>> matches
['D:\\PP3E\\dev\\examples\\PP3E\\Preview\\peoplegui.py', 'D:\\PP3E\\dev\\examples\\
PP3E\\Preview\\tkinter101.py', 'D:\\PP3E\\dev\\examples\\PP3E\\Preview\\
tkinter001.py', 'D:\\PP3E\\dev\\examples\\PP3E\\Preview\\peoplegui_class.py', 'D:\\
PP3E\\dev\\examples\\PP3E\\Preview\\tkinter102.py', 'D:\\PP3E\\NewExamples\\
clock.py', 'D:\\PP3E\\NewExamples\\calculator.py']
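Wrapped in a function, the os.walk search might look like the following in Python 3. This is a sketch rather than the book's code: the function name find_in_tree and its parameters are made up here, and files are read as UTF-8 with undecodable bytes replaced, which is an assumption about the files being scanned. In Python 3 the GUI module is spelled tkinter, so that is the string to search for there.

```python
import os

def find_in_tree(root, text, suffix='.py'):
    """Return paths of files under root that end in suffix and contain text."""
    matches = []
    for dirname, subdirs, filenames in os.walk(root):
        for filename in filenames:                 # directories are already excluded
            if filename.endswith(suffix):
                pathname = os.path.join(dirname, filename)
                # errors='replace' keeps odd encodings from stopping the scan
                with open(pathname, encoding='utf-8', errors='replace') as f:
                    if text in f.read():
                        matches.append(pathname)
    return matches
```

A call such as find_in_tree('.', 'tkinter') returns a list of matching paths, in whatever order os.walk reaches them.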
If you want to see what's really going on in the os.walk generator,
call its next( ) method manually a few times as the for loop does
automatically; each time, you advance to the next subdirectory in the
tree:
>>> gen = os.walk('D:\PP3E')
>>> gen.next( )
('D:\\PP3E', ['proposal', 'dev', 'NewExamples', 'bkp'], ['prg-python-2.zip'])
>>> gen.next( )
('D:\\PP3E\\proposal', [], ['proposal-programming-python-3e.doc'])
>>> gen.next( )
('D:\\PP3E\\dev', ['examples'], ['ch05.doc', 'ch06.doc', 'ch07.doc', 'ch08.doc',
'ch09.doc', 'ch10.doc', 'ch11.doc', 'ch12.doc', 'ch13.doc', 'ch14.doc', ...more...
The os.walk generator has more features than I will demonstrate here.
For instance, additional arguments allow you to specify a top-down or
bottom-up traversal of the directory tree, and the list of
subdirectories in the yielded tuple can be modified in-place to change
the traversal in top-down mode, much as for os.path.walk. See the
Python library manual for more details.
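As a rough illustration of those two features, the Python 3 sketch below prunes a branch by assigning to a slice of the subdirectory list, and then walks the same tree bottom-up with topdown=False. The helper name walk_order and the directory names are invented for the demonstration.

```python
import os
import tempfile

def walk_order(root, skip=(), topdown=True):
    """Return the directories visited by os.walk, relative to root."""
    visited = []
    for dirname, subdirs, filenames in os.walk(root, topdown=topdown):
        if topdown:
            # slice assignment changes the list os.walk itself will use next
            subdirs[:] = sorted(d for d in subdirs if d not in skip)
        visited.append(os.path.relpath(dirname, root))
    return visited

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'src', 'pkg'))
    os.makedirs(os.path.join(root, 'bkp', 'old'))
    pruned = walk_order(root, skip=('bkp',))
    bottom_up = walk_order(root, topdown=False)

print(pruned)       # bkp and everything below it are absent
print(bottom_up)    # each directory appears after its subdirectories
```

Rebinding the name instead (subdirs = [...]) would have no effect on the traversal, since os.walk would still hold the original list; the change has to be made in place.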
So why the new call? Is the new os.walk easier to use than the
traditional os.path.walk? Perhaps, if you need to distinguish between
subdirectories and files in each directory (os.walk gives us two lists
rather than one) or can make use of a bottom-up traversal or other
features. Otherwise, it's mostly just the trade of a function for a for
loop header. You'll have to judge for yourself whether this is more
natural or not; we'll use both forms in this book.
4.3.2.3. Recursive os.listdir traversals
The os.path.walk and os.walk tools do tree traversals for us, but it's
sometimes more flexible and hardly any more work to do it ourselves.
The following script recodes the directory listing script with a manual
recursive traversal function (a function that calls itself to repeat
its actions). The mylister function in Example 4-5 is almost the same
as lister in Example 4-4 but calls os.listdir to generate file paths
manually and calls itself recursively to descend into subdirectories.
Example 4-5. PP3E\System\Filetools\lister_recur.py
# list files in dir tree by recursion
import sys, os

def mylister(currdir):
    print '[' + currdir + ']'
    for file in os.listdir(currdir):                 # list files here
        path = os.path.join(currdir, file)           # add dir path back
        if not os.path.isdir(path):
            print path
        else:
            mylister(path)                           # recur into subdirs

if __name__ == '__main__':
    mylister(sys.argv[1])                            # dir name in cmdline
This version is packaged as a script too (this is definitely too much
code to type at the interactive prompt); its output is identical when
run as a script:
C:\...\PP3E\System\Filetools>python lister_recur.py C:\Temp
[C:\Temp]
C:\Temp\about-pp.html
C:\Temp\python1.5.tar.gz
C:\Temp\about-pp2e.html
C:\Temp\about-ppr2e.html
[C:\Temp\newdir]
C:\Temp\newdir\temp1
C:\Temp\newdir\temp2
C:\Temp\newdir\temp3
[C:\Temp\newdir\more]
C:\Temp\newdir\more\xxx.txt
C:\Temp\newdir\more\yyy.txt
But this file is just as useful when imported and called elsewhere:
C:\temp>python
>>> from PP3E.System.Filetools.lister_recur import mylister
>>> mylister('.')
[.]
.\about-pp.html
.\python1.5.tar.gz
.\about-pp2e.html
.\about-ppr2e.html
[.\newdir]
.\newdir\temp1
.\newdir\temp2
.\newdir\temp3
[.\newdir\more]
.\newdir\more\xxx.txt
.\newdir\more\yyy.txt
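A variation on the same recursion, sketched here for Python 3, yields paths instead of printing them, so the caller decides what to do with each file. The name iter_files is hypothetical, and sorting each directory's names is a choice made for this sketch to keep the order stable, not something mylister does.

```python
import os

def iter_files(currdir):
    """Yield the path of every non-directory file at or below currdir."""
    for name in sorted(os.listdir(currdir)):       # sorted for a stable order
        path = os.path.join(currdir, name)
        if os.path.isdir(path):
            yield from iter_files(path)            # recur into the subdirectory
        else:
            yield path
```

A loop such as "for path in iter_files('.'): print(path)" then prints file paths much as mylister does, minus the bracketed directory headers.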
We will make better use of most of this section's techniques in later
examples in Chapter 7 and in this book at large. For example, scripts
for copying and comparing directory trees use the tree-walker
techniques listed previously. Watch for these tools in action along the
way. If you are interested in directory processing, also see the
discussion of Python's old grep module in Chapter 7; it searches files
and can be applied to all files in a directory when combined with the
glob module, but it simply prints results and does not traverse
directory trees by itself.
4.3.3. Rolling Your Own find Module
Another way to go hierarchical is to collect files into a flat list all at once. In the second edition of this book, I included a section on the now-defunct find standard library module, which was used to collect a list of matching filenames in an entire directory tree (much like a Unix find command). Unlike the single-directory tools described earlier, find returned pathnames of matching files nested in subdirectories all the way to the bottom of a tree, though it still returned them as one flat list.
This module is now gone; the os.walk and os.path.walk tools described earlier are recommended as easier-to-use alternatives. On the other hand, it's not completely clear why the standard find module fell into deprecation; it's a useful tool. In fact, I used it often; it is nice to be able to grab a simple linear list of matching files in a single function call and step through it in a for loop. The alternatives still seem a bit more code-y and tougher for beginners to digest.
Not to worry though, because instead of lamenting the loss of a module, I decided to spend 10 minutes whipping up a custom equivalent. In fact, one of the nice things about Python is that it is usually easy to do by hand what a built-in tool does for you; many built-ins are just conveniences. The module in Example 4-6 uses the standard os.path.walk call described earlier to reimplement a find operation for use in Python scripts.
但是不要擔心,不必哀悼失去的模塊,因為我決定花10分鐘做出一個自定義的等價模塊。事實上,Python的好處之一就是,通常很容易用手工做到內置工具所做的事情;許多內置模塊僅僅只是提供了方便。示例4-6中的模塊使用了前面所述的標準os.path.walk調用,重新實現了可用于Python腳本的find操作。
Example 4-6. PP3E\PyTools\find.py
#!/usr/bin/python
##############################################################################
# custom version of the now deprecated find module in the standard library:
# import as "PyTools.find"; equivalent to the original, but uses os.path.walk,
# has no support for pruning subdirs in the tree, and is instrumented to be
# runnable as a top-level script; uses tuple unpacking in function arguments;
##############################################################################

import fnmatch, os

def find(pattern, startdir=os.curdir):
    matches = []
    os.path.walk(startdir, findvisitor, (matches, pattern))
    matches.sort()
    return matches

def findvisitor((matches, pattern), thisdir, nameshere):
    for name in nameshere:
        if fnmatch.fnmatch(name, pattern):
            fullpath = os.path.join(thisdir, name)
            matches.append(fullpath)

if __name__ == '__main__':
    import sys
    namepattern, startdir = sys.argv[1], sys.argv[2]
    for name in find(namepattern, startdir): print name
There's not much to this file; but calling its find function provides the same utility as the deprecated find standard module and is noticeably easier than rewriting all of this file's code every time you need to perform a find-type search. Because this file is instrumented to be both a script and a library, it can be run or called.
該文件沒什么東西;但是它的find函數所提供的功能,與作廢的find標準模塊相同,并且當你需要執行find類型的搜索時,比起每次重寫該文件的所有代碼,使用它的find函數明顯更容易。因為此文件既是腳本也是庫,所以既可以運行也可以調用。
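Example 4-6 is Python 2 code: os.path.walk and tuple unpacking in function arguments were both removed in Python 3. For readers on a newer interpreter, the following is a minimal sketch of the same idea built on os.walk, assuming Python 3. It is an adaptation for illustration, not the book's listing. It tests both subdirectory names and filenames against the pattern, since the name list that os.path.walk passed to its visitor function contained both.

```python
import fnmatch
import os

def find(pattern, startdir=os.curdir):
    """Return a sorted list of paths under startdir whose base names match pattern."""
    matches = []
    for thisdir, subdirs, files in os.walk(startdir):
        # os.path.walk handed the visitor subdirectory names and filenames together
        for name in subdirs + files:
            if fnmatch.fnmatch(name, pattern):
                matches.append(os.path.join(thisdir, name))
    matches.sort()
    return matches

# Typical use from a script or the interactive prompt:
#     for name in find('*.py', '.'):
#         print(name)
```

As in the original, the function returns full paths that begin with the start directory, so callers can open or remove the files directly.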
For instance, to process every Python file in the directory tree rooted in the current working directory, I simply run the following command line from a system console window. I'm piping the script's standard output into the more command to page it here, but it can be piped into any processing program that reads its input from the standard input stream:
例如,處理當前工作目錄為根的目錄樹下的每個Python文件,我只需在系統控制臺窗口中運行以下命令行。這里我把腳本的標準輸出管道到more命令進行分頁,但它也可以管道到任何讀取標準輸入流的處理程序:
python find.py *.py . | more
For more control, run the following sort of Python code from a script or interactive prompt (you can also pass in an explicit start directory if you prefer). In this mode, you can apply any operation to the found files that the Python language provides:
為了實施更多控制,可運行以下這類腳本,在腳本中也行,在交互提示符下也行(如果你喜歡,你也可以傳入一個明確的開始目錄)。在這種模式下,您可以對找到的文件應用任何Python語言所提供的操作:
from PP3E.PyTools import find
for name in find.find('*.py'):
    ...do something with name...
Notice how this avoids the nested loop structure you wind up coding with os.walk and the callback functions you implement for os.path.walk (see the earlier examples), making it seem conceptually simpler. Its only obvious downside is that your script must wait until all matching files have been found and collected; os.walk yields results as it goes, and os.path.walk calls your function along the way.
請注意,這樣做避免了用os.walk編碼時的嵌套循環結構,也避免了為os.path.walk實現的回調函數(見前面的例子),概念上更簡單。它唯一明顯的缺點是,你的腳本必須等待所有匹配的文件被找到和收集;而os.walk會邊執行邊產生結果,而os.path.walk會沿途調用你的函數。
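The wait-for-everything downside can be avoided while keeping the single-loop calling style by writing the search as a generator. The sketch below assumes Python 3; the name ifind is hypothetical and is not part of the book's PyTools package. The trade-off is that results arrive in directory-walk order rather than as one globally sorted list.

```python
import fnmatch
import os

def ifind(pattern, startdir=os.curdir):
    """Yield matching paths one at a time instead of building a list first."""
    for thisdir, subdirs, files in os.walk(startdir):
        # Sort within each directory only; a global sort would need the full list
        for name in sorted(subdirs + files):
            if fnmatch.fnmatch(name, pattern):
                yield os.path.join(thisdir, name)

# The caller still writes a flat for loop, and can stop early:
#     for name in ifind('*.py'):
#         if wanted(name):      # wanted() is a placeholder for your own test
#             break
```

Because nothing is scanned until the loop asks for the next name, breaking out of the loop early skips the rest of the tree.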
Here's a more concrete example of our find module at work: the following system command line lists all Python files in directory D:\PP3E whose names begin with the letter c or t (it's being run in the same directory as the find.py file). Notice that find returns full directory paths that begin with the start directory specification.
下面是我們的find模塊更具體的應用例子:以下系統命令行列出目錄D:\PP3E下的所有Python文件,其文件名以字母c或t開始(它運行于find.py文件所在目錄)。請注意,find返回完整的目錄路徑,會以指定的開始目錄開頭。
C:\Python24>python find.py [ct]*.py D:\PP3E
D:\PP3E\NewExamples\calculator.py
D:\PP3E\NewExamples\clock.py
D:\PP3E\NewExamples\commas.py
D:\PP3E\dev\examples\PP3E\Preview\tkinter001.py
D:\PP3E\dev\examples\PP3E\Preview\tkinter101.py
D:\PP3E\dev\examples\PP3E\Preview\tkinter102.py
And here's some Python code that does the same find but also extracts base names and file sizes for each file found:
以下的一些Python代碼做了同樣的find,但是同時對找到的每個文件提取了基本名字和文件大小:
>>> import os
>>> from find import find
>>> for name in find('[ct]*.py', r'D:\PP3E'):
...     print os.path.basename(name), '=>', os.path.getsize(name)
...
calculator.py => 14101
clock.py => 11000
commas.py => 2508
tkinter001.py => 62
tkinter101.py => 235
tkinter102.py => 421
As a more useful example, I use the following simple script to clean out any old output text files located anywhere in the book examples tree. I usually run this script from the example's root directory. I don't really need the full path to the find module in the import here because it is in the same directory as this script itself; if I ever move this script, though, the full path will be required:
下面是個更為有用的例子,我用以下的簡單腳本來清除書中examples目錄樹下,所有舊的輸出文本文件。我通常在示例的根目錄下運行此腳本。在這里的導入中,我其實并不需要find模塊的完整路徑,因為find模塊和該腳本本身是在同一目錄;但如果我一旦移動這個腳本,就需要完整的路徑:
C:\...\PP3E>type PyTools\cleanoutput.py
import os                                  # delete old output files in tree
from PP3E.PyTools.find import find         # only need full path if I'm moved
for filename in find('*.out.txt'):
    print filename
    if raw_input('View?') == 'y':
        print open(filename).read()
    if raw_input('Delete?') == 'y':
        os.remove(filename)
C:\temp\examples>python %X%\PyTools\cleanoutput.py
.\Internet\Cgi-Web\Basics\languages.out.txt
View?
Delete?
.\Internet\Cgi-Web\PyErrata\AdminTools\dbaseindexed.out.txt
View?
Delete?y
To achieve such code economy, the custom find module calls os.path.walk to register a function to be called per directory in the tree and simply adds matching filenames to the result list along the way.
為了經濟地完成這樣的代碼,自定義find模塊調用了os.path.walk來注冊一個函數,樹上的每個目錄都要調用該函數,而該函數只是沿途將匹配的文件名添加到結果列表。
New here, though, is the fnmatch module, yet another Python standard library module, which performs Unix-like pattern matching against filenames. This module supports common operators in name pattern strings: * (to match any number of characters), ? (to match any single character), and [...] and [!...] (to match any character inside the bracket pairs, or not); other characters match themselves.[*] If you haven't already noticed, the standard library is a fairly amazing collection of tools.
不過fnmatch模塊是新的內容:它是另一個Python標準庫模塊,對文件名執行Unix的模式匹配。該模塊支持名字模式串中的普通操作:*(匹配任意多個字符)、?(匹配任意單個字符),及[...]和[!...](匹配方括號內的任意單個字符,或不匹配);其他字符匹配它們自己[*]。不知您有沒有注意到,標準庫是個相當驚人的工具集合。
[*] Unlike the re module, fnmatch supports only common Unix shell matching operators, not full-blown regular expression patterns; to understand why this matters, see Chapter 18 for more details.
[*] 與re模塊不同的是,fnmatch僅支持普通的Unix shell匹配操作,而不是全面的正則表達式模式;想要理解有什么區別,請詳見第18章。
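To make the pattern operators concrete, here is a short sketch that applies each one to a small list of names. It assumes Python 3, and the filenames are invented for illustration. The last line uses fnmatch.translate to show the regular expression that a shell-style pattern is converted into, which hints at how the two pattern styles relate.

```python
import fnmatch

# Hypothetical filenames, chosen only to exercise each operator
names = ['calc.py', 'clock.py', 'test.py', 'data.txt', 'a.py']

print(fnmatch.filter(names, '*.py'))       # every name ending in .py
print(fnmatch.filter(names, '[ct]*.py'))   # .py names starting with c or t
print(fnmatch.filter(names, '[!ct]*.py'))  # .py names not starting with c or t
print(fnmatch.filter(names, '?.py'))       # exactly one character before .py

# The regular expression the pattern is translated to internally
print(fnmatch.translate('[ct]*.py'))
```

fnmatch.filter returns the matching subset of a list in one call; fnmatch.fnmatch, used in Example 4-6, tests a single name.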
Incidentally, find.find is also roughly equivalent to platform-specific shell commands such as find -print on Unix and Linux, and dir /B /S on DOS and Windows. Since we can usually run such shell commands in a Python script with os.popen, the following does the same work as find.find but is inherently nonportable and must start up a separate program along the way:
順便說一句,find.find也與Unix和Linux上的find -print、DOS和Windows上的dir /B /S這些平臺專用的shell命令大致等效。由于我們通常可以在Python腳本中用os.popen運行這樣的shell命令,以下代碼做了與find.find相同的工作,但其本質上是不可移植的,并且必須沿途啟動獨立的程序:
>>> import os
>>> for line in os.popen('dir /B /S').readlines( ): print line,
...
C:\temp\about-pp.html
C:\temp\about-pp2e.html
C:\temp\about-ppr2e.html
C:\temp\newdir
C:\temp\newdir\temp1
C:\temp\newdir\temp2
C:\temp\newdir\more
C:\temp\newdir\more\xxx.txt
The equivalent Python metaphors, however, work unchanged across platforms, which is one of the implicit benefits of writing system utilities in Python:
但是等效的Python隱喻卻可以不加修改地跨平臺運行:這就是用Python編寫系統工具隱含的好處之一:
C:\...> python find.py * .
>>> from find import find
>>> for name in find(pattern='*', startdir='.'): print name
Finally, if you come across older Python code that fails because there is no standard library find to be found, simply change find-module imports in the source code to, say:
最后,如果您遇到較老的Python代碼因為找不到標準庫find而失敗,只需簡單地將源碼中的find模塊導入語句改為:
from PP3E.PyTools import find
rather than:
代替:
import find
The former form will find the custom find module in the book's example package directory tree. And if you are willing to add the PP3E\PyTools directory to your PYTHONPATH setting, all original import find statements will continue to work unchanged.
前者的形式會找到自定義的find模塊,它位于本書的example包目錄樹。如果您愿意將PP3E\PyTools目錄加入到您的PYTHONPATH設置中,則原來所有的import find語句可以保持不變。
Better still, do nothing at all: most find-based examples in this book automatically pick the alternative by catching import exceptions, just in case they are run on a more modern Python and their top-level files aren't located in the PyTools directory:
更好的是什么也不做:本書大多數基于find的例子會自動選擇替代方法,如果它們運行于一個更現代的Python,并且它們的頂層文件不在PyTools目錄中,它們會捕獲導入異常,從而作出選擇:
try:
    import find
except ImportError:
    from PP3E.PyTools import find
The find module may be gone, but it need not be forgotten.
find模塊可以消失,但它不應該被忘記。
Python Versus csh
Python與csh
If you are familiar with other common shell script languages, it might be useful to see how Python compares. Here is a simple script in a Unix shell language called csh that mails all the files in the current working directory with a suffix of .py (i.e., all Python source files) to a hopefully fictitious address:
如果你熟悉其他常見的shell腳本語言,看看它們與Python的比較可能是有益的。這里是個簡單腳本,是用被稱為csh的Unix shell語言寫的,它會將當前工作目錄中的所有以.py為后綴的文件(即所有的Python源文件),郵寄到一個地址,希望該地址不是真的:
#!/bin/csh
foreach x (*.py)
    echo $x
    mail eric@halfabee.com -s $x < $x
end
The equivalent Python script looks similar:
等效的Python腳本類似于:
#!/usr/bin/python
import os, glob
for x in glob.glob('*.py'):
    print x
    os.system('mail eric@halfabee.com -s %s < %s' % (x, x))
but is slightly more verbose. Since Python, unlike csh, isn't meant just for shell scripts, system interfaces must be imported and called explicitly. And since Python isn't just a string-processing language, character strings must be enclosed in quotes, as in C.
但稍微冗長。因為Python與csh不同,它不只是用于shell腳本,其系統接口必須顯式地導入并調用。而且由于Python不僅僅是個字符串處理語言,字符串必須放在引號內,就像C語言。
Although this can add a few extra keystrokes in simple scripts like this, being a general-purpose language makes Python a better tool once we leave the realm of trivial programs. We could, for example, extend the preceding script to do things like transfer files by FTP, pop up a GUI message selector and status bar, fetch messages from an SQL database, and employ COM objects on Windows, all using standard Python tools.
雖然這會在簡單腳本中增加這樣一些額外的按鍵,但是,一旦我們離開簡單程序的領域,Python作為一個通用的語言,將成為一個更好的工具。例如,我們可以使用標準的Python工具,來擴展前面的腳本,讓它做些像通過FTP傳文件、彈出一個GUI消息選擇器和狀態欄、從SQL數據庫獲取信息,和使用Windows的COM對象這樣的事情。
Python scripts also tend to be more portable to other platforms than csh. For instance, if we used the Python SMTP interface to send mail instead of relying on a Unix command-line mail tool, the script would run on any machine with Python and an Internet link (as we'll see in Chapter 14, SMTP only requires sockets). And like C, we don't need $ to evaluate variables; what else would you expect in a free language?
比起csh,Python腳本也更容易移植到其他平臺。例如,如果我們使用了Python的SMTP接口發送郵件,而不是依賴于Unix命令行工具mail,腳本將可運行于任何帶Python和Internet連接的機器上(在第14章我們將看到SMTP只需要套接口)。就像C語言,我們不需要用$對變量求值;對于一個免費的語言,您還有什么其他期望呢?
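As a rough illustration of the SMTP variant mentioned above, the following sketch replaces the mail command line with the standard smtplib and email modules. It assumes Python 3 and an SMTP server that accepts plain unauthenticated connections; many real servers require TLS and a login. The function names, server name, and addresses are hypothetical placeholders, and this is not the code developed in Chapter 14.

```python
import glob
import os
import smtplib
from email.message import EmailMessage

def build_message(path, sender, recipient):
    """Wrap one file's text in an email message whose subject is the filename."""
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = recipient
    msg['Subject'] = os.path.basename(path)
    with open(path, encoding='utf-8') as f:
        msg.set_content(f.read())
    return msg

def mail_python_files(server, sender, recipient, pattern='*.py'):
    """Send each file matching pattern in the current directory as its own message."""
    with smtplib.SMTP(server) as smtp:     # assumes no TLS or login is needed
        for path in glob.glob(pattern):
            print(path)
            smtp.send_message(build_message(path, sender, recipient))

# Hypothetical call; substitute a real server and real addresses:
#     mail_python_files('smtp.example.com', 'me@example.com', 'you@example.com')
```

Since nothing here shells out to a Unix mail program, the same script should behave the same way on any platform with Python and a reachable mail server.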