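Analyzing the Linker Through the PE Format, Part 1
These are some notes from back when I was studying the PE format.
Analysis summary:
(*1*): Write the program: a.cpp and foo.cpp.
The contents of a.cpp are: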
extern void Foo();
void main()
{
Foo();
}
The contents of foo.cpp are:
#include "stdio.h"
void Foo()
{
printf("I am Foo!");
}
Build the program to produce a.obj, foo.obj, and a.exe.
(*2*): Copy these three files into the ..\VisualStudio\VC98\BIN directory
and dump them with the following commands:
dumpbin /all a.obj >aobj.txt <Enter>
dumpbin /all foo.obj >fooobj.txt <Enter>
dumpbin /all a.exe >aexe.txt <Enter>
(*3*):
Open the dump of a.obj (aobj.txt) and find the code section. Its contents are:
SECTION HEADER #3
.text name
0 physical address
0 virtual address
2E size of raw data
355 file pointer to raw data //Attention! This is the file offset of the code section.
383 file pointer to relocation table
397 file pointer to line numbers
2 number of relocations
3 number of line numbers
60501020 flags
Code
Communal; sym= _main
16 byte align
Execute Read
RAW DATA #3
00000000: 55 8B EC 83 EC 40 53 56 57 8D 7D C0 B9 10 00 00 U....@SVW.}.....
00000010: 00 B8 CC CC CC CC F3 AB E8 00 00 00 00 5F 5E 5B ............._^[
00000020: 83 C4 40 3B EC E8 00 00 00 00 8B E5 5D C3 ..@;........].
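The flags value 60501020 in the header above is a bit mask, and dumpbin prints its decoded meaning on the lines below it. As a rough illustration, the decoding can be sketched as follows. The constant values follow the PE/COFF section characteristics, but the short names and both helper functions are made up for this example (winnt.h spells the constants IMAGE_SCN_*), and only the bits that occur in these notes are handled.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// A subset of the PE/COFF section characteristics bits.
// The short names are illustrative; winnt.h calls them IMAGE_SCN_*.
constexpr std::uint32_t SCN_CNT_CODE             = 0x00000020;
constexpr std::uint32_t SCN_CNT_INITIALIZED_DATA = 0x00000040;
constexpr std::uint32_t SCN_LNK_COMDAT           = 0x00001000;
constexpr std::uint32_t SCN_ALIGN_MASK           = 0x00F00000;
constexpr std::uint32_t SCN_MEM_EXECUTE          = 0x20000000;
constexpr std::uint32_t SCN_MEM_READ             = 0x40000000;
constexpr std::uint32_t SCN_MEM_WRITE            = 0x80000000;

// The alignment is stored as a 4-bit code n in bits 20..23, meaning 2^(n-1) bytes.
// Returns 0 when the section does not specify an alignment.
std::uint32_t section_alignment(std::uint32_t flags) {
    const std::uint32_t code = (flags & SCN_ALIGN_MASK) >> 20;
    return code == 0 ? 0u : (1u << (code - 1));
}

// Hypothetical helper: lists the decoded flags in roughly dumpbin's wording.
std::vector<std::string> describe_section_flags(std::uint32_t flags) {
    std::vector<std::string> out;
    if (flags & SCN_CNT_CODE)             out.push_back("Code");
    if (flags & SCN_CNT_INITIALIZED_DATA) out.push_back("Initialized Data");
    if (flags & SCN_LNK_COMDAT)           out.push_back("Communal");
    const std::uint32_t align = section_alignment(flags);
    if (align != 0) out.push_back(std::to_string(align) + " byte align");
    if (flags & SCN_MEM_EXECUTE) out.push_back("Execute");
    if (flags & SCN_MEM_READ)    out.push_back("Read");
    if (flags & SCN_MEM_WRITE)   out.push_back("Write");
    return out;
}
```

Calling describe_section_flags(0x60501020) yields Code, Communal, 16 byte align, Execute and Read, which matches the lines dumpbin printed under the flags value.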
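(*4*): Disassemble the code section of a.obj.
Open URSoft W32Dasm (I used version 8.93).
When opening the file, choose "All files". The tool mainly targets PE, LE, NE and similar executable formats,
so to disassemble an OBJ file you have to give it a starting offset: the file offset of the code section, i.e. the "file pointer to raw data" value marked Attention! above.
So in the dialog that appears when opening the OBJ file, enter: 00000355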
Start Disassembly from Offset 00000355 Hex.
There is no need to tick "Check For 16 Bit Disassembly".
The disassembled code section looks like this:
:00000000 55 push ebp
:00000001 8BEC mov ebp, esp
:00000003 83EC40 sub esp, 00000040
:00000006 53 push ebx
:00000007 56 push esi
:00000008 57 push edi
:00000009 8D7DC0 lea edi, dword ptr [ebp-40]
:0000000C B910000000 mov ecx, 00000010
:00000011 B8CCCCCCCC mov eax, CCCCCCCC
:00000016 F3 repz
:00000017 AB stosd
:00000018 E800000000 call 0000001D //Attention!!!
:0000001D 5F pop edi
:0000001E 5E pop esi
:0000001F 5B pop ebx
:00000020 83C440 add esp, 00000040
:00000023 3BEC cmp ebp, esp
:00000025 E800000000 call 0000002A
:0000002A 8BE5 mov esp, ebp
:0000002C 5D pop ebp
:0000002D C3 ret
Brief explanation:
0xE8 is the CALL instruction opcode. The DWORD that follows should contain the displacement to the Foo function, measured from the end of the CALL instruction (the address of the next instruction). Foo is clearly not zero bytes away from that point, so this code would not work as expected if you executed it as is. The code is incomplete and needs to be fixed up.
In the above example of a call to function Foo, there will be a REL32 fixup record, and it will have the offset of the DWORD that the linker needs to overwrite with the appropriate value.
(*5*): Look at the RELOCATIONS that immediately follow the code section:
RELOCATIONS #3
Symbol Symbol
Offset Type Applied To Index Name
-------- ------- ------------- ------ --------------
00000019 REL32 00000000 12 ?Foo@@YAXXZ (void __cdecl Foo(void))
00000026 REL32 00000000 13 __chkesp
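The first fixup record says that the linker needs to calculate the relative offset to function Foo and write that value at offset 0x19 in the section, which is the DWORD right after the E8 opcode at 0x18. The second record does the same for __chkesp at offset 0x26.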
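What the linker does with such a REL32 record can be sketched in a few lines. This is only an illustration under stated assumptions, not the linker's actual code: apply_rel32, a_obj_text and the two little-endian helpers are hypothetical names, and the section address 0x00401000 and the target addresses 0x00401030 and 0x00401070 used in the note after the block are the ones that show up in the final a.exe.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// The 0x2E bytes of a.obj's .text section, copied from RAW DATA #3.
Bytes a_obj_text() {
    return {
        0x55, 0x8B, 0xEC, 0x83, 0xEC, 0x40, 0x53, 0x56, 0x57, 0x8D, 0x7D, 0xC0, 0xB9, 0x10, 0x00, 0x00,
        0x00, 0xB8, 0xCC, 0xCC, 0xCC, 0xCC, 0xF3, 0xAB, 0xE8, 0x00, 0x00, 0x00, 0x00, 0x5F, 0x5E, 0x5B,
        0x83, 0xC4, 0x40, 0x3B, 0xEC, 0xE8, 0x00, 0x00, 0x00, 0x00, 0x8B, 0xE5, 0x5D, 0xC3
    };
}

// Reads a 32-bit little-endian value at byte offset off.
std::uint32_t read_le32(const Bytes& b, std::size_t off) {
    return  static_cast<std::uint32_t>(b[off])
         | (static_cast<std::uint32_t>(b[off + 1]) << 8)
         | (static_cast<std::uint32_t>(b[off + 2]) << 16)
         | (static_cast<std::uint32_t>(b[off + 3]) << 24);
}

// Writes a 32-bit value in little-endian order at byte offset off.
void write_le32(Bytes& b, std::size_t off, std::uint32_t v) {
    for (std::size_t i = 0; i < 4; ++i) {
        b[off + i] = static_cast<std::uint8_t>(v >> (8 * i));
    }
}

// Hypothetical helper: returns a copy of the section with one REL32 fixup applied.
// sectionVA is the address the section will be loaded at, fixupOffset is the
// "Offset" column of the relocation record, targetVA is the final symbol address.
Bytes apply_rel32(Bytes section, std::uint32_t sectionVA,
                  std::uint32_t fixupOffset, std::uint32_t targetVA) {
    const std::uint32_t addend = read_le32(section, fixupOffset);  // the "Applied To" value
    const std::uint32_t nextInstr = sectionVA + fixupOffset + 4;   // address just past the DWORD
    write_le32(section, fixupOffset, targetVA - nextInstr + addend);
    return section;
}
```

With the section placed at 0x00401000, applying the record at offset 0x19 with target 0x00401030 stores 13 00 00 00, and applying the record at offset 0x26 with target 0x00401070 stores 46 00 00 00, the same bytes that appear after the two E8 opcodes in a.exe.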
(*6*): The actual code section of a.exe:
:00401000 55 push ebp
:00401001 8BEC mov ebp, esp
:00401003 83EC40 sub esp, 00000040
:00401006 53 push ebx
:00401007 56 push esi
:00401008 57 push edi
:00401009 8D7DC0 lea edi, dword ptr [ebp-40]
:0040100C B910000000 mov ecx, 00000010
:00401011 B8CCCCCCCC mov eax, CCCCCCCC
:00401016 F3 repz
:00401017 AB stosd
:00401018 E813000000 call 00401030
:0040101D 5F pop edi
:0040101E 5E pop esi
:0040101F 5B pop ebx
:00401020 83C440 add esp, 00000040
:00401023 3BEC cmp ebp, esp
:00401025 E846000000 call 00401070
:0040102A 8BE5 mov esp, ebp
:0040102C 5D pop ebp
:0040102D C3 ret
:0040102E CC int 03
:0040102F CC int 03
// Nothing of interest in between; omitted.
* Referenced by a CALL at Address:
|:00401018
|
:00401030 55 push ebp
:00401031 8BEC mov ebp, esp
:00401033 83EC40 sub esp, 00000040
:00401036 53 push ebx
:00401037 56 push esi
:00401038 57 push edi
:00401039 8D7DC0 lea edi, dword ptr [ebp-40]
:0040103C B910000000 mov ecx, 00000010
:00401041 B8CCCCCCCC mov eax, CCCCCCCC
:00401046 F3 repz
:00401047 AB stosd
:00401048 68ECC04000 push 0040C0EC
:0040104D E85E000000 call 004010B0
:00401052 83C404 add esp, 00000004
:00401055 5F pop edi
:00401056 5E pop esi
:00401057 5B pop ebx
:00401058 83C440 add esp, 00000040
:0040105B 3BEC cmp ebp, esp
:0040105D E80E000000 call 00401070
:00401062 8BE5 mov esp, ebp
:00401064 5D pop ebp
:00401065 C3 ret
(*7*) Now look at FOO.OBJ. (FOOOBJ.TXT shows that its code section is at file offset 0x000003bf; that is the offset given to W32Dasm for disassembly.)
:00000000 55 push ebp
:00000001 8BEC mov ebp, esp
:00000003 83EC40 sub esp, 00000040
:00000006 53 push ebx
:00000007 56 push esi
:00000008 57 push edi
:00000009 8D7DC0 lea edi, dword ptr [ebp-40]
:0000000C B910000000 mov ecx, 00000010
:00000011 B8CCCCCCCC mov eax, CCCCCCCC
:00000016 F3 repz
:00000017 AB stosd
:00000018 6800000000 push 00000000
:0000001D E800000000 call 00000022
:00000022 83C404 add esp, 00000004
:00000025 5F pop edi
:00000026 5E pop esi
:00000027 5B pop ebx
:00000028 83C440 add esp, 00000040
:0000002B 3BEC cmp ebp, esp
:0000002D E800000000 call 00000032
:00000032 8BE5 mov esp, ebp
:00000034 5D pop ebp
:00000035 C3 ret
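To sum up: by the time the linker combines the compilation units (.obj), A.OBJ and FOO.OBJ have already recorded the data that needs adjusting,
for example the location of the call to function Foo in a.obj, namely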
:00000018 E800000000 call 0000001D //Attention!!!
RAW DATA #3
00000000: 55 8B EC 83 EC 40 53 56 57 8D 7D C0 B9 10 00 00 U....@SVW.}.....
00000010: 00 B8 CC CC CC CC F3 AB E8 00 00 00 00 5F 5E 5B ............._^[
00000020: 83 C4 40 3B EC E8 00 00 00 00 8B E5 5D C3 ..@;........].
and the RELOCATIONS #3 that immediately follow the section:
Symbol Symbol
Offset Type Applied To Index Name
-------- ------- ------------- ------ --------------
00000019 REL32 00000000 12 ?Foo@@YAXXZ (void __cdecl Foo(void))
At link time, the linker merges the code sections, placing FOO.OBJ's code section after A.OBJ's, as follows:
:00401000 55 push ebp
....
:00401018 E813000000 call 00401030
....
:0040102D C3 ret
:0040102E CC int 03
:0040102F CC int 03
// Nothing of interest in between; omitted.
* Referenced by a CALL at Address:
|:00401018
|
:00401030 55 push ebp
....
:00401065 C3 ret
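The addresses in this merged listing are consistent with a simple layout rule: each code contribution starts at the next 16-byte boundary after the previous one, which fits the "16 byte align" flag in the section header. The sketch below only illustrates that arithmetic. The function names are hypothetical, and the assumption that the linker lays these sections out strictly one after another comes from the listing, not from linker documentation.

```cpp
#include <cstdint>

// Rounds addr up to the next multiple of alignment (alignment must be a power of two).
constexpr std::uint32_t align_up(std::uint32_t addr, std::uint32_t alignment) {
    return (addr + alignment - 1) & ~(alignment - 1);
}

// Hypothetical layout step: where the next contribution starts, given the
// start address and size of the previous one.
constexpr std::uint32_t next_section_start(std::uint32_t prevStart,
                                           std::uint32_t prevSize,
                                           std::uint32_t alignment) {
    return align_up(prevStart + prevSize, alignment);
}

// a.obj's code is 0x2E bytes at 00401000, so the next contribution starts at 00401030.
static_assert(next_section_start(0x00401000, 0x2E, 16) == 0x00401030,
              "Foo follows main at the next 16-byte boundary");
```

The two bytes between 0040102E and 00401030 are the padding, which shows up as the two int 03 (CC) instructions. Applying the same rule to Foo's 0x36 bytes at 00401030 gives 00401070, which is where the second call in each function lands.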
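In CALL 00401030 above, the 00400000 part is the preferred load base address of the image.
The 13000000 in E813000000 is the displacement. Its value is really 00000013; the bytes appear reversed because of
a peculiarity of Intel processors: multi-byte numeric data is stored least significant byte first (little-endian),
so it reads in reverse order compared to character data.
For example, to load EAX from the 32-bit address 56A700FE, you will find the opcode A1 (MOV EAX, [moffs32]) followed by the address bytes FE 00 A7 56:
A1 FE 00 A7 56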
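The byte order described above can be made concrete with two small helpers that convert between a 32-bit value and the four bytes as they appear in the instruction stream. This is a minimal sketch; Dword, from_le and to_le are names invented for this example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Dword = std::array<std::uint8_t, 4>;

// Interprets four bytes, in the order they appear in memory, as a little-endian value.
std::uint32_t from_le(const Dword& b) {
    return  static_cast<std::uint32_t>(b[0])
         | (static_cast<std::uint32_t>(b[1]) << 8)
         | (static_cast<std::uint32_t>(b[2]) << 16)
         | (static_cast<std::uint32_t>(b[3]) << 24);
}

// Produces the bytes of v in the order an Intel processor stores them.
Dword to_le(std::uint32_t v) {
    Dword out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(v >> (8 * i));
    }
    return out;
}
```

from_le applied to the bytes 13 00 00 00 gives 0x00000013, and to_le(0x56A700FE) gives the bytes FE 00 A7 56.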
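The call goes from address 00401018 to 00401030, which gives the instruction E813000000.
Manual calculation:
The CALL instruction itself occupies 5 bytes (1 for the CALL opcode E8, the other 4 for the displacement).
Since 0040101D-00401018=5, the displacement is actually counted from 0040101D, the address of the next instruction.
Hence 00401030-0040101D=13 (hex),
so the generated CALL instruction is E813000000.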
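The manual calculation can be written down as two small functions, one for each direction. This is a sketch for near E8 calls only; the function names are made up for the example, and unsigned 32-bit wraparound is used so that backward calls produce the expected negative displacement.

```cpp
#include <cstdint>

// An E8 near call is 1 opcode byte plus a 4-byte displacement.
constexpr std::uint32_t kCallLength = 5;

// Displacement stored after the E8 opcode for a call at callAddr to target.
constexpr std::uint32_t call_displacement(std::uint32_t callAddr, std::uint32_t target) {
    return target - (callAddr + kCallLength);
}

// Inverse: the address an E8 call at callAddr transfers to.
constexpr std::uint32_t call_target(std::uint32_t callAddr, std::uint32_t displacement) {
    return callAddr + kCallLength + displacement;
}

// The call to Foo from the a.exe listing: 00401018 -> 00401030.
static_assert(call_displacement(0x00401018, 0x00401030) == 0x13, "matches E813000000");
```

The same functions reproduce the other calls in the listing, for example call_target(0x0040104D, 0x5E) is 004010B0.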
With a tool:
oPcodeR, written by Cool McCool. Very handy.
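Linker Analysis, Part 2
(*1*): Create a DLL project. In step 2 of the wizard choose option 1, i.e. the default.
//This DLL project only exports two functions and does nothing else.
Add the file dll.cpp
with the following contents: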
#include"stdio.h"
void __declspec(dllexport) ExportOne( void )
{
printf("I am ExportOne!\n");
}
void __declspec(dllexport) ExportTwo( void )
{
printf("I am ExportTwo!\n");
}
Build to produce dll.obj and dll.dll.
[[[[[[[[[[[[[[[[[[[]]]]]]]]]]]]]]]]]]]]]]]]]]
It can also be built this way:
//File dll.cpp
#include"stdio.h"
//void __declspec(dllexport) ExportOne( void )
void ExportOne(void)
{
printf("I am ExportOne!\n");
}
//void __declspec(dllexport) ExportTwo( void )
void ExportTwo(void)
{
printf("I am ExportTwo!\n");
}
//File dll.def
; dll.def : Declares the module parameters for the DLL.
LIBRARY "dll"
DESCRIPTION 'dll Windows Dynamic Link Library'
EXPORTS
; Explicit exports can go here
ExportOne @1
ExportTwo @2
[[[[[[[[[[[[[[[[[[[]]]]]]]]]]]]]]]]]]]]]]]]]]
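(*2*): Create the LIB project.
//This LIB project only tests importing the two functions exported by the DLL above.
Add the file lib.cpp
with the following contents: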
#include"stdio.h"
void ExportOne(void);
void ExportTwo(void);
void main()
{
ExportOne();
ExportTwo();
}
Build and run to produce lib.obj and lib.exe.
(*3*) Analysis of LIB.OBJ
(*4*) Disassemble LIB.OBJ. Note that the file offset of its code section is 00000392.
:00000000 55 push ebp
......
:00000018 E800000000 call 0000001D //this is the call to ExportOne()
:0000001D E800000000 call 00000022 //this is the call to ExportTwo()
......
:00000032 C3 ret
(*5*) Analysis of LIB.EXE:
:00401000 55 push ebp
......
:00401017 AB stosd
* Reference To: dll.ExportOne, Ord:0001h
|
:00401018 E81D000000 Call 0040103A
* Reference To: dll.ExportTwo, Ord:0002h
|
:0040101D E812000000 Call 00401034
......
:00401032 C3 ret
:00401033 CC int 03
* Referenced by a CALL at Address:
|:0040101D
|
* Reference To: dll.ExportTwo, Ord:0002h
|
:00401034 FF25C8C04000 Jmp dword ptr [0040C0C8]
* Referenced by a CALL at Address:
|:00401018
|
* Reference To: dll.ExportOne, Ord:0001h
|
:0040103A FF25C4C04000 Jmp dword ptr [0040C0C4]
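(*6*) The difference between imported and non-imported functions.
From the above we can see that, whether or not a function is imported, the compiler generates the call in the form CALL XXXXXXXX. (This holds here because ExportOne and ExportTwo are declared in lib.cpp without __declspec(dllimport).)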
//from dll.lib
Archive member name at 8: /
3E951F55 time/date Thu Apr 10 15:37:57 2003
...
correct header end
7 public symbols
1FE __IMPORT_DESCRIPTOR_dll
4F8 __NULL_IMPORT_DESCRIPTOR
62C dll_NULL_THUNK_DATA
778 ?ExportOne@@YAXXZ
778 __imp_?ExportOne@@YAXXZ
7E2 ?ExportTwo@@YAXXZ
7E2 __imp_?ExportTwo@@YAXXZ
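We can see that the LIB file contains information about the imported functions.
A function symbol such as ?ExportOne@@YAXXZ can therefore be resolved. The LIB file also contains a lot of other information about the imported functions, for example: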
Summary
BA .debug$S
14 .idata$2
14 .idata$3
4 .idata$4
4 .idata$5
8 .idata$6
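All of the .idata$N sections are eventually merged into the .idata section of the executable, where they form the IAT and the other import-table structures.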
SECTION HEADER #2
.idata$5 name
...
C0300040 flags
...
RAW DATA #2
00000000: 00 00 00 00
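If a function is imported by ordinal, the highest bit of the DWORD in the .idata$5 section is 1 and the low bits hold the import (export) ordinal.
Otherwise the DWORD in the .idata$5 section is 0, as in the raw data above.
If a function is imported by name, the first WORD of the .idata$6 section is the hint, an index the loader tries first when looking up the name (these notes originally called it the import/export ordinal), followed by the null-terminated function name.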
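The two cases above can be told apart with a few lines of code. The sketch below assumes the 32-bit PE layout, where the ordinal flag is bit 31 (IMAGE_ORDINAL_FLAG32 in winnt.h) and the ordinal sits in the low 16 bits. The helper names and the HintName struct are hypothetical, and the sample bytes in the usage note are invented for illustration rather than taken from dll.lib.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Bit 31 marks an import by ordinal in a 32-bit image (IMAGE_ORDINAL_FLAG32 in winnt.h).
constexpr std::uint32_t kOrdinalFlag32 = 0x80000000u;

constexpr bool imported_by_ordinal(std::uint32_t entry) {
    return (entry & kOrdinalFlag32) != 0;
}

// Only meaningful when imported_by_ordinal(entry) is true.
constexpr std::uint16_t import_ordinal(std::uint32_t entry) {
    return static_cast<std::uint16_t>(entry & 0xFFFFu);
}

// Hypothetical view of one .idata$6 record: a WORD followed by a zero-terminated name.
struct HintName {
    std::uint16_t hint;
    std::string name;
};

HintName parse_hint_name(const std::vector<std::uint8_t>& raw) {
    HintName hn{0, ""};
    if (raw.size() < 2) return hn;  // too short to hold the leading WORD
    hn.hint = static_cast<std::uint16_t>(raw[0] | (raw[1] << 8));  // little-endian WORD
    for (std::size_t i = 2; i < raw.size() && raw[i] != 0; ++i) {
        hn.name.push_back(static_cast<char>(raw[i]));
    }
    return hn;
}
```

For example, the entry 0x80000001 is an import by ordinal 1, while the made-up record 01 00 46 6F 6F 00 parses as hint 1 with the name "Foo".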
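**Through the LIB file, the function is resolved to an instruction of the form JMP DWORD PTR [XXXXXXXX],
usually called a stub. The real address of the imported function is not stored in the LIB file; the loader writes it into the IAT when the DLL is loaded. The symbol and code of the function itself, shown next, appear to be the compiled body of ExportOne from the DLL's own object file: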
010 00000000 SECT3 notype () External | ?ExportOne@@YAXXZ (void __cdecl ExportOne(void))
//The code of function ExportOne follows.
SECTION HEADER #3
.text name
...
RAW DATA #3
00000000: 55 8B EC 83 EC 40 53 56 57 8D 7D C0 B9 10 00 00 U....@SVW.}.....
00000010: 00 B8 CC CC CC CC F3 AB 68 00 00 00 00 E8 00 00 ........h.......
00000020: 00 00 83 C4 04 5F 5E 5B 83 C4 40 3B EC E8 00 00 ....._^[..@;....
00000030: 00 00 8B E5 5D C3 ....].
To sum up, for an imported function the generated code has roughly this form:
CALL XXXXXXXX
...
XXXXXXXX:
JMP DWORD PTR[YYYYYYYY]
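The address YYYYYYYY lies in the import section (the IAT).
From there execution finally jumps to the address of the imported function.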
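The CALL, stub, IAT chain can be imitated in portable C++ with a function pointer playing the role of the IAT slot. This is only an analogy under stated assumptions: every name below is hypothetical, the functions return a string instead of calling printf so the result can be checked, and a real stub is a single JMP instruction rather than a function.

```cpp
#include <string>

using ImportedFn = std::string (*)();

// Stands in for the IAT slot at YYYYYYYY; empty until the "loader" fills it in.
static ImportedFn iat_slot = nullptr;

// Stands in for the real function living inside the DLL.
std::string ExportOne_in_dll() {
    return "I am ExportOne!";
}

// Stands in for the stub at XXXXXXXX: JMP DWORD PTR [YYYYYYYY].
std::string ExportOne_stub() {
    return iat_slot();
}

// Stands in for the loader, which writes the real address into the slot at load time.
void bind_import(ImportedFn realAddress) {
    iat_slot = realAddress;
}

// Stands in for the caller: it only ever sees CALL XXXXXXXX, i.e. the stub.
std::string call_through_stub() {
    bind_import(&ExportOne_in_dll);
    return ExportOne_stub();
}
```

The caller never refers to ExportOne_in_dll directly. The indirection through iat_slot is what lets the DLL be loaded at any address without patching the call sites.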