How to "Build" a Computer 6-What's an Instruction?

Background

You should know what the UB (unsigned binary) and 2C (two's complement) representations are. You should also know about sign extension.

ISA

When you started learning how to program, you were told that your program had to be compiled. That is, it had to be converted from a high-level language into a low-level language. For C and C++, the low-level language is basically machine code.

An ISA defines the machine and assembly code used by a CPU.

ISA stands for "Instruction Set Architecture". Effectively, the ISA is the programmer's view of the computer.

An ISA consists of:

The instruction set: This is the set of instructions supported. This is the part that's usually called the assembly language.
The register set: This is the set of registers you can use. (There are other hidden registers which you can't use directly. They are used indirectly, however.)
The address space: This is the set of memory addresses that can be used by your program.

The ISA is basically a hardware specification. It's the view of the hardware as seen by an assembly language programmer.

The ISP (instruction set processor) is an implementation of the ISA. There may be many implementations for a given ISA. For example, IA32 is the instruction set architecture for x86 processors. Intel has the Pentium and Celeron lines of CPUs that implement this ISA. AMD also has its own CPUs that implement the ISA. Each implementation is different, but they all run code written in IA32.

Why You Need to Know About Instructions

We study instruction sets because that's what CPUs process. They run one instruction after another. In order to understand how a computer works, you need to know what instructions are, and more importantly, how to write them.

There are two ways to write instructions. You can write them in assembly language, which is human-readable, or in machine code, which is basically 0's and 1's. CPUs process machine code, but humans usually program in assembly language.

You need to know both, in order to understand how a CPU works.

The MIPS ISA

As with any ISA, MIPS instructions can be written in assembly language, which is human-readable, or in machine code, which is 0's and 1's. For MIPS32, each machine code instruction is a 32-bit bitstring.

The "32" in MIPS32 refers to the size of the registers (i.e., how many bits each register holds) and to the number of bits used in an address. There is also a MIPS64, which has 64 bit addresses and 64 bit registers.

The MIPS32 architecture contains 32 general purpose int registers. The registers are named $r0, $r1, ..., $r31. Each register can store 32 bits. Most of the time, the registers store signed or unsigned ints. However, sometimes they store addresses, and occasionally ASCII characters, etc.

MIPS also has 32 floating point registers, but we won't worry about them too much.

Unlike programming languages where you can declare as many variables as you want, you can't create any more registers. The number of registers doesn't change.

MIPS32 allows you to access data in memory using 32 bit addresses. In principle, you can access up to 2³² different addresses, using 32 bits. In practice, some of those addresses may be invalid. For example, the CPU may simply not have that much memory (2³² addresses is 4 GB). Thus, you might be able to generate the 32-bit address, but there may be nothing stored at that address (an error usually occurs when you access an invalid address).

In MIPS, nearly all registers are general purpose. You can classify ISAs into those that use general purpose registers (i.e., instructions can refer to any register---all registers perform the same operations) or special purpose (certain instructions can only be used on specific, i.e., not all, registers).

However, there is at least one exception. $r0 is not general purpose. It is hardwired to 0. No matter what you do to this register, it always has a value of 0. You might wonder why such a register is needed in MIPS.

The designers of MIPS used benchmarks (programs used to determine the performance of a CPU), which convinced them that having a register hardwired to 0 would improve the performance (speed) of the CPU as opposed to not having it. Not everyone agrees a register hardwired to 0 is essential, so not all ISAs have a zero register.
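To make the hardwired zero concrete, here is a minimal sketch of a register file in Python. The class name `RegisterFile` and its `read`/`write` methods are made up for illustration and are not part of MIPS. The sketch only models the behavior described above: writes to register 0 have no effect.

```python
class RegisterFile:
    """Toy model of 32 integer registers, each 32 bits wide (illustrative only)."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, n):
        return self._regs[n]

    def write(self, n, value):
        if n == 0:
            return  # register 0 is hardwired to 0, so the write is discarded
        self._regs[n] = value & 0xFFFFFFFF  # keep only the low 32 bits


regs = RegisterFile()
regs.write(0, 123)   # no effect
regs.write(5, 123)   # stored normally
print(regs.read(0), regs.read(5))  # prints: 0 123
```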

Assembly vs. Machine Code

CPUs process binary bitstrings. These bitstrings are really instructions, encoded in 0's and 1's. When people began to write programs for computers, they wrote them in binary. That is, programs were written in 0's and 1's. The code probably looked something like this:

0000 0000 0101 1000 0000 0000 0101 1000
1010 1101 0000 1011 1000 1100 1001 0110

This is called machine code.

As you might imagine, machine code was difficult to read and difficult to debug. The amount of time wasted trying to find whether you had accidentally written a 0 instead of a 1 led to the invention of assembly language.

Assembly language is a somewhat more human-readable version of machine code. For example, assembly code might look something like:

add   $r2, $r3, $r4
addi  $r2, $r3, -10

While you may not understand the code above, you've certainly got a much better chance of figuring it out than the machine code equivalent. Each line of assembly code contains an instruction. Each instruction tells the computer one small task to accomplish. Instructions are the building blocks of programs.

CPUs can't handle assembly code directly. Instead, assembly code is translated to machine code. If this sounds like compiling, that's because it basically is compiling. However, people usually call the process of translating assembly to machine code assembling, instead of compiling.

You'll write code in assembly, and learn how to translate some instructions from assembly to machine code. It's very important that you understand the machine code, because that's what the CPU processes. Furthermore, by studying machine code, you get to see how information is encoded into 0's and 1's, and you get to see how the CPU uses these binary values to execute the instruction in hardware.

Encoding Registers

In the previous set of notes, we talked about how many bits you need to create N different labels. Assume each label is k bits long.

You need k = ceil( lg N ) bits to uniquely label N items.

MIPS32 has 32 integer registers. We want to label each register by a number, so instructions can refer to registers by number. Since MIPS has 32 registers, you need ceil( lg 32 ) = 5 bits.
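The formula k = ceil( lg N ) can be checked with a few lines of Python. This is only a sketch; the function name `bits_needed` is made up here. It uses integer arithmetic instead of a floating-point logarithm so that exact powers of two come out right.

```python
def bits_needed(n):
    """Smallest k such that k bits give at least n distinct labels.

    For n >= 1 this equals ceil(lg n).
    """
    if n < 1:
        raise ValueError("need at least one item to label")
    return (n - 1).bit_length()


print(bits_needed(32))  # prints: 5
print(bits_needed(33))  # prints: 6
```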

If we think of the 5 bit numbers as unsigned binary numbers, then the registers are numbered from 0 up to 31, inclusive. In fact, that's exactly how MIPS numbers its registers. Registers are numbered from $r0 up to $r31. The binary equivalents run from 00000 to 11111.

In assembly language, you'd write $r6. In machine code, you'd write the same register as: 00110. In assembly language, you'd write $r30. In machine code, you'd write 11110.
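As a rough illustration, the translation from a register name to its 5-bit UB encoding could look like this in Python. The helper `encode_register` is hypothetical, and it assumes the `$rN` naming used in these notes.

```python
def encode_register(name):
    """Turn a name like '$r6' into its 5-bit unsigned binary string."""
    if not name.startswith("$r"):
        raise ValueError("expected a name of the form $rN")
    number = int(name[2:])
    if not 0 <= number <= 31:
        raise ValueError("register number must be between 0 and 31")
    return format(number, "05b")  # zero-padded to exactly 5 bits


print(encode_register("$r6"))   # prints: 00110
print(encode_register("$r30"))  # prints: 11110
```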

This is important because we're going to use register encoding in the machine language instructions for MIPS. Recall that machine code is a 32-bit bitstring. When we refer to registers within the instruction, it's going to be using the 5 bit binary numbers written in UB (unsigned binary).

What is an instruction?

An assembly language instruction is basically a function call. Like C functions, assembly language instructions have a fixed number of arguments. You can't add or remove arguments.

Like C functions, arguments of assembly language instructions have type. Or at least, something that resembles type. Basically, there are 4 kinds of "types" for MIPS.

Registers: $r0, $r1, ..., $r31.
Immediates: Constants, such as 10, -20, etc. Sometimes written in hexadecimal, e.g., 0x3a.
Register Offset: This is a constant and a register, written as -10($r3) or 214($r4). That is, you write the immediate (constant) value, then a left parenthesis, then a register, then a right parenthesis.
The computation is performed by adding the contents of the register to the offset, usually resulting in a 32 bit address. Thus, -10($r3) is -10 added to the contents of register 3. This result is "temporary" and register 3 is not modified (just like x + y in a programming language merely adds x to y, but the sum does not change x or y).
Labels: These are identifiers for locations in memory. Generally, you write labels in uppercase letters and underscores, such as FOR_LOOP.
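The register offset computation can be sketched in Python as follows. The function name `effective_address` and the sample value placed in register 3 are assumptions made for this example. The list `R` stands in for the registers.

```python
def effective_address(offset, base_value):
    """Add a signed offset to a register's contents, wrapping to 32 bits."""
    return (base_value + offset) & 0xFFFFFFFF


R = [0] * 32
R[3] = 0x1000                       # pretend register 3 holds this address

addr = effective_address(-10, R[3])  # models -10($r3)
print(hex(addr))   # prints: 0xff6
print(hex(R[3]))   # prints: 0x1000 (register 3 is unchanged)
```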

For the most part, we'll only consider registers and immediate values.

Let's consider two examples of instructions and their operands:

add $r2, $r3, $r4: This instruction adds the contents of register 3 and register 4, and places the result in register 2. It's basically R[2] = R[3] + R[4], if you pretend that the registers form an array.
addi $r2, $r3, -10: This instruction adds the contents of register 3 to -10 and places the result in register 2. It's basically R[2] = R[3] - 10.
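Taking the "registers form an array" picture literally, the two instructions can be modelled in Python. This is a simplified sketch: results are wrapped to 32 bits, and details such as overflow handling and the special behavior of register 0 are left out. The function names simply mirror the mnemonics.

```python
MASK32 = 0xFFFFFFFF


def add(R, d, s, t):
    """R[d] = R[s] + R[t], kept to 32 bits."""
    R[d] = (R[s] + R[t]) & MASK32


def addi(R, t, s, imm):
    """R[t] = R[s] + imm, kept to 32 bits."""
    R[t] = (R[s] + imm) & MASK32


R = [0] * 32
R[3], R[4] = 7, 5

add(R, 2, 3, 4)        # add  $r2, $r3, $r4
after_add = R[2]
addi(R, 2, 3, -10)     # addi $r2, $r3, -10
after_addi = R[2]

print(after_add)        # prints: 12
print(hex(after_addi))  # prints: 0xfffffffd, which is -3 in 32-bit 2C
```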

The first instruction is an add instruction. add requires exactly 3 operands (arguments). Each operand must be a valid register. The operand can not be anything besides a register. In particular, you can not create expressions such as:

# WRONG! Operands can't be expressions
add $r2, $r3, (add $r4, $r5, $r6)

The second instruction is an addi instruction. addi must also have three operands. The first two operands must be registers, while the third one must be an integer between -2¹⁵ and 2¹⁵ - 1, inclusive.

There is a reason for this restriction in value, which we will discuss momentarily.
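For reference, the allowed range works out to -32768 through 32767. A small Python check, with a made-up helper name `fits_in_addi_immediate`, might look like this:

```python
IMM_MIN = -(2 ** 15)     # -32768
IMM_MAX = 2 ** 15 - 1    #  32767


def fits_in_addi_immediate(value):
    """True if value is a legal third operand for addi."""
    return IMM_MIN <= value <= IMM_MAX


print(fits_in_addi_immediate(-10))     # prints: True
print(fits_in_addi_immediate(32767))   # prints: True
print(fits_in_addi_immediate(32768))   # prints: False
```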

Unlike higher level programming languages, you can't create new registers. You're forced to use the ones available. You can't create new instructions either. You must use the ones provided in the instruction set.

Machine Code

A machine language instruction usually consists of:

opcode: This is a binary representation of the instruction. For example, an add instruction has an opcode of 000 000.
operands: Operands means the same thing as arguments. It's older terminology usually associated with assembly/machine code instructions.

MIPS divides instructions into three formats. Instructions are either R-type (register type), I-type (immediate type), or J-type (jump type). The types refer to the format, not to the instruction's purpose. (For example, branch instructions are I-type because of their format, even if it would seem like they should be J-type.)

Here are the layouts of the three kinds of instructions.

R-type Instruction

Opcode	Register s	Register t	Register d	Shift Amt	Function
B_31..26	B_25..21	B_20..16	B_15..11	B_10..6	B_5..0
`ooo ooo`	`sssss`	`ttttt`	`ddddd`	`aaaaa`	`ffffff`

R-type instructions are short for "register type" instructions.
Bits B_31..26 are used for the opcode. For R-type instructions, the opcode is almost always 000 000. At first this seems to make no sense, because every instruction should have a unique opcode. However, bits B_5..0 (the function part) use 6 bits to specify the instruction. Only R-type instructions use a function code.
Bits B_25..21 specify a 5-bit UB encoding for the first source register.
Bits B_20..16 specify a 5-bit UB encoding for the second source register.
Bits B_15..11 specify a 5-bit UB encoding for the destination register. This specifies which register stores the result of the operation.
Bits B_10..6 specify the 5-bit shift amount. This is usually 00000, except for shift instructions.
Bits B_5..0 specify a 6-bit function. Each R-type instruction has a unique 6 bit value. For example, add has a 6-bit value that's different from sub. add and sub are two different instructions.
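The field boundaries above translate directly into shifts and masks. The following Python sketch splits a 32-bit word into the six R-type fields. The function name `decode_r_type` and the dictionary keys are chosen for this example only. The sample word is the encoding of add $r2, $r3, $r4.

```python
def decode_r_type(word):
    """Split a 32-bit instruction word into its R-type fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # B_31..26, 6 bits
        "rs":     (word >> 21) & 0x1F,  # B_25..21, 5 bits
        "rt":     (word >> 16) & 0x1F,  # B_20..16, 5 bits
        "rd":     (word >> 11) & 0x1F,  # B_15..11, 5 bits
        "shamt":  (word >> 6) & 0x1F,   # B_10..6,  5 bits
        "funct":  word & 0x3F,          # B_5..0,   6 bits
    }


fields = decode_r_type(0b000000_00011_00100_00010_00000_100000)
print(fields["rs"], fields["rt"], fields["rd"])  # prints: 3 4 2
```

Note that the six field widths add up to 6 + 5 + 5 + 5 + 5 + 6 = 32 bits.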

I-type Instruction

Opcode	Register s	Register t	Immediate
B_31..26	B_25..21	B_20..16	B_15..0
`ooo ooo`	`sssss`	`ttttt`	`iiii iiii iiii iiii`

I-type instructions are short for "immediate type" instructions.
Bits B_31..26 are used for the opcode. Unlike R-type instructions, the 6-bit value is NOT 000 000. There is no function code for I-type instructions.
Bits B_25..21 specify a 5-bit UB encoding for the source register.
Bits B_20..16 specify a 5-bit UB encoding for the destination register. Although this is called register t, instead of register d, it is treated as the destination register for I-type instructions.
Bits B_15..0 hold the 16-bit immediate value. This may be a 16-bit UB number or a 16-bit 2C number. Notice that the immediate value is encoded directly into the instruction.

J-type Instruction

Opcode	Target
B_31..26	B_25..0
`ooo ooo`	`tt tttt tttt tttt tttt tttt tttt`

J-type instructions are short for "jump type" instructions.
Bits B_31..26 are used for the opcode. Unlike R-type instructions, the 6-bit value is NOT 000 000. There is no function code for J-type instructions.
Bits B_25..0 hold the 26-bit target. This is usually used to generate an address.

Notice that the J-type instruction has no source or destination registers.
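Here is a hedged Python sketch of the J-type layout. The notes above do not say how the 26-bit target becomes an address. The `jump_address` function shows the usual MIPS32 rule for the j instruction (opcode 000 010): shift the target left by 2 bits and take the top 4 bits from the address of the following instruction. Treat that rule as background added for this example, not as something derived from the text above. Both function names are made up.

```python
def encode_j_type(opcode, target):
    """Pack a 6-bit opcode and a 26-bit target into a 32-bit word."""
    if not 0 <= opcode < 2 ** 6:
        raise ValueError("opcode must fit in 6 bits")
    if not 0 <= target < 2 ** 26:
        raise ValueError("target must fit in 26 bits")
    return (opcode << 26) | target


def jump_address(pc, target):
    """Usual MIPS32 rule for j: top 4 bits of PC + 4, then target, then 00."""
    return ((pc + 4) & 0xF0000000) | (target << 2)


word = encode_j_type(0b000010, 0x100)
print(hex(word))                             # prints: 0x8000100
print(hex(jump_address(0x10000000, 0x100)))  # prints: 0x10000400
```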

add, an R-type instruction

The general format for an add instruction is: add $rd, $rs, $rt
$rd, $rs, and $rt are not real registers. They are merely placeholders. For example, if we write add $r2, $r3, $r4, then for this particular example, $rd = $r2, $rs = $r3, and $rt = $r4.

In assembly language, the instructions are written with the destination register (i.e. register d), then the first source register, (i.e. register s) then the second source register (i.e. register t).

Note: This is NOT the same order as it is written in machine code. In assembly, it's destination, source 1, source 2. In MIPS machine code, it's written source 1, source 2, destination.

Don't ask me why the MIPS folks did it that way. They just did.

Let's translate the following instruction into MIPS machine code.

add $r2, $r3, $r4

For add, the opcode is 000 000. The function code is 100 000. Since the shift amount isn't used, it's set to 00000.

We encode $r2 as 00010, $r3 as 00011, and $r4 as 00100.

This is how the machine code equivalent looks:

Opcode	Register s	Register t	Register d	Shift Amt	Function
B_31..26	B_25..21	B_20..16	B_15..11	B_10..6	B_5..0
?	$r3	$r4	$r2	?	?
`000 000`	`00011`	`00100`	`00010`	`00000`	`100 000`

Again, notice that bits B_25..21 hold source 1 (i.e., $r3), bits B_20..16 hold source 2 (i.e., $r4), and bits B_15..11 hold the destination register (i.e., $r2).
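The same translation can be done mechanically. This Python sketch packs the fields in machine code order (source 1, source 2, destination) and reproduces the bit pattern from the table. The function name `encode_r_type` is made up for illustration.

```python
def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """Pack R-type fields into one 32-bit word."""
    return ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)


# add $r2, $r3, $r4: destination is $r2, sources are $r3 and $r4
word = encode_r_type(rs=3, rt=4, rd=2, shamt=0, funct=0b100000)
print(format(word, "032b"))  # prints: 00000000011001000001000000100000
print(hex(word))             # prints: 0x641020
```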

It's important that you learn how to translate a few instructions, because the CPU manipulates the binary version of this, not the assembly version. In particular, pay attention to how the registers are encoded, and just as importantly, which bits refer to which registers.

addi, an I-type instruction

addi stands for add immediate. It's an I-type instruction.

The general format for an addi instruction is:

addi $rt, $rs, IMMED

For I-type instructions, $rt is the destination register (not $rs). $rs is still the first source register. For addi, the immediate value is written in base 10 (or possibly, hexadecimal), but it eventually gets translated to 2C.

Let's look at a specific example.

addi $r3, $r10, -3

This instruction adds the contents of register 10, to the value -3, and stores the result in register 3.

The opcode for addi is 001 000. In 2C, you write -3 (base ten) as 1111 1111 1111 1101.

This is how the instruction is encoded.

Opcode	Register s	Register t	Immediate
B_31..26	B_25..21	B_20..16	B_15..0
?	$r10	$r3	-3, represented in 2C
`001 000`	`01010`	`00011`	`1111 1111 1111 1101`

Again, notice that in the assembly code $r3 (i.e., the destination register) appears first, while in the machine code $r3 appears second. Also, notice that the immediate value is written in 16 bits, two's complement.
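A Python sketch of the same encoding follows. The function name `encode_i_type` is made up. Masking with 0xFFFF keeps the low 16 bits, which for a value in the legal range is exactly its 16-bit 2C representation.

```python
def encode_i_type(opcode, rs, rt, imm):
    """Pack I-type fields into one 32-bit word (immediate given as a signed int)."""
    if not -(2 ** 15) <= imm <= 2 ** 15 - 1:
        raise ValueError("immediate does not fit in 16-bit two's complement")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


# addi $r3, $r10, -3: destination $r3 goes in the rt field, source $r10 in rs
word = encode_i_type(0b001000, rs=10, rt=3, imm=-3)
print(format(word, "032b"))  # prints: 00100001010000111111111111111101
print(hex(word))             # prints: 0x2143fffd
```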

Now that you've seen that the immediate is written in 16 bits, 2C, you can see why its value must be between -2¹⁵ and 2¹⁵ - 1. This is the range of valid values for 16 bit 2C.

The assembler must translate base 10 representation to 2C representation when translating addi from assembly to machine code.

Some instructions encode the immediate in 2C, while other instructions encode it in UB.
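The difference matters because the same 16 bits mean different things under the two readings. A short Python sketch, using made-up helper names, shows both interpretations of the bit pattern 1111 1111 1111 1101 and what each looks like once widened to 32 bits.

```python
def as_unsigned(bits16):
    """Read 16 bits as an unsigned binary (UB) number."""
    return bits16 & 0xFFFF


def as_signed(bits16):
    """Read 16 bits as a two's complement (2C) number."""
    bits16 &= 0xFFFF
    return bits16 - 0x10000 if bits16 & 0x8000 else bits16


def sign_extend_to_32(bits16):
    """Widen a 16-bit 2C value to 32 bits by copying the sign bit."""
    return as_signed(bits16) & 0xFFFFFFFF


pattern = 0b1111_1111_1111_1101
print(as_unsigned(pattern))             # prints: 65533
print(as_signed(pattern))               # prints: -3
print(hex(sign_extend_to_32(pattern)))  # prints: 0xfffffffd
print(hex(as_unsigned(pattern)))        # prints: 0xfffd (zero-extended)
```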

Summary

This section on instructions is not trying to teach you how to program in MIPS assembly. Instead, it's to briefly introduce you to what an instruction is, and how it is encoded.

While it's useful to know how to program in MIPS assembly, it isn't essential for understanding how a CPU works. To understand how a CPU works, at least initially, all you need to know is what an instruction looks like in binary, and what that individual instruction is supposed to do.

posted on 2007-01-23 15:42 Charles