How Does an Intel Processor Boot? (binarydebt.wordpress.com)
"16-bit Real Mode with insruction pointer pointing to address 0xffff.fff0, the reset vector. In this initial mode, the processor has first 12 address lines asserted, so any address looks like 0xfffx.xxxx. This fact combined with how addressing using segment selector (SS) register works, allows the CPU to access instruction at reset vector address 0xffff.fff0"
SS is the Stack Segment register. CS is for code.
In real mode, the instruction pointer is only 16-bit and can not hold a value in excess of 0xffff.
Ignoring those issues, the explanation still doesn't match what I've seen before in the documentation. If things have changed, when did that happen? The old explanation:
The CS base is set to something like -16, that is with all but the lower 4 bits set. This covers all of physical address space, with any higher bits just being ignored. The instruction pointer is set to 0. The result is execution that starts 16 bytes below the first address that is beyond the end of the physical address space. For example, with 44-bit physical addresses this would be at 0x00000ffffffffff0.
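To sanity-check the arithmetic in that older explanation, here is a minimal Python sketch. It assumes the CS base is simply -16 truncated to the width of the physical address bus; the helper name `reset_address` is made up for illustration and does not come from any manual.

```python
def reset_address(phys_bits: int) -> int:
    """Return -16 truncated to phys_bits bits (hypothetical helper)."""
    mask = (1 << phys_bits) - 1
    # Python integers behave like infinite two's complement under &,
    # so masking -16 leaves every bit set except the low four.
    return -16 & mask

for bits in (32, 44, 48, 64):
    print(f"{bits}-bit: 0x{reset_address(bits):016x}")
```

With 44 bits this gives 0x00000ffffffffff0, the value quoted above.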
how the CPU addresses 0xffff.fff0 is not exactly specified in the post. actually, the CS register is loaded with 0xf000, and normally this would yield a segment base address of 0x000f.0000 (CS left-shifted by 4 bits). but on a reset, as the post mentions, the first 12 address lines are asserted, so the base address ends up being 0xffff.0000. these address lines remain asserted until a long jump is made, after which the first 12 address lines are de-asserted and normal CS segment selector calculation resumes.
instruction pointer contains -16 as you mentioned, the resulting address is:
base address + IP = 0xffff.0000 + 0xfff0 = 0xffff.fff0
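The same sum, next to what an ordinary real-mode CS load would produce, as a small Python sketch. The constant names are illustrative only, and the reset base of 0xffff.0000 is taken from the comment above rather than measured on hardware.

```python
def real_mode_base(selector: int) -> int:
    """Base that a normal real-mode CS load produces: selector << 4."""
    return (selector & 0xFFFF) << 4

CS_SELECTOR = 0xF000
IP = 0xFFF0
RESET_BASE = 0xFFFF0000  # assumed base in effect right after reset

normal = real_mode_base(CS_SELECTOR) + IP  # ordinary real-mode address
at_reset = RESET_BASE + IP                 # address used for the first fetch

print(hex(normal))
print(hex(at_reset))
```

The first value lands just below 1 MB (0xffff0), the second just below 4 GB (0xfffffff0).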
i am not sure if this is worth adding to the post but it is definitely useful.
So at reset, CS is set to a descriptor whose numeric value is 0xf000 and whose base address is 0xffff0000, or something to that effect. All the rest follows naturally -- there's no special case logic that asserts lines of the address bus until the first long jump, it's simply that the reset value of the CS descriptor is rather magical, and that long jumps by their nature load a new CS segment descriptor which isn't magical.
Unless something has changed in recent hardware, there aren't 12 address lines just asserted. This is a side effect of the CS base being a particular value.
An important thing to realize is that x86 has hidden registers associated with segments. These registers get set when a segment selector register is loaded, not when it is used. The CS base is one of these hidden registers. If CS is loaded in protected mode, the base comes out of the descriptor table, and it remains when switching back to real mode. (this is the "unreal mode") If CS is loaded in real mode, the base comes from the selector shifted left, and this base remains even if you switch to protected mode. Switching modes doesn't change a segment base. Loading segment registers is what changes a segment base.
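A toy Python model of that idea, for anyone who finds code easier to follow. The class and method names are invented, the reset state (selector 0xf000, base 0xffff0000) is an assumption taken from this thread, and the 32-bit truncation is a simplification; real hardware also caches limits and access rights.

```python
from dataclasses import dataclass

@dataclass
class SegmentRegister:
    selector: int  # visible 16-bit part
    base: int      # hidden part, changed only when the register is loaded

    def load_real_mode(self, selector: int) -> None:
        # Real-mode load: the hidden base becomes selector << 4.
        self.selector = selector & 0xFFFF
        self.base = self.selector << 4

    def load_protected_mode(self, selector: int, descriptor_base: int) -> None:
        # Protected-mode load: the hidden base comes from the descriptor.
        self.selector = selector & 0xFFFF
        self.base = descriptor_base

    def linear(self, offset: int) -> int:
        # Address formation uses only the hidden base (32-bit wrap assumed).
        return (self.base + offset) & 0xFFFFFFFF

cs = SegmentRegister(selector=0xF000, base=0xFFFF0000)  # assumed reset state
before = cs.linear(0xFFF0)
cs.load_real_mode(0xF000)  # what a far jump to F000:xxxx would do
after = cs.linear(0xFFF0)
print(hex(before), hex(after))
```

The selector is 0xf000 both times; only the reload changes the hidden base, which is why the same CS:IP pair maps to two different addresses.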
So initially, the CS base is not set in a way that matches what you would get if you loaded the CS selector value that is seen. It is set to a value that is possibly 0xfffffff0, 0x00000ffffffffff0, 0x0000fffffffffff0, or 0xfffffffffffffff0. The older documentation I've seen would use the largest of those values. I suppose it could then be cut down to 32-bit by the bottleneck that is normally a part of addressing when not in long mode. This is the sort of area where Intel, AMD, and others may differ.
Perhaps there is a hardware debugger for x86 (like a JTAG debugger) that would show the initial CS base. One could also guess that Simics or VMware might be correct, disassembling them to find out what they use. Another idea is to examine the badly-documented state used by the virtualization instructions.
And yes, one has to be careful about outdated information.
* https://superuser.com/a/347115/38062
* https://superuser.com/a/695716/38062
AFAIK this is a completely new behaviour and only for newer versions of ME; the older versions still boot the main CPU like a 386 did, and the ME processor is a separate thing (Don't quote me on this; just information I gathered from brief research.)
This included how the system handed off control to the OS. BIOS just loaded the first disk sector and executed whatever it found there. MBR-based partition tables were a DOS convention; the BIOS couldn't care less what the first disk sector did once it was in control.
When UEFI, the new boot interface, was invented, we needed a name for the old boot conventions. So we called them BIOS.
It's hard to claim the article confuses anything if the original word BIOS has such a confused meaning to begin with. If you say BIOS=PC firmware except the option ROMs, the article is correct.
And the name is even more confusing than you paint it to be. The "BIOS" was also the bottom-half of MS/PC/DR-DOS, contained in IO.SYS in (pre version 6) MS-DOS and in IBMBIO.COM in PC-DOS and (post version 3) DR-DOS.
Or 64-bit mode, on most current systems.
having said that it is perhaps worth clarifying in the article that classic BIOS would hand off in 16-bit mode :)
Uboot is responsible for this in some chips.
But I guess this might not be the only fundamental difference.
(I have thought to utilize the SRAM for some optimization later at system runtime because it should be incredibly fast. At least if you don't have to care about power consumption that should be possible or does the specification require to turn it off?)
My understanding has always been that initializing DRAM consisted of two things:
The BIOS had to enumerate how much physical memory the motherboard had installed, and then test that the memory is working by writing a bit to each location and reading it back.
Would this be accurate?
Would also be worth noting that many BIOSes allow you to hit the space bar to skip memory initialization, presumably because it is somewhat time-consuming.
Most of the boot process is just swinging from one legacy mode to the next until you hit the modern parts.
it is 0xfff0, at least according to Intel Software Developer's Manual Volume 3, section 9.1.4 "First Instruction Executed". regarding 12 address lines being asserted, that is just a way of thinking about it. actual implementation might be different but what happens on reset is akin to 12 most significant bits being set. CS is 0xf000.
indeed a debugger would give the right answer.
This is what I've figured out from Intel's docs:
8086/88: CS:IP = FFFF:0000 first instruction at FFFF0
80186/188: CS:IP = FFFF:0000 first instruction at FFFF0
80286: CS:IP = F000:FFF0 first instruction at FFFF0
80386: CS:IP = 0000:0000FFF0 or F000:0000FFF0[1], first instruction at FFFFFFF0
80486+: CS:IP = F000:0000FFF0(?) first instruction at FFFFFFF0
[1] Depending on which datasheet/programmer's reference manual you read. I can't find any reference to someone who actually checked what the hardware did, however.
More interesting reading...
http://www.rcollins.org/Productivity/DescriptorCache.html
When the old boot protocol needed a name, it got BIOS. This situation arose as a reaction to the existence of the new UEFI boot protocol.
AFAIK, IO.SYS/IBMBIO.COM was the interface used by DOS to the hardware (some kind of HAL). Microsoft seemed to think every PC-clone vendor would implement its own firmware, and they would have to port IO.SYS for every platform. Happily, after (I think) Compaq reverse engineered the IBM BIOS, this turned out to be unnecessary: IO.SYS was only ever implemented for BIOS.
And this despite the change of tack from naming the firmware to naming the boot mechanism. The name used for the boot mechanism, also adopted long before EFI existed, was a "boot record" or "boot sector", subdivided by type into volume boot records and master boot records, terminology that goes back to the 1980s. The boot protocol was not actually named "BIOS" at all.
IO.SYS/IBMBIO.COM was the Basic Input/Output System, nomenclature (alongside the names of the other parts of the operating system: the BDOS, the command processor, and the housekeeping utilities) that MS-DOS got from CP/M. It was one of two things called that, the other being the machine firmware.
Neither was in any way influenced by something that did not exist until decades later. And although there was confusion between the two, it was not some generalized confusion about parts of the system in general. "BIOS" was not a name for a boot sector/record, even though one of the things contained within a boot record was a BIOS Parameter Block, which people had to regularly explain meant "the other thing that is called a BIOS". And people regularly distinguished in the 1980s between such things as "BIOS services" and the "ROM BIOS".
https://www.pcjs.org/pubs/pc/reference/ibm/5150/techref/
First external PDF, pg 169 first talks about the ROM resident Basic I/O System (BIOS). From that point on, the text refers to this as BIOS, and says that a complete listing is provided in appendix A, which has nothing concerning IO.SYS. So it seems IBM considered only the ROM-based part to be the BIOS, without the IO.SYS part on the diskette. Even ROM BIOS seems to be a misnomer according to this document. But, as the BIOS sits in ROM and provides services, it is easy to see why people talk about ROM BIOS and BIOS services.
As the 5150 has no hard disk, we need the 5160 manual for the HD boot protocol: pg 417 and 419 of https://www.pcjs.org/pubs/pc/reference/ibm/5160/techref/. It reads the first sector from the HD, and checks if the last 2 bytes have a specific value. If yes, it executes whatever it found there.
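A rough Python sketch of that check, for illustration. The manual pages cited above only say "a specific value"; the 0x55 0xAA pair used here is the conventional PC boot signature and is my assumption about what is meant, as is the 512-byte sector size. The function name `is_bootable` is made up.

```python
SECTOR_SIZE = 512              # assumed classic sector size
BOOT_SIGNATURE = b"\x55\xaa"   # assumed signature in the last two bytes

def is_bootable(sector: bytes) -> bool:
    """Mimic the firmware check: full sector whose last 2 bytes match."""
    return len(sector) == SECTOR_SIZE and sector[-2:] == BOOT_SIGNATURE

blank = bytes(SECTOR_SIZE)                           # all zeros, no signature
signed = bytes(SECTOR_SIZE - 2) + BOOT_SIGNATURE     # zeros plus signature

print(is_bootable(blank), is_bootable(signed))
```

Nothing else in the sector is inspected, which matches the point that partition tables and volume boot records are invisible at this level.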
This de facto defines the MBR as that first sector, without ever calling it MBR. No partitions or Volume boot records exist at the BIOS level, these were purely DOS conventions. See
The Bios Parameter Block was a DOS structure in specific file systems like FAT16. It told DOS how to translate BIOS disk layout (e.g. Cylinder/Head/Sector) to DOS disk layout. See https://en.wikipedia.org/wiki/BIOS_parameter_block and note how details vary with DOS versions. Despite its name, the BIOS did not know or care about this structure.
UEFI is a new firmware standard for the PC, intended to replace the BIOS. At this point, the BIOS was used mainly for booting the OS and communicating PC parameters to the OS. Applications did not directly talk to the BIOS anymore - this was a habit from the DOS era. Seeing UEFI and BIOS today as only a boot mechanism is therefore not a 'change of tack'.
I found no references calling IO.SYS part of the BIOS. Maybe this was a CP/M convention?
Of course, the word BIOS did not time travel to after the release of UEFI. But if you need to describe the non-UEFI way with 1 word, what are you going to do? Call it "that stuff the BIOS did when it wanted to load an OS" is a bit long, so boot mode={UEFI,BIOS} seems reasonable to me. This nomenclature is regularly used in the references provided by https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_In...
See 386 datasheet, page 20:
https://media.digikey.com/pdf/Data%20Sheets/Intel%20PDFs/Int...
The 8086/8088 is slightly different since it doesn't have protected mode; initial CS:IP is FFFF:0000 which gives a first address of FFFF0. The 286 is closer to the 386+ but its 24-bit address space means the first instruction comes from FFFFF0 instead.
https://software.intel.com/en-us/articles/intel-sdm#nine-vol...
Get volume 3A and read chapter 9.1.4 at pg 315. The text is quite readable:
The address FFFFFFF0H is beyond the 1-MByte addressable range of the processor while in real-address mode. The
processor is initialized to this starting address as follows. The CS register has two parts: the visible segment
selector part and the hidden base address part. In real-address mode, the base address is normally formed by
shifting the 16-bit segment selector value 4 bits to the left to produce a 20-bit base address. However, during a
hardware reset, the segment selector in the CS register is loaded with F000H and the base address is loaded with
FFFF0000H. The starting address is thus formed by adding the base address to the value in the EIP register (that
is, FFFF0000 + FFF0H = FFFFFFF0H).
Any change to CS reverts this to normal real mode operation. So near jumps are OK, far jumps or interrupts are not.
> The address FFFFFFF0H is beyond the 1-MByte addressable range of the processor while in real-address mode. The processor is initialized to this starting address as follows. The CS register has two parts: the visible segment selector part and the hidden base address part. In real-address mode, the base address is normally formed by shifting the 16-bit segment selector value 4 bits to the left to produce a 20-bit base address. However, during a hardware reset, the segment selector in the CS register is loaded with F000H and the base address is loaded with FFFF0000H. The starting address is thus formed by adding the base address to the value in the EIP register (that is, FFFF0000 + FFF0H = FFFFFFF0H).
(Don't use indentation to format a block quote, only use it for code listings.)
Or at least that's the source of confusion for me, maybe the terminology is different at this level.
Others have mentioned it is.
The reason why 'asserted' is used is that signals at this level are basically analog. The circuit that asserts a signal is, fairly literally, being assertive, and there are all sorts of commonly used options: Pull-ups and pull-downs, either one in either weak or strong (assertive) form.
Connecting a strong pull-down to a strong pull-up represents a short circuit, but having one circuit assert a logical 1 while the other circuit on the same pin holds a weak pull-down (presumably, in this case, 0), is a pretty common configuration.
The most important thing to keep in mind, working with electronics, is that all pins must be connected to at least a weak pull-up/down, which can be as simple as an MOhm-class resistor connected to ground.
If they aren't, then the gate is floating -- and a floating CMOS gate can easily reach states where the gate itself is short-circuiting, since they're made from a transistor pair connected to both ground and power. (As is necessary to support both pull-up and pull-down.) If that doesn't destroy the gate -- check your datasheet -- then, at a minimum, it'll still waste power.
The majority of common microcontrollers (e.g. Arduinos) will allow you to configure the gate with an internal weak pull-up/down, to let you avoid connecting every single pin, but you shouldn't assume that it's configured that way out of the reset vector. Nor that such an internal pull-up even exists.