Evolution of the x86 context switch in Linux (2018) (maizure.org)
In looking at the OS dev wiki I see the following:
>"The TSS is primarily suited for hardware multitasking, where each individual process has its own TSS. In Software multitasking, one or two TSS's are also generally used, as they allow for entering Ring 0 code after an interrupt."
Would you not be able to enter Ring 0 after an interrupt without a TSS entry? Is this why it is still required?
Was curious if it guards against some C pre-processor issues.
/** include/asm-i386/system.h */
#define switch_to(tsk) do { \
    [...] \
} while (0)
For details, see http://c-faq.com/cpp/multistmt.html:
if (foo)
    MULTI_LINE_MACRO;
Breaks without some wrapper like if (1) { A; B; } or do { A; B; } while (0):
if (foo)
    A;
    B; // oops, unconditional (e.g., "goto fail")
Later, with Red Hat's 4g4g kernels that Linus rejected, the problem would go away for people who installed Red Hat's version of the OS on systems with many gigabytes of memory.
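Coming back to the do { ... } while (0) wrapper for a moment, here is a small sketch that makes the failure observable. The macro and function names (BUMP_BOTH_UNSAFE, BUMP_BOTH_SAFE, bump_unsafe, bump_safe) are made up for illustration and are not from the kernel. Expect a compiler warning about the unwrapped macro; that is the point.

```c
#include <assert.h>

/* Hypothetical two-statement macros, for illustration only. */
#define BUMP_BOTH_UNSAFE(a, b)  (a)++; (b)++
#define BUMP_BOTH_SAFE(a, b)    do { (a)++; (b)++; } while (0)

/* Returns b after a conditional bump through the unwrapped macro. */
static int bump_unsafe(int cond)
{
    int a = 0, b = 0;
    if (cond)
        BUMP_BOTH_UNSAFE(a, b);  /* expands to: if (cond) (a)++; (b)++; */
    (void)a;
    return b;                    /* b was incremented even when cond == 0 */
}

/* Returns b after a conditional bump through the wrapped macro. */
static int bump_safe(int cond)
{
    int a = 0, b = 0;
    if (cond)
        BUMP_BOTH_SAFE(a, b);    /* one statement, so the if covers both */
    else
        a = -1;                  /* the wrapper also lets an else follow */
    (void)a;
    return b;
}
```

With cond set to 0, bump_unsafe() still increments b, while bump_safe() leaves it alone. A bare { A; B; } block would fix the first problem but not the else case, because the semicolon after the macro call would end the if statement.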
What were the 4g4g kernels? Might you have any literature on those?
If you want to have lots of fun, you could look at switch_mm() on a modern kernel :)
Thanks to the author, for caring about the paper people. : )
What does seem to mitigate Meltdown on some CPUs is enabling segment limits for user code. This does nothing for 64-bit code, though.
edit: not to mention that there are no hardware context switches on 64-bit kernels. AMD removed support entirely in 64-bit mode. The TSS still exists, but it’s just an awkward dumping ground for a couple of data structures.
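To make the "dumping ground" point concrete, here is a sketch of the 64-bit TSS layout as I understand it from the architecture manuals. The struct and field names are my own, not the kernel's, and the packed attribute assumes GCC or Clang. Note there are no saved general-purpose registers at all: what is left is the stack pointers used when entering a more privileged ring (RSP0 is the one that matters for interrupts from user mode), the interrupt stack table, and the I/O bitmap offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the 64-bit TSS; names are illustrative, not kernel identifiers. */
struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];      /* RSP0..RSP2: stacks loaded on a privilege change */
    uint64_t reserved1;
    uint64_t ist[7];      /* IST1..IST7: interrupt stack table entries */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base;  /* offset of the I/O permission bitmap */
} __attribute__((packed)); /* GCC/Clang extension, assumed here */

/* Small helpers so the layout can be checked at run time. */
static size_t tss64_size(void)         { return sizeof(struct tss64); }
static size_t tss64_rsp0_offset(void)  { return offsetof(struct tss64, rsp); }
static size_t tss64_ist1_offset(void)  { return offsetof(struct tss64, ist); }
static size_t tss64_iomap_offset(void) { return offsetof(struct tss64, iomap_base); }
```

The whole structure is 104 bytes, with RSP0 at offset 4 and IST1 at offset 36.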
Of course, that turns out to be the fix for Meltdown, and process-context identifiers (PCIDs), where the CPU has them, make it cheaper by avoiding a full TLB flush on each switch. The Meltdown fix for older CPUs, such as the Pentium III and Intel Core, is roughly the same as the 4g4g kernel changes.
BTW, the 4g4g kernels were created for a different reason. The kernel needed more virtual address space for itself, and thus couldn't share with user code. This was for a time when people were trying to run 32-bit kernels on systems with 32 gigabytes of RAM.
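A rough back-of-the-envelope sketch of why the kernel ran short of address space on those machines. The assumptions are mine: 4 KiB pages, the usual 3G/1G user/kernel split giving the kernel a 1 GiB window, and a per-page descriptor of 32 bytes (the real size varied by kernel version and config). The constant and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES            4096ULL
#define ASSUMED_STRUCT_PAGE_BYTES  32ULL          /* assumption, varies by kernel */
#define KERNEL_WINDOW_BYTES        (1ULL << 30)   /* 1 GiB under a 3G/1G split */

/* Number of physical pages the kernel has to track. */
static uint64_t page_count(uint64_t ram_bytes)
{
    return ram_bytes / PAGE_SIZE_BYTES;
}

/* Bytes of per-page bookkeeping that must live in kernel address space. */
static uint64_t page_array_bytes(uint64_t ram_bytes)
{
    return page_count(ram_bytes) * ASSUMED_STRUCT_PAGE_BYTES;
}

/* Share of the kernel's window eaten by that bookkeeping, in percent. */
static uint64_t page_array_percent_of_window(uint64_t ram_bytes)
{
    return page_array_bytes(ram_bytes) * 100 / KERNEL_WINDOW_BYTES;
}
```

Under these assumptions, 32 GiB of RAM means about 8.4 million pages and 256 MiB of page descriptors, a quarter of the kernel's 1 GiB window before anything else is mapped. Giving the kernel its own full 4 GiB address space removes that squeeze, at the price of switching address spaces on every kernel entry.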
Using TSS based switching is incompatible with PCIDs? Or is it incompatible with separate address spaces for user space and kernel space?
PCIDs are process ID tags on cache lines, correct?
PCIDs are not available on older hardware, and using them is modestly slow. As far as I know, the PCID tags TLB entries rather than cache lines.
That pretty much means the kernel must support both methods. PCIDs are used when possible. When the hardware doesn't support PCIDs, Linux must instead reload segment registers and the page table base, either step-by-step in software or via a TSS switch.
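As a sketch of the "both methods" idea, this is roughly what composing the page table base value looks like with and without PCIDs. The bit positions follow the architecture manuals (PCID in bits 11:0 of CR3 when CR4.PCIDE is set, bit 63 of the written value meaning "don't flush entries for this PCID"). The function names and the cpu_has_pcid flag are hypothetical, and this only builds the value; it does not write CR3.

```c
#include <assert.h>
#include <stdint.h>

#define CR3_PCID_MASK  0xFFFULL        /* bits 11:0 hold the PCID */
#define CR3_NOFLUSH    (1ULL << 63)    /* keep TLB entries tagged with this PCID */

/* Build a CR3 value for a page table at pgd_phys, tagged with pcid. */
static uint64_t make_cr3(uint64_t pgd_phys, uint16_t pcid, int noflush)
{
    uint64_t cr3 = (pgd_phys & ~CR3_PCID_MASK) | (pcid & CR3_PCID_MASK);
    if (noflush)
        cr3 |= CR3_NOFLUSH;
    return cr3;
}

/* Hypothetical selector: use the PCID path only when the CPU supports it. */
static uint64_t next_cr3(uint64_t pgd_phys, uint16_t pcid, int cpu_has_pcid)
{
    if (cpu_has_pcid)
        return make_cr3(pgd_phys, pcid, 1);
    /* No PCID support: plain page table base, so the write flushes the TLB. */
    return pgd_phys & ~CR3_PCID_MASK;
}
```

The design point is that the fallback path is always correct and the PCID path is purely an optimization, which is why the kernel has to carry both.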
BTW, I had to implement x86 hardware task switching for an x86 emulator. The complexity is insane. See my "Who is Hiring?" post if that sounds fun for you.
Some segment registers are reloaded on a 64-bit system. That includes CS, DS, SS, and GS. The 32-bit systems must additionally reload ES. All segment registers are loaded for Linux 2.0 and older, via hardware task switching.
CR3 does not get written for system calls on a normal Linux kernel, from version 2.2 until the Meltdown workaround hit.