Evolution of the x86 context switch in Linux (2018)

bogomipz 7 years ago |

Can someone explain why using the TSS is still mandatory with software-based task switching? Is this a requirement imposed by x86?

In looking at the OS dev wiki I see the following:

>"The TSS is primarily suited for hardware multitasking, where each individual process has its own TSS. In Software multitasking, one or two TSS's are also generally used, as they allow for entering Ring 0 code after an interrupt."

Would you not be able to enter Ring 0 after an interrupt without a TSS entry? Is that why it is still required?

monocasa 7 years ago | |

The interrupt stack pointer comes from the TSS. Without that you're still running on the untrusted user stack with no way of bootstrapping a kernel context without corrupting the user state.
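As a sketch of that point: below is the 32-bit TSS layout from the Intel SDM, including the SS0:ESP0 fields the CPU loads when an interrupt arrives while running in ring 3. The field names are illustrative, and `select_interrupt_stack` is a hypothetical model of the stack choice (assuming a ring-0 handler), not a kernel or hardware API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit x86 TSS (104 bytes). Each 16-bit selector is followed by 16
 * reserved bits. Field names are illustrative. */
struct tss32 {
    uint16_t prev_task_link, reserved0;
    uint32_t esp0;              /* ring-0 stack pointer, loaded on CPL 3 -> 0 */
    uint16_t ss0, reserved1;    /* ring-0 stack segment */
    uint32_t esp1;
    uint16_t ss1, reserved2;
    uint32_t esp2;
    uint16_t ss2, reserved3;
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint16_t es, reserved4, cs, reserved5, ss, reserved6;
    uint16_t ds, reserved7, fs, reserved8, gs, reserved9;
    uint16_t ldt_selector, reserved10;
    uint16_t trap_flag;         /* bit 0: T, debug trap on task switch */
    uint16_t iomap_base;        /* offset of the I/O permission bitmap */
};

static_assert(offsetof(struct tss32, esp0) == 4, "esp0 offset");
static_assert(offsetof(struct tss32, ss0) == 8, "ss0 offset");
static_assert(offsetof(struct tss32, cr3) == 28, "cr3 offset");
static_assert(offsetof(struct tss32, iomap_base) == 102, "iomap offset");
static_assert(sizeof(struct tss32) == 104, "TSS size");

struct stack_ptr { uint16_t ss; uint32_t esp; };

/* Hypothetical model: on an interrupt from ring 3 to a ring-0 handler, the
 * new SS:ESP comes from the TSS, never from (untrusted) user state. When
 * already in ring 0, the CPU keeps using the current stack. */
static struct stack_ptr select_interrupt_stack(const struct tss32 *tss,
                                               unsigned cpl,
                                               struct stack_ptr current) {
    if (cpl == 0)
        return current;                     /* no privilege change */
    struct stack_ptr kernel = { tss->ss0, tss->esp0 };
    return kernel;
}
```

Without a valid TSS, there is simply nowhere for the CPU to get that kernel stack from.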

bogomipz 7 years ago | | |

Oh right, without a mapping it's a "chicken and egg" situation. Cheers.

dekhn 7 years ago |

I recall reading a paper from around 1998 comparing Linux and Solaris context-switch times; Linux was 10-100x faster. Solaris did something incredibly slow and safe.

shereadsthenews 7 years ago | |

Real context switches on Solaris were very slow, which is why they had LWPs (lightweight processes). But Linux process context switches were also faster than Sun's LWP switches.

filereaper 7 years ago |

Enjoyed this article. Does anybody know the significance of adding the do ... while (0) loop within the macro, starting with Linux 1.3?

I was curious whether it guards against some C preprocessor issue.

  /** include/asm-i386/system.h */
  #define switch_to(tsk) do { \
    [...] \
  } while (0)

akuma73 7 years ago | |

Good answer here: https://stackoverflow.com/questions/257418/do-while-0-what-i...

berti 7 years ago | | |

Just to add more context, this is a very common cpp (C preprocessor) idiom. You'll find it somewhere in most non-trivial C projects.

pantalaimon 7 years ago | | |

Abbreviating the C Preprocessor as cpp is very confusing imho.

monocasa 7 years ago | | |

It's a common abbreviation, older than C++. You used to be able to run the C preprocessor on arbitrary non-C files by using the cpp command.

berti 7 years ago | | |

You still can.

saagarjha 7 years ago | | |

Tell that to CPPFLAGS.

Someone 7 years ago | |

It enables you to invoke the macro as if it were an expression statement consisting of a function call, regardless of where it appears.

For details, see http://c-faq.com/cpp/multistmt.html.

loeg 7 years ago | | |

In particular, a single statement. I'm sure the link covers it, but:

    if (foo)
      MULTI_LINE_MACRO;

Breaks without some wrapper like if (1) { A; B; } or do { A; B; } while (0):

    if (foo)
      A;
      B;  // oops, unconditional (e.g., "goto fail")
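A small compilable illustration of the pitfall, and of why do { ... } while (0) also beats a bare { ... } block: with a bare block, the caller's trailing ';' ends the if statement, so a following else has no if to attach to. The macro and counter names are made up for the example.

```c
#include <assert.h>

/* Hypothetical counters used to observe which statements actually ran. */
static int a_count, b_count;

/* Unsafe: expands to two statements, so an if only guards the first one. */
#define BUMP_BOTH_UNSAFE() a_count++; b_count++

/* Safe: the expansion is one statement that still requires (and consumes)
 * the caller's trailing semicolon, so it also works before an else. */
#define BUMP_BOTH() do { a_count++; b_count++; } while (0)

static void reset(void) { a_count = 0; b_count = 0; }

/* With cond == 0, b_count is still incremented: the "goto fail" shape. */
static void unsafe(int cond) {
    if (cond)
        BUMP_BOTH_UNSAFE();
}

/* Compiles and behaves as written, including the else branch. */
static int safe(int cond) {
    if (cond)
        BUMP_BOTH();
    else
        return -1;
    return 0;
}
```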

monocasa 7 years ago | | |

It also gives you a nice scope to keep local variables in, but there are other ways to accomplish that too.
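For example, a hypothetical SWAP_INT macro can declare its temporary inside the do { ... } while (0) body. Each expansion gets its own block, so the macro can be used several times in one function without redeclaring the temporary. (The temporary's name should still be one that callers are unlikely to pass in as an argument.)

```c
#include <assert.h>

/* Hypothetical macro: swap_tmp_ exists only inside this expansion's block. */
#define SWAP_INT(x, y) do {      \
        int swap_tmp_ = (x);     \
        (x) = (y);               \
        (y) = swap_tmp_;         \
    } while (0)

/* Orders a pair in place; SWAP_INT works as the single statement of an if. */
static void sort_pair(int *lo, int *hi) {
    if (*lo > *hi)
        SWAP_INT(*lo, *hi);
}
```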

loeg 7 years ago | | |

Yeah, that's a good point too.

souprock 7 years ago |

One addition to the "Linux 2.2 (1999)" section: it introduced the Meltdown vulnerability. That was the then-unknown cost of software context switching.

Later, with Red Hat's 4g4g kernels that Linus rejected, the problem would go away for people who installed Red Hat's version of the OS on systems with many gigabytes of memory.

bogomipz 7 years ago | |

Can you elaborate? How does relying on pure TSS for context switching prevent Meltdown?

What were the 4g4g kernels? Do you have any literature on those?

DSingularity 7 years ago | | |

Separate address spaces for kernel and user. The hardware uses the TSS to switch address spaces as needed for syscalls.

amluto 7 years ago | | |

Huh? Hardware TSS has nothing to do with separating address spaces. It’s just a trick for switching the address space, and it’s really quite slow on modern CPUs. In theory, a kernel could use hardware task switching to switch address spaces on user/kernel transitions by forcing a hardware task switch when this happens, but the performance impacts would be considerably worse than KPTI.

What does seem to mitigate Meltdown on some CPUs is enabling segment limits for user code. This does nothing for 64-bit code, though.

edit: not to mention that there are no hardware context switches on 64-bit kernels. AMD removed support entirely in 64-bit mode. The TSS still exists, but it’s just an awkward dumping ground for a couple of data structures.
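The "dumping ground" is visible in the 64-bit TSS layout: the register save area used by hardware task switching is gone, leaving only stack pointers and the I/O bitmap offset. The sketch follows the 64-bit TSS format in the Intel and AMD manuals; field names are illustrative, and select_stack64 is a hypothetical model of how the CPU picks the handler stack, not a real API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit TSS (104 bytes). rsp[0] sits at offset 4, so the struct must be
 * packed to match the hardware layout. */
struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];        /* rsp[0]: stack loaded on a switch to ring 0 */
    uint64_t reserved1;
    uint64_t ist[7];        /* IST1..IST7: known-good stacks for chosen vectors */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base;    /* offset of the I/O permission bitmap */
} __attribute__((packed));

static_assert(offsetof(struct tss64, rsp) == 4, "rsp0 offset");
static_assert(offsetof(struct tss64, ist) == 36, "ist1 offset");
static_assert(offsetof(struct tss64, iomap_base) == 102, "iomap offset");
static_assert(sizeof(struct tss64) == 104, "TSS size");

/* Hypothetical model, assuming a ring-0 handler: a nonzero IST index in the
 * IDT gate always wins; otherwise a transition from an outer ring loads
 * rsp[0]; otherwise the current stack is kept. (The CPU also aligns the new
 * RSP to 16 bytes; that step is omitted here.) */
static uint64_t select_stack64(const struct tss64 *tss, unsigned ist_index,
                               unsigned cpl, uint64_t current_rsp) {
    if (ist_index != 0)
        return tss->ist[ist_index - 1];
    if (cpl != 0)
        return tss->rsp[0];
    return current_rsp;
}
```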

bogomipz 7 years ago | | |

Do you know the reason why Linus rejected this idea?

souprock 7 years ago | | |

The main reason seemed to be the relatively bad performance of the hardware task switch (loading segment registers and the page table base) that would be required for any system call.

Of course, that turns out to be the fix for Meltdown, unless you have the process-context identifiers (PCID) available on Haswell chips and newer. The Meltdown fix for older CPUs, such as the Pentium III and Intel Core, is roughly the same as the 4g4g kernel changes.
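For reference, a rough sketch of where the PCID lives: with CR4.PCIDE set, bits 11:0 of CR3 hold the PCID, which tags TLB entries, and setting bit 63 in a value written to CR3 tells the CPU it need not flush that PCID's entries. The kernel/user pairing below mimics the convention used by Linux's KPTI (user page table 4 KiB above the kernel one, user PCID with bit 11 set); treat those constants, a MAXPHYADDR of 52, and all function names as assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define CR3_PCID_MASK 0xFFFULL                /* bits 11:0: PCID */
#define CR3_ADDR_MASK 0x000FFFFFFFFFF000ULL   /* bits 51:12: top-level table (MAXPHYADDR 52 assumed) */
#define CR3_NOFLUSH   (1ULL << 63)            /* on MOV to CR3: keep this PCID's TLB entries */

/* Assumed KPTI-style pairing: requires the kernel table to be 8 KiB aligned
 * (bit 12 clear) and the kernel PCID to be below 0x800 (bit 11 clear). */
#define USER_PGTABLE_BIT 12
#define USER_PCID_BIT    11

/* Hypothetical helper: builds a CR3 value from a table address and a PCID. */
static uint64_t make_cr3(uint64_t pgd_phys, uint16_t pcid, int noflush) {
    uint64_t cr3 = (pgd_phys & CR3_ADDR_MASK) | ((uint64_t)pcid & CR3_PCID_MASK);
    if (noflush)
        cr3 |= CR3_NOFLUSH;
    return cr3;
}

/* Hypothetical helper: derives the user-mode CR3 from the kernel-mode one. */
static uint64_t kernel_to_user_cr3(uint64_t kernel_cr3) {
    return kernel_cr3 | (1ULL << USER_PGTABLE_BIT) | (1ULL << USER_PCID_BIT);
}

static uint16_t cr3_pcid(uint64_t cr3) { return (uint16_t)(cr3 & CR3_PCID_MASK); }
static uint64_t cr3_pgd(uint64_t cr3)  { return cr3 & CR3_ADDR_MASK; }
```

Because kernel and user entries carry different PCIDs, switching CR3 on each kernel entry and exit does not have to throw away the other side's TLB entries, which is why KPTI is much cheaper on CPUs with PCID.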

BTW, the 4g4g kernels were created for a different reason. The kernel needed more virtual address space for itself, and thus couldn't share with user code. This was for a time when people were trying to run 32-bit kernels on systems with 32 gigabytes of RAM.

bogomipz 7 years ago | | |

>"Of course, that turns out to be the fix for meltdown, unless you have the process-context identifiers (PCID) available on Haswell chips and newer."

Is TSS-based switching incompatible with PCIDs? Or is it incompatible with separate address spaces for user space and kernel space?

PCIDs are process ID tags on cache lines, correct?

amluto 7 years ago |

Nice article!

If you want to have lots of fun, you could look at switch_mm() on a modern kernel :)

Upvoter33 7 years ago |

This is really well done, bravo.

glonq 7 years ago |

Very thorough. Nice job.

MichaelMoser123 7 years ago |

Does anyone know what Ingo Molnar is doing these days?

moosingin3space 7 years ago | |

A ton of stuff related to eBPF, last I saw.

heinrichhartman 7 years ago |

I really like how good the article looks when printed. I enjoy reading long, in-depth articles much more when I can read them in print. Unfortunately many blog posts need a lot of tweaking until I can get an acceptable print result. This one looks good enough without any trickery.

Thanks to the author for caring about the paper people. :)

Groxx 7 years ago | |

Out of curiosity, since it changes how the page prints and I haven't experimented much: have you tried printing while in the browser's "reading" mode? Or does that tend to be worse?

ezconnect 7 years ago | | |

Not OP, but if it looks good in "reading" mode, it prints nicely. Plus you can adjust font size and print width to suit your taste.