Difference between logical and physical cpu hotplug

Tue Aug 23 02:10:30 EDT 2011

On Sun, Aug 21, 2011 at 1:09 PM, Srivatsa Bhat <bhat.srivatsa at gmail.com> wrote:

>
>
> On Sat, Aug 20, 2011 at 4:05 AM, Vaibhav Jain <vjoss197 at gmail.com> wrote:
>
>>
>> On Thu, Aug 18, 2011 at 11:14 AM, Srivatsa Bhat <bhat.srivatsa at gmail.com> wrote:
>>
>>>
>>>
>>>   On Thu, Aug 18, 2011 at 11:40 PM, Srivatsa Bhat <
>>> bhat.srivatsa at gmail.com> wrote:
>>>
>>>>
>>>>
>>>>    On Thu, Aug 18, 2011 at 10:44 PM, Vaibhav Jain <vjoss197 at gmail.com>wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Aug 18, 2011 at 9:02 AM, srivatsa bhat <
>>>>> bhat.srivatsa at gmail.com> wrote:
>>>>>
>>>>>> Hi Vaibhav,
>>>>>>
>>>>>>  On Thu, Aug 18, 2011 at 8:24 PM, Vaibhav Jain <vjoss197 at gmail.com>wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I talked to a friend of mine and he suggested that
>>>>>>> in a logical offline state the cpu is powered on and ready to execute
>>>>>>> instructions
>>>>>>> just that the kernel is not aware of it. But in case of physical
>>>>>>> offline state the cpu
>>>>>>> is powered off and cannot run.
>>>>>>> Are you saying something similar ?
>>>>>>>
>>>>>> Yes, you are right, mostly.
>>>>>> When you try to logically offline a CPU, the kernel will do task
>>>>>> migration (i.e., move out all the tasks running on that CPU to other CPUs in
>>>>>> the system) and it ensures that it doesn't need that CPU anymore. This also
>>>>>> means that, from now on, the context of that CPU need not be saved (because
>>>>>> the kernel has moved that CPU's tasks elsewhere). At this point, it is as if
>>>>>> the kernel is purposefully using only a subset of the available CPUs. This
>>>>>> step is a necessary prerequisite to do physical CPU offline later on.
>>>>>>
>>>>>> But I don't think CPU power ON or OFF is the differentiating factor
>>>>>> between logical and physical offlining. In logical offline, you still have
>>>>>> the CPUs in the system but you just tell the kernel not to use them. At this
>>>>>> stage, you can power off your CPU, to save power for example.
>>>>>> But in physical offline, from a software perspective, you do
>>>>>> additional work at the firmware level (apart from logical offlining at the
>>>>>> OS level), to ensure that physically plugging out the CPUs will not affect
>>>>>> the running system in any way.
>>>>>>
>>>>>> Please note that you can logically online and offline the same CPUs
>>>>>> over and over again without rebooting the system. Here, while onlining a CPU
>>>>>> which was offlined previously, the kernel follows almost the same sequence
>>>>>> which it normally follows while booting the CPUs during full system booting.
>>>>>>
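
A minimal sketch of driving logical online/offline from user space, assuming a kernel built with CPU hotplug support and the sysfs file named later in this thread (/sys/devices/system/cpu/cpuN/online). The helper names and the `root` parameter are made up for illustration, and writing the real file needs root privileges.

```python
from pathlib import Path

# Real sysfs location; the 'root' parameter exists so the helpers can be
# pointed at a scratch directory instead (an assumption for illustration).
SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


def online_file(cpu, root=SYSFS_CPU_ROOT):
    """Path of the per-CPU 'online' control file."""
    return Path(root) / f"cpu{cpu}" / "online"


def is_online(cpu, root=SYSFS_CPU_ROOT):
    """True if the file reports '1', i.e. the kernel is using this CPU."""
    return online_file(cpu, root).read_text().strip() == "1"


def set_online(cpu, online, root=SYSFS_CPU_ROOT):
    """Logically online (True) or offline (False) a CPU."""
    online_file(cpu, root).write_text("1" if online else "0")
```

For example, set_online(1, False) followed by set_online(1, True) would take CPU 1 out of service and bring it back, with no reboot in between.
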
>>>>>> Also one more thing to be noted is that, to be able to physically
>>>>>> hot-plug CPUs, apart from OS and firmware support, you also need the
>>>>>> hardware to support this feature. That is, the electrical wiring to the
>>>>>> individual CPUs should be such that plugging them in and out does not
>>>>>> interfere with the functioning of the rest of the system. As of today, there
>>>>>> are only a few systems that support physical CPU-hotplug. But you can do
>>>>>> logical CPU hotplug easily, by configuring the kernel appropriately during
>>>>>> compilation, as you have noted in one of your previous mails.
>>>>>>
>>>>>> Regards,
>>>>>> Srivatsa S. Bhat
>>>>>>
>>>>>
>>>>>
>>>>> Hi Srivatsa,
>>>>>
>>>>> That was a great explanation! Thanks!
>>>>> I have just one more query. You mentioned above that " the kernel
>>>>> follows almost the same *sequence *which it normally follows while
>>>>> booting the CPUs during full system booting."
>>>>>
>>>>> Can you please explain this sequence a little ?
>>>>>
>>>>>
>>>> Hi Vaibhav,
>>>>
>>>> I'll try to outline a very high level view of what happens while booting
>>>> an SMP (Symmetric Multi-Processor) system. Instead of going through the
>>>> entire boot sequence, let me just highlight only the part that is of
>>>> interest in this discussion: booting multiple CPUs.
>>>>
>>>> The "boot processor" is the one which is booted first while booting a
>>>> system. On x86 architecture, CPU 0 is always the boot processor. Hence, if
>>>> you have observed, you cannot offline CPU0 using CPU hot-plugging on an x86
>>>> machine. (On an Intel box, the file /sys/devices/system/cpu/cpu0/online is
>>>> purposefully absent, for this reason!). But in other architectures, this
>>>> might not be the case. For example on POWER architecture, any processor in
>>>> the system can act as the boot processor.
>>>>
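
One way to see this on a running machine is to check which cpuN directories carry an `online` file. The sketch below is only an illustration: the function name is hypothetical, and it assumes the sysfs layout quoted above.

```python
import re
from pathlib import Path


def hotplug_view(root="/sys/devices/system/cpu"):
    """Map CPU number -> True if that CPU has an 'online' control file."""
    view = {}
    for entry in Path(root).iterdir():
        # Only cpu<N> directories count; siblings such as cpufreq are skipped.
        match = re.fullmatch(r"cpu(\d+)", entry.name)
        if match and entry.is_dir():
            view[int(match.group(1))] = (entry / "online").exists()
    return view
```

On a box where CPU 0 cannot be offlined, the entry for CPU 0 would come back as False.
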
>>>> Once the boot processor does its initialization, the other processors,
>>>> known as "secondary processors" or "application processors (APs)", are
>>>> booted/initialized. Here, obviously some synchronization mechanism is
>>>> necessary to ensure that this order is followed. So in Linux, we use 2
>>>> bitmasks called "cpu_callout_mask" and "cpu_callin_mask". Each bitmask has
>>>> one bit per processor, and together they implement a handshake between
>>>> the boot processor and each AP.
>>>>
>>>> Once the boot processor initializes itself, it updates cpu_callout_mask
>>>> to indicate which secondary processor (or application processor AP) can
>>>> initialize itself next (for example, the boot processor sets a particular
>>>> bit as 1 in the cpu_callout_mask). On the other hand, the secondary
>>>> processor would have done some very basic initialization till then and will
>>>> be testing the value of 'cpu_callout_mask' in a while loop to see if its
>>>> number has been "called out" by the boot processor. Only after the boot
>>>> processor "calls out" this AP does the AP continue with the rest of its
>>>> initialization and complete it.
>>>>
>>>> Once the AP completes its initialization, it reports back to the boot
>>>> processor by setting its number in the cpu_callin_mask. As expected, the
>>>> boot processor would have been waiting in a while loop on cpu_callin_mask to
>>>> see if this AP booted OK or not. Once it finds that the cpu_callin_mask for
>>>> this AP has been set, the boot processor follows the same procedure to boot
>>>> other APs: i.e., it updates cpu_callout_mask and waits for the corresponding
>>>> entry to be set in cpu_callin_mask by that AP and so on. This process
>>>> continues until all the APs are booted up.
>>>>
>>>> Of course, each of these "waiting" times (of both boot processor and
>>>> APs) is capped by some preset value, say for example 5 seconds. If some AP
>>>> takes more than that time to boot, the boot processor declares that the AP
>>>> could not boot and takes appropriate action (like clearing its bit in
>>>> cpu_callout_mask and logically removing that AP from its tables etc,
>>>> effectively forgetting about that processor). Similarly while the APs wait
>>>> for the boot processor to call them out, if the boot processor does not call
>>>> them within a given time period, they declare kernel panic.
>>>>
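
The call-out / call-in handshake with its timeouts can be modelled in user space. This is a toy simulation, not kernel code: threads stand in for processors, Python sets stand in for cpu_callout_mask and cpu_callin_mask, and every name below is hypothetical.

```python
import threading
import time


class BootMasks:
    """Toy stand-ins for cpu_callout_mask / cpu_callin_mask."""

    def __init__(self):
        self._lock = threading.Lock()
        self.callout = set()
        self.callin = set()

    def update(self, mask, cpu, present):
        with self._lock:
            (mask.add if present else mask.discard)(cpu)

    def test(self, mask, cpu):
        with self._lock:
            return cpu in mask


def wait_until(predicate, timeout, poll=0.001):
    """Poll predicate() until it is true or the timeout (seconds) expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(poll)
    return predicate()


def secondary(cpu, masks, init_time, timeout):
    """AP side: wait to be called out, finish init, then call in."""
    if not wait_until(lambda: masks.test(masks.callout, cpu), timeout):
        return  # per the text above, a real AP would panic here
    time.sleep(init_time)  # stands in for the rest of AP initialization
    masks.update(masks.callin, cpu, True)


def boot_secondaries(init_times, timeout=0.5):
    """Boot-processor side: bring up APs one at a time.

    init_times maps AP number -> simulated init duration in seconds.
    Returns (booted, failed) lists of AP numbers.
    """
    masks = BootMasks()
    booted, failed = [], []
    for cpu, init_time in sorted(init_times.items()):
        threading.Thread(target=secondary,
                         args=(cpu, masks, init_time, timeout),
                         daemon=True).start()
        masks.update(masks.callout, cpu, True)  # "call out" this AP
        if wait_until(lambda: masks.test(masks.callin, cpu), timeout):
            booted.append(cpu)
        else:
            masks.update(masks.callout, cpu, False)  # forget this AP
            failed.append(cpu)
    return booted, failed
```

Calling boot_secondaries({1: 0.0, 2: 0.0, 3: 2.0}, timeout=0.3) models two APs that call in promptly and a third that misses the deadline and is dropped.
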
>>>> Here are some references, if you are interested in more details:
>>>>
>>>> Linux kernel source code:
>>>> 1. linux/arch/x86/kernel/smpboot.c : start_secondary() and smp_callin()
>>>>     These are the functions executed by the APs (secondary or
>>>> application processors). Actually smp_callin() is called within
>>>> start_secondary() which is the primary function executed by APs.
>>>>
>>>> 2. linux/arch/x86/kernel/smpboot.c :  do_boot_cpu()
>>>>
>>>        This is executed by the boot processor.  You can look up other
>>> important functions such as native_cpu_up().
>>>
>>>     General SMP booting info:
>>>     1. http://www.cheesecake.org/sac/smp.html
>>>
>>> [ Sorry, I accidentally sent the earlier mail before composing the text
>>> fully. ]
>>>
>>> Regards,
>>> Srivatsa S. Bhat
>>>
>>
>>
>>
>>  Awesome explanation Srivatsa!! Thanks a lot!!
>> Just had one more doubt. I am a little unclear about how the APs get
>> initialized in the beginning. In the case of the Boot Processor,
>> it's just like a uniprocessor system. But how do the APs start executing
>> code?
>> Could you please explain a little ?
>>
>>
> Sure. But please note that I will stick to Intel architecture while
> explaining the details.
>
> The Boot CPU or the Boot-Strap Processor (BSP) is the one which boots the
> Operating System. Then it wakes up the APs (Application Processors) when it
> is the right time.
>
> Let us now explore some background details to understand how all this
> works.
> On uniprocessor systems we use PIC (Programmable Interrupt Controller) like
> the 8259A Interrupt Controller chip to deliver interrupts to the processor.
> On Multi-Processor (MP) systems, we use something known as APICs (Advanced
> Programmable Interrupt Controllers). Every processor has a local APIC.
> And there are one or more I/O APICs in the system that are shared by all
> the processors. As the name suggests, I/O APICs are used to deliver
> interrupts from I/O devices to the processors, via the local APICs.
>
> All local APICs have unique IDs that are assigned either by the hardware or
> the BIOS during the initialization phase. Using the local APIC ID we can
> identify the processors in the system.
>
> Using these local APICs, we can send something known as "Inter-Processor
> Interrupts" or IPIs. As the name suggests, this is a mechanism for one
> processor to interrupt another processor in the system. Note that this
> mechanism can be used by any processor to talk to any other processor in
> the system (no distinction between BSP and APs here).
>
> To kick-start the APs, the BSP sends INIT IPI to each AP in turn, waits for
> some time for the IPI to be delivered to the AP and then checks if that AP
> booted up. Based on the version of the APIC used, the BSP might have to send
> 2 STARTUP IPIs to the APs with some time delay after each of the IPIs.
> [ If you have discrete APICs (i.e., 82489DX APIC) then INIT IPI will do. If
> you have integrated APIC, you need to send two STARTUP IPIs. ]
> All this is in accordance with the "Universal Start-up Algorithm" to start
> APs, as specified by Intel architecture. Each of these IPIs causes an INIT
> at the AP to which it was delivered.
>
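
To make the IPI sequence concrete, here is a rough sketch of the values a BSP would write to its local APIC's Interrupt Command Register (ICR). The bit positions follow the xAPIC ICR layout as I understand it (delivery mode in bits 8-10, level in bit 14, trigger mode in bit 15, destination APIC ID in bits 24-31 of the high word). The function names are hypothetical, and real code also needs the delays and status checks that this sketch leaves out.

```python
APIC_DM_INIT = 0x500         # delivery mode 101b (INIT) in ICR bits 8-10
APIC_DM_STARTUP = 0x600      # delivery mode 110b (STARTUP) in ICR bits 8-10
APIC_INT_ASSERT = 0x4000     # level = assert (bit 14)
APIC_INT_LEVELTRIG = 0x8000  # trigger mode = level (bit 15)


def icr_high(apic_id):
    """High ICR word: target local APIC ID in bits 24-31 (xAPIC mode)."""
    if not 0 <= apic_id <= 0xFF:
        raise ValueError("xAPIC IDs are 8 bits wide")
    return apic_id << 24


def init_ipi():
    """Low ICR word for a level-triggered, asserted INIT IPI."""
    return APIC_INT_LEVELTRIG | APIC_INT_ASSERT | APIC_DM_INIT


def startup_ipi(start_addr):
    """Low ICR word for a STARTUP IPI; the vector is the 4 KiB page number."""
    if start_addr % 0x1000 or not 0 <= start_addr < 0x100000:
        raise ValueError("start address must be 4 KiB aligned, below 1 MiB")
    return APIC_DM_STARTUP | (start_addr >> 12)


def wakeup_sequence(apic_id, start_addr, integrated_apic=True):
    """(high, low) ICR writes: one INIT, then two STARTUPs if integrated."""
    seq = [(icr_high(apic_id), init_ipi())]
    if integrated_apic:
        seq += [(icr_high(apic_id), startup_ipi(start_addr))] * 2
    return seq
```

With a discrete APIC (integrated_apic=False) the sequence contains only the INIT IPI, matching the note above.
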
> Now the question is, how do you make the APs execute a particular piece
> of code (i.e., jump to a specified location) on start-up?
> We know that whenever a processor starts after a RESET or INIT, it starts
> executing code from the reset vector (a predefined location).
> However if you want a processor to immediately jump to an address that you
> have specified, you must use the INIT IPI as part of a "warm-reset".
> Warm-reset allows you to send INIT signal to a processor without causing
> the processor to go through the entire BIOS initialization (POST -- see
> below for details) and then start the processor's execution at the
> warm-reset-vector.
>
> By putting the appropriate pointer (i.e., pointer to the AP start-up code)
> in the warm-reset-vector (system RAM location 40:67h), setting the BIOS
> shutdown code to 0Ah (which tells the BIOS that this INIT is part of a
> warm-reset) and then causing an INIT at the processor (via the IPIs), the
> Operating System can cause the processor to jump immediately to any location
> and start executing that code. This is how the BSP can boot the APs and make
> them execute some particular piece of code (in this case, the AP start-up
> code as designed in the OS).
>
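
The address arithmetic behind the warm-reset vector is easy to get wrong, so here is a small sketch of it. It assumes the usual real-mode rule (physical = segment * 16 + offset) and that the vector holds a far pointer stored as an offset word followed by a segment word. The function names are hypothetical, and 0x9F000 in the usage note is just an example address.

```python
import struct


def real_mode_phys(segment, offset):
    """Physical address of a real-mode segment:offset pair."""
    return (segment << 4) + offset


# 40:67h from the text above works out to physical address 0x467.
WARM_RESET_VECTOR = real_mode_phys(0x40, 0x67)


def far_pointer_for(start_addr):
    """Split a start address below 1 MiB into (offset, segment) words."""
    if not 0 <= start_addr < 0x100000:
        raise ValueError("real-mode code must live below 1 MiB")
    return start_addr & 0xF, start_addr >> 4


def warm_reset_bytes(start_addr):
    """The 4 bytes to store at the vector: offset word, then segment word."""
    offset, segment = far_pointer_for(start_addr)
    return struct.pack("<HH", offset, segment)
```

For AP start-up code placed at 0x9F000, far_pointer_for(0x9F000) gives offset 0 and segment 0x9F00.
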
> It would be worthwhile to understand what would be the state of the system
> (and the APs) before the Operating System gets control from the BIOS after
> switching ON the machine. The BIOS, upon system start, performs a procedure
> known as POST (Power-On Self Test). This is to check the status of all the
> components/circuitry of the system, including the processors, to ensure that
> they are all functioning properly. During this phase the BIOS initializes
> all the circuitry (including all the APICs and the processors) to some known
> configuration and then puts all the APs to the HALT state with interrupts
> disabled. This is to ensure that the APs don't execute Operating System code
> (we want only the BSP to execute the OS code initially). Then the BSP starts
> executing OS code.
>
> To boot APs, the BSP sends IPIs to them. The INIT and STARTUP IPIs are
> non-maskable, so they are delivered even though the APs were in HALT state
> with interrupts disabled. Hence the BSP
> will be able to kick-start AP execution and by using the warm-reset
> mechanism, it can direct the APs to execute some particular piece of code at
> startup. The BSP would have put a pointer to that AP start-up code in the
> warm-reset-vector address before sending the INIT or STARTUP IPIs to the
> APs.
>
> You might be wondering how the BSP tells its local APIC which AP it must
> send an IPI to.
> The answer is simple. During BIOS POST, an MP (Multi-Processor)
> Configuration Table will be set up (in conjunction with BSP and APs) in a
> well-known region of memory, which will be read by the OS during boot up.
> This table contains the local APIC IDs of all the processors.
> So, while sending the targeted IPIs using its local APIC, the BSP specifies
> the local APIC ID of the target AP which it wants to interrupt (and boot in
> this case). This ensures the delivery of the IPI to the correct AP.
>
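
As a rough illustration of pulling local APIC IDs out of the MP Configuration Table, the sketch below walks the table's entry area. It assumes the entry layout as I recall it from the Intel MP Specification: processor entries have type 0 and are 20 bytes long (APIC ID at byte 1, CPU flags at byte 3 with bit 0 = enabled and bit 1 = boot processor), and all other entry types are 8 bytes. Please check this against the spec before relying on it; the names are hypothetical.

```python
import struct
from collections import namedtuple

Processor = namedtuple("Processor", "apic_id enabled is_bsp")

PROCESSOR_ENTRY = 0       # entry type of a processor entry
PROCESSOR_ENTRY_LEN = 20  # bytes (assumed from the MP spec)
OTHER_ENTRY_LEN = 8       # bytes for bus, I/O APIC and interrupt entries


def parse_processors(entries, count):
    """Collect processor entries from the raw bytes after the table header."""
    procs = []
    pos = 0
    for _ in range(count):
        if entries[pos] == PROCESSOR_ENTRY:
            # type, local APIC ID, local APIC version, CPU flags
            _, apic_id, _version, flags = struct.unpack_from("<BBBB", entries, pos)
            procs.append(Processor(apic_id, bool(flags & 1), bool(flags & 2)))
            pos += PROCESSOR_ENTRY_LEN
        else:
            pos += OTHER_ENTRY_LEN
    return procs


def ap_apic_ids(procs):
    """APIC IDs the BSP would target: enabled processors other than itself."""
    return [p.apic_id for p in procs if p.enabled and not p.is_bsp]
```

The list returned by ap_apic_ids() corresponds to the destinations the BSP would put in its IPIs, one AP at a time.
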
> In short, this is how a Multi-Processor system gets rolling ... :-)
>
> For more details you can refer to:
>
> 1. Intel Multi-Processor Specification, especially Appendix A and B.
>     http://www.intel.com/design/pentium/datashts/242016.htm
>
>
> 2. linux/arch/x86/kernel/smpboot.c :
>     do_boot_cpu(), wakeup_secondary_cpu_via_init(), native_cpu_up(),
> start_secondary()
>
> 3. linux/arch/x86/kernel/head_32.S:
>     startup_32_smp()
>
> 4. linux/arch/x86/kernel/trampoline_32.S
>
> 5. http://tldp.org/HOWTO/Linux-i386-Boot-Code-HOWTO/smpboot.html
>
> Regards,
> Srivatsa S. Bhat
>
>

Srivatsa, you are awesome! Thanks a lot!!
I am just wondering what all is required to gain this depth of knowledge :)

Thanks
Vaibhav Jain