[RFC]confusion about syscall

Peter Teoh htmldeveloper at gmail.com
Sun Jul 15 11:24:45 EDT 2012


Just sharing my analysis; correct me if I am wrong:

On Sun, Jul 15, 2012 at 8:36 PM, 王哲 <wangzhe5004 at gmail.com> wrote:

>
>
> 2012/7/15 Peter Teoh <htmldeveloper at gmail.com>
>
>> Hi Mulyadi and WangZhe,
>>
>> Nice to write to you again....:-).
>>
>> On Sun, Jul 15, 2012 at 1:49 PM, Mulyadi Santosa <
>> mulyadi.santosa at gmail.com> wrote:
>>
>>> Hi...
>>>
>>> On Sun, Jul 15, 2012 at 9:28 AM, 王哲 <wangzhe5004 at gmail.com> wrote:
>>> > and the second program:
>>> >
>>> > #include <stdio.h>
>>> > #include <unistd.h>
>>> >
>>> > int main(void)
>>> > {
>>> >     unsigned long value = 0;
>>> >     value = getpid();
>>> >     return 0;
>>> > }
>>> >
>>> > and disassembling it:( objdump -d a.out)
>>> > ...
>>> > 08048300 <getpid@plt>:
>>> >  8048300:    ff 25 00 a0 04 08        jmp    *0x804a000
>>> >  8048306:    68 00 00 00 00           push   $0x0
>>> >  804830b:    e9 e0 ff ff ff           jmp    80482f0 <_init+0x3c>
>>>
>>> Looks like jumping into vsyscall page to me...
>>>
>>>
>> After starting the process and attaching with gdb -p <pid>:
>>
>> (gdb) disassemble main
>> Dump of assembler code for function main:
>>    0x0000000000400564 <+0>: push   %rbp
>>    0x0000000000400565 <+1>: mov    %rsp,%rbp
>>    0x0000000000400568 <+4>: sub    $0x10,%rsp
>>    0x000000000040056c <+8>: movq   $0x0,-0x8(%rbp)
>>    0x0000000000400574 <+16>: mov    $0x0,%eax
>>    0x0000000000400579 <+21>: callq  0x400460 <getpid@plt>
>>    0x000000000040057e <+26>: cltq
>>    0x0000000000400580 <+28>: mov    %rax,-0x8(%rbp)
>>    0x0000000000400584 <+32>: movabs $0x9184e72a000,%rdi
>>    0x000000000040058e <+42>: mov    $0x0,%eax
>>    0x0000000000400593 <+47>: callq  0x400470 <sleep@plt>
>>    0x0000000000400598 <+52>: mov    $0x0,%eax
>>    0x000000000040059d <+57>: leaveq
>>    0x000000000040059e <+58>: retq
>> End of assembler dump.
>> (gdb) disassemble getpid
>> Dump of assembler code for function getpid:
>>    0x00007f19ae558530 <+0>: mov    %fs:0x2d4,%edx
>>    0x00007f19ae558538 <+8>: cmp    $0x0,%edx
>>    0x00007f19ae55853b <+11>: jle    0x7f19ae558540 <getpid+16>
>>    0x00007f19ae55853d <+13>: mov    %edx,%eax
>>    0x00007f19ae55853f <+15>: retq
>>    0x00007f19ae558540 <+16>: jne    0x7f19ae558554 <getpid+36>
>>    0x00007f19ae558542 <+18>: mov    %fs:0x2d0,%eax
>>    0x00007f19ae55854a <+26>: test   %eax,%eax
>>    0x00007f19ae55854c <+28>: nopl   0x0(%rax)
>>    0x00007f19ae558550 <+32>: je     0x7f19ae558554 <getpid+36>
>>    0x00007f19ae558552 <+34>: repz retq
>>    0x00007f19ae558554 <+36>: mov    $0x27,%eax
>>    0x00007f19ae558559 <+41>: syscall
>>    0x00007f19ae55855b <+43>: test   %edx,%edx
>>    0x7f19ae55855d <getpid+45>: jne    0x7f19ae558552 <getpid+34>
>>    0x7f19ae55855f <getpid+47>: mov    %eax,%fs:0x2d0
>>    0x7f19ae558567 <getpid+55>: retq
>>
>>
>    Hi Peter:
>        question1: why does your system show
>  "0x00007f19ae558554 <+36>: mov $0x27,%eax"?
>  The getpid syscall number is 0x14.
>
Yes, you are right, for a 32-bit kernel:

In arch/x86/kernel>
grep getpid *.S
syscall_table_32.S: .long sys_getpid /* 20 */

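But my Linux kernel is 64-bit, and on x86_64 the getpid syscall number is
39 (0x27), which is the value loaded into %eax in my disassembly.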



>        question2: I used gdb to disassemble getpid just like you did, and
> this is the result:
>
>
>     (gdb) disassemble getpid
>  Dump of assembler code for function getpid:
>    0xb7771a40 <+0>:    mov    %gs:0x6c,%edx
>    0xb7771a47 <+7>:    cmp    $0x0,%edx
>    0xb7771a4a <+10>:    jle    0xb7771a50 <getpid+16>
>    0xb7771a4c <+12>:    mov    %edx,%eax
>    0xb7771a4e <+14>:    repz ret
>    0xb7771a50 <+16>:    jne    0xb7771a62 <getpid+34>
>    0xb7771a52 <+18>:    mov    %gs:0x68,%eax
>    0xb7771a58 <+24>:    test   %eax,%eax
>    0xb7771a5a <+26>:    lea    0x0(%esi),%esi
>    0xb7771a60 <+32>:    jne    0xb7771a4e <getpid+14>
>    0xb7771a62 <+34>:    mov    $0x14,%eax
>    0xb7771a67 <+39>:    call   *%gs:0x10
>
>

See the comment for gs in entry_32.S:

/*
 * User gs save/restore
 *
 * %gs is used for userland TLS and kernel only uses it for stack
 * canary which is required to be at %gs:20 by gcc.  Read the comment
 * at the top of stackprotector.h for more info.
 *
 * Local labels 98 and 99 are used.
 */
#ifdef CONFIG_X86_32_LAZY_GS

And here is the comment inside stackprotector.h, which I do not yet
completely understand myself; I copied it here:

/*
 * GCC stack protector support.
 *
 * Stack protector works by putting predefined pattern at the start of
 * the stack frame and verifying that it hasn't been overwritten when
 * returning from the function.  The pattern is called stack canary
 * and unfortunately gcc requires it to be at a fixed offset from %gs.
 * On x86_64, the offset is 40 bytes and on x86_32 20 bytes.  x86_64
 * and x86_32 use segment registers differently and thus handles this
 * requirement differently.
 *
 * On x86_64, %gs is shared by percpu area and stack canary.  All
 * percpu symbols are zero based and %gs points to the base of percpu
 * area.  The first occupant of the percpu area is always
 * irq_stack_union which contains stack_canary at offset 40.  Userland
 * %gs is always saved and restored on kernel entry and exit using
 * swapgs, so stack protector doesn't add any complexity there.
 *
 * On x86_32, it's slightly more complicated.  As in x86_64, %gs is
 * used for userland TLS.  Unfortunately, some processors are much
 * slower at loading segment registers with different value when
 * entering and leaving the kernel, so the kernel uses %fs for percpu
 * area and manages %gs lazily so that %gs is switched only when
 * necessary, usually during task switch.
 *
 * As gcc requires the stack canary at %gs:20, %gs can't be managed
 * lazily if stack protector is enabled, so the kernel saves and
 * restores userland %gs on kernel entry and exit.  This behavior is
 * controlled by CONFIG_X86_32_LAZY_GS and accessors are defined in
 * system.h to hide the details.
 */

Yes, in 32-bit userspace the %gs register is used for TLS and is therefore
per-thread. For more info:

http://www.akkadia.org/drepper/tls.pdf

http://www.ibm.com/developerworks/linux/library/l-user-space-apps/index.html

http://stackoverflow.com/questions/6021273/how-to-allocate-thread-local-storage

(and lots of relevant links besides it).



>   can you explain the meaning of "call   *%gs:0x10"?
>
>   Thanks!
>
>
>
>
>> And to check the address space:
>>
>> (gdb) info sharedlibrary
>> From                To                  Syms Read   Shared Object Library
>> 0x00007f19ae4cb8c0  0x00007f19ae5dec60  Yes (*)     /lib/libc.so.6
>> 0x00007f19ae830af0  0x00007f19ae849704  Yes (*)
>> /lib64/ld-linux-x86-64.so.2
>> (*): Shared library is missing debugging information.
>>
>>
>> and if you want:
>>
>> cat /proc/2282/maps
>>
>> 7f19ae82a000-7f19ae82b000 rw-p 0017d000 08:05 9922
>> /lib/libc-2.11.1.so
>> 7f19ae830000-7f19ae850000 r-xp 00000000 08:05 8824
>> /lib/ld-2.11.1.so
>> 7ffff2031000-7ffff2052000 rw-p 00000000 00:00 0
>>  [stack]
>> 7ffff21af000-7ffff21b0000 r-xp 00000000 00:00 0
>>  [vdso]
>> ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0
>>  [vsyscall]
>>
>> Note also that static analysis tools like "objdump -d" are best avoided
>> if you want to understand dynamically resolved addresses.  From the above,
>> we can conclude that the "syscall" instruction (a separate instruction
>> from "sysenter", which 32-bit code typically uses instead) makes the
>> transition to the kernel, and that it is embedded inside libc.so.6.
>>
>>
>>> --
>>> regards,
>>>
>>> Mulyadi Santosa
>>> Freelance Linux trainer and consultant
>>>
>>> blog: the-hydra.blogspot.com
>>> training: mulyaditraining.blogspot.com
>>>
>>> _______________________________________________
>>> Kernelnewbies mailing list
>>> Kernelnewbies at kernelnewbies.org
>>> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>>>
>>
>>
>>
>> --
>> Regards,
>> Peter Teoh
>>
>
>


-- 
Regards,
Peter Teoh