pipe writes, ERESTARTSYS and SA_RESTART

Viacheslav Biriukov v.v.biriukov at gmail.com
Thu Aug 4 14:01:39 EDT 2022


Hello team,

It would be great if someone could help me with a question about blocking
write calls to a pipe and the syscall restart logic.

From my experiments I can see that if the SA_RESTART flag is set, the
kernel (?) restarts the write call when the process gets a signal.
The logic lives in the pipe.c file under the pipe_write function:
https://elixir.bootlin.com/linux/v5.19/source/fs/pipe.c#L555

But what I can't understand is how and where the kernel modifies the
arguments of the write system call, and where it accumulates the return
values of all these restarts so that the userspace caller ultimately sees
the correct number of written bytes.

With strace I can see all of those retries, for example:

write(1, ""..., 33554431)               = 65536
write(1, ""..., 33488895)               = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
write(1, ""..., 33488895)               = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
write(1, ""..., 33488895)               = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
write(1, ""..., 33488895)               = 33488895

Here there were 4 restarts (I sent 4 signals): 3 of the calls returned
ERESTARTSYS and 2 were able to write to the pipe. Also, for the restarts,
strace shows the correct third argument, which is decreasing.

The caller in userspace ultimately sees that it was able to write
65536 + 33488895 = 33554431 bytes, which is correct and matches what
pipe(7) describes.
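
A minimal sketch of a program that should reproduce this kind of trace is
below. It is an illustration only, not the program used for the trace above:
the helper name write_with_signals, the 1 MiB buffer, the choice of SIGUSR1
and the 100 ms delay between signals are all assumptions. The parent installs
a handler with SA_RESTART and writes into a pipe in a loop, while a child
sends it signals and only afterwards starts draining the pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Counts handler invocations; sig_atomic_t is safe to touch in a handler. */
static volatile sig_atomic_t signals_seen = 0;

static void on_sigusr1(int signo)
{
    (void)signo;
    signals_seen++;
}

/*
 * Hypothetical helper: writes `len` zero bytes into a pipe while a child
 * sends `nsignals` SIGUSR1 signals and then drains the pipe.
 * Returns the total number of bytes written, or -1 on error.
 * *write_calls receives the number of write() calls the loop made.
 */
static long write_with_signals(size_t len, int nsignals, int *write_calls)
{
    assert(write_calls != NULL);

    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sa.sa_flags = SA_RESTART;          /* ask for interrupted syscalls to be restarted */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    char *buf = calloc(1, len);
    if (buf == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    pid_t child = fork();
    if (child == -1) {
        free(buf);
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (child == 0) {
        /* Child: signal the parent while its write is blocked, then drain. */
        close(fds[1]);
        struct timespec delay = { 0, 100 * 1000 * 1000 };   /* 100 ms */
        for (int i = 0; i < nsignals; i++) {
            nanosleep(&delay, NULL);
            kill(getppid(), SIGUSR1);
        }
        char sink[65536];
        while (read(fds[0], sink, sizeof(sink)) > 0)
            ;
        _exit(0);
    }

    /* Parent: plain userspace loop that retries with the remaining count. */
    close(fds[0]);
    size_t done = 0;
    long result = 0;
    *write_calls = 0;
    while (done < len) {
        ssize_t n = write(fds[1], buf + done, len - done);
        (*write_calls)++;
        if (n == -1) {                 /* with SA_RESTART, EINTR is not expected here */
            result = -1;
            break;
        }
        done += (size_t)n;             /* a short write shrinks the next count */
    }
    if (result == 0)
        result = (long)done;

    close(fds[1]);                     /* lets the child's read() see EOF */
    waitpid(child, NULL, 0);
    free(buf);
    return result;
}
```

Calling write_with_signals(1 << 20, 3, &calls) from main and running the
binary under strace should let you compare the write lines with the trace
above; the exact number of calls depends on timing.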

My question is how and where it does that. I tried to dig into the kernel
source code but couldn't find the place where this tracking occurs.

Thank you for reading this far and for your willingness to help.

Have a great day,
BR,
Viacheslav
