pid start time and new display field in proc pid stat

Navin P navinp0304 at gmail.com
Thu Mar 25 08:53:03 EDT 2021


Hi,

 As of the 5.11 kernel, the pair (pid, pid_start_time) is neither unique
nor monotonic, even though the underlying counters are.
 I chose start_boottime because I wanted the counter to keep increasing
during suspend as well.

1. Is there any case where task->start_boottime or
ktime_get_boottime_ns() is not monotonic, i.e. not increasing?

2. If start_boottime is not monotonic, which counter should I use?

3. If I create a new field in task_struct, then I can use
atomic_add_return(1, &v) to fill in task->new_field. Will this also
work?

By doing this, <pid, pid_start_time> becomes unique.
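
To make question 3 concrete, here is a minimal userspace sketch of the
idea using C11 atomics. It is not kernel code: task_seq and
next_task_seq() are hypothetical names, and atomic_fetch_add() stands in
for the kernel's atomic_add_return(). One thing it does illustrate is
the counter width: the kernel's atomic_t is a 32-bit int and would
eventually wrap, so a 64-bit counter (atomic64_t with
atomic64_inc_return() in the kernel) seems the safer choice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global sequence counter; not an existing kernel symbol. */
static atomic_uint_fast64_t task_seq = 0;

/*
 * Userspace analogue of atomic_add_return(1, &v): atomically add 1 and
 * return the new value. atomic_fetch_add() returns the old value, so we
 * add 1 to get the value this caller owns.
 */
static uint64_t next_task_seq(void)
{
    uint64_t id = atomic_fetch_add(&task_seq, 1) + 1;

    assert(id != 0); /* would only fire after a full 64-bit wraparound */
    return id;
}
```

Each caller gets a distinct, strictly increasing value, so pairing it
with the pid would give a unique key as long as the counter does not
wrap.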

 In linux/fs/proc/array.c at line 566 we have

    /* apply timens offset for boottime and convert nsec -> ticks */
    start_time =
        nsec_to_clock_t(timens_add_boottime_ns(task->start_boottime));

task->start_boottime is a monotonically increasing counter fetched from
ktime_get_boottime_ns() in fork.c.

nsec_to_clock_t() contains a div_u64(), so we lose the lower
bits/digits on division, and the result is not unique unless the
divisor is 1.

 For example, if USER_HZ = 250 (the divisor is derived from USER_HZ, not
CONFIG_HZ) and nsec_to_clock_t() is called with x = 4000001, then

#if (NSEC_PER_SEC % USER_HZ) == 0
return div_u64(x, NSEC_PER_SEC / USER_HZ);

becomes div_u64(x, 4000000), so the return value is 1.
When x = 4000002, the return value is still 1,
and it stays 1 until x = 8000000, which returns 2.
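
The arithmetic above can be checked with a small userspace sketch. The
names example_nsec_to_ticks() and ticks_collide() are made up for
illustration, and plain integer division stands in for div_u64(); the
divisor is the 4000000 from the example, not necessarily what a given
kernel build uses.

```c
#include <assert.h>
#include <stdint.h>

/* Divisor from the example above: NSEC_PER_SEC / 250. */
#define EXAMPLE_DIVISOR 4000000ULL

static_assert(EXAMPLE_DIVISOR == 1000000000ULL / 250,
              "divisor must match NSEC_PER_SEC / 250");

/* Stand-in for the div_u64() branch of nsec_to_clock_t(); truncates. */
static uint64_t example_nsec_to_ticks(uint64_t nsec)
{
    return nsec / EXAMPLE_DIVISOR;
}

/* Returns 1 if two distinct nanosecond values map to the same tick. */
static int ticks_collide(uint64_t a_nsec, uint64_t b_nsec)
{
    return a_nsec != b_nsec &&
           example_nsec_to_ticks(a_nsec) == example_nsec_to_ticks(b_nsec);
}
```

With this divisor, any two tasks started within the same 4 ms window
report the same tick value, which is the collision described above.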

The value shown in /proc/[pid]/stat is therefore the truncated value.

Hence I'm planning to display a counter at the end of /proc/[pid]/stat
as the 53rd field.
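
For reference, this is roughly how a userspace reader could pick a
numbered field out of a /proc/[pid]/stat line. It is only a sketch:
stat_field() is a hypothetical helper, it handles unsigned fields only,
and it returns 0 when the field is missing. Field 2 (comm) is wrapped in
parentheses and may itself contain spaces or ')', so counting starts
after the last ')'.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return field number `field` (1-based, field >= 3) of a
 * /proc/[pid]/stat line as an unsigned long long, or 0 if the line has
 * fewer fields than that.
 */
static unsigned long long stat_field(const char *line, int field)
{
    const char *p;
    int i;

    assert(field >= 3);

    p = strrchr(line, ')');     /* end of comm, even if comm has ')' */
    if (!p)
        return 0;
    p++;                        /* now at the space before field 3 */

    for (i = 3; ; i++) {
        while (*p == ' ')
            p++;
        if (*p == '\0' || *p == '\n')
            return 0;           /* ran out of fields */
        if (i == field)
            return strtoull(p, NULL, 10);
        while (*p != ' ' && *p != '\0' && *p != '\n')
            p++;                /* skip over this field */
    }
}
```

With the patch below applied, stat_field(line, 22) would give the
truncated start_time and stat_field(line, 53) the proposed nanosecond
value; on an unpatched kernel the latter simply returns 0.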

I've prepared a patch as inlined.

From a2c6b5d6435394f015d38700008ff74f16dfa5fd Mon Sep 17 00:00:00 2001
From: Navin P <navinp0304 at gmail.com>
Date: Thu, 25 Mar 2021 15:27:30 +0530
Subject: [PATCH] Display task->start_boottime as 53rd field in
 /proc/[pid]/stat.  The 22nd field, start_time, currently shown in
 /proc/[pid]/stat is truncated by division. Hence it is not
 unique.

Signed-off-by: Navin P <navinp0304 at gmail.com>
---
 Documentation/filesystems/proc.rst | 1 +
 fs/proc/array.c                    | 1 +
 2 files changed, 2 insertions(+)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 48fbfc336ebf..3b7a1543b2c0 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -381,6 +381,7 @@ It's slow but very precise.
   env_end       address below which program environment is placed
   exit_code     the thread's exit_code in the form reported by the waitpid
  system call
+  start_boottime the process start time in nanoseconds since boot
   ============= ===============================================================

 The /proc/PID/maps file contains the currently mapped memory regions and
diff --git a/fs/proc/array.c b/fs/proc/array.c
index bb87e4d89cd8..74389aaefa9c 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -645,6 +645,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
  else
  seq_puts(m, " 0");

+ seq_put_decimal_ull(m, " ", task->start_boottime);
  seq_putc(m, '\n');
  if (mm)
  mmput(mm);
-- 
2.25.1



More information about the Kernelnewbies mailing list