pid start time and new display field in proc pid stat [ race in task->start_boottime ,start_time]

Navin P navinp0304 at gmail.com
Thu Apr 1 10:50:04 EDT 2021


On Fri, Mar 26, 2021 at 8:31 AM Navin P <navinp0304 at gmail.com> wrote:
>
> On Thu, Mar 25, 2021 at 8:51 PM <jim.cromie at gmail.com> wrote:
> >
> >
> >
> > On Thu, Mar 25, 2021 at 6:53 AM Navin P <navinp0304 at gmail.com> wrote:
> >>
> >> Hi,
> >>
> >>  As of 5.11 kernel (pid,pid_start_time) is not unique /monotonic even
> >> though the underlying counters are .
> >>  I chose start_boottime because i wanted the counter to increase
> >> during suspend as well.
> >>
> My patch doesn't create a new field, it uses the existing field and
> prints it in /proc/[pid]/stat. Stat files are created
> on first lookup. These are used by ps or top.
>
> + seq_put_decimal_ull(m, " ", task->start_boottime);

After printing the task->start_boottime as 53rd field, i found
multiple process containing same value due to a race. The race exists
for p->start_time as well.
Working on the above  i found a race in the kernel/fork.c at
p->start_boottime = ktime_get_boottime_ns();  i logged the output and
compared them to find identical values for many process.

I made a patch and this fixes the issue but i'm not happy with the
change because it locks the tasklist_lock which is a big lock.

What are the other ways  that can be used to fix this problem ?
I was thinking of changing start_boottime as struct start_boottime {
u64 boottime, rw_lock };
Now all my access to boottime go through the rw_lock . Another rwlock
has to be created for start_time as well.


>From a048ac81b468ec94cb10ca41416cfe9641be3c5b Mon Sep 17 00:00:00 2001
From: Navin P <navinp0304 at gmail.com>
Date: Thu, 1 Apr 2021 20:04:24 +0530
Subject: [PATCH] Wrap task->start_boottime in write_lock_irq during creation
 and filling leader->task_boottime in fs/exec.c . Also wrap read_lock in
 /proc/fs/array.c for reading task->start_boottime to remove the race.

Signed-off-by: Navin P <navinp0304 at gmail.com>
---
 fs/exec.c       | 2 ++
 fs/proc/array.c | 8 ++++++--
 kernel/fork.c   | 6 +++++-
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 18594f11c31f..6e29698bdd90 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1109,7 +1109,9 @@ static int de_thread(struct task_struct *tsk)
  * also take its birthdate (always earlier than our own).
  */
  tsk->start_time = leader->start_time;
+ write_lock_irq(&tasklist_lock);
  tsk->start_boottime = leader->start_boottime;
+ write_unlock_irq(&tasklist_lock);

  BUG_ON(!same_thread_group(leader, tsk));
  /*
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 74389aaefa9c..dae7e973b9de 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -469,7 +469,7 @@ static int do_task_stat(struct seq_file *m, struct
pid_namespace *ns,
  int num_threads = 0;
  int permitted;
  struct mm_struct *mm;
- unsigned long long start_time;
+ unsigned long long start_time,curr_boottime;
  unsigned long cmin_flt = 0, cmaj_flt = 0;
  unsigned long  min_flt = 0,  maj_flt = 0;
  u64 cutime, cstime, utime, stime;
@@ -563,8 +563,12 @@ static int do_task_stat(struct seq_file *m,
struct pid_namespace *ns,
  nice = task_nice(task);

  /* apply timens offset for boottime and convert nsec -> ticks */
+
+        read_lock(&tasklist_lock);
  start_time =
  nsec_to_clock_t(timens_add_boottime_ns(task->start_boottime));
+ curr_boottime = task->start_boottime;
+ read_unlock(&tasklist_lock);

  seq_put_decimal_ull(m, "", pid_nr_ns(pid, ns));
  seq_puts(m, " (");
@@ -645,7 +649,7 @@ static int do_task_stat(struct seq_file *m, struct
pid_namespace *ns,
  else
  seq_puts(m, " 0");

- seq_put_decimal_ull(m, " ", task->start_boottime);
+ seq_put_decimal_ull(m, " ", curr_boottime);
  seq_putc(m, '\n');
  if (mm)
  mmput(mm);
diff --git a/kernel/fork.c b/kernel/fork.c
index 426cd0c51f9e..41728b9bb76d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2219,11 +2219,15 @@ static __latent_entropy struct task_struct
*copy_process(
  * communication until we take the tasklist-lock. In particular, we do
  * not want user-space to be able to predict the process start-time by
  * stalling fork(2) after we recorded the start_time but before it is
- * visible to the system.
+ * visible to the system. Quickly take write lock and fill in details
+         * preventing race.
  */

  p->start_time = ktime_get_ns();
+        write_lock_irq(&tasklist_lock);
  p->start_boottime = ktime_get_boottime_ns();
+ write_unlock_irq(&tasklist_lock);
+

  /*
  * Make it visible to the rest of the system, but dont wake it up yet.
-- 
2.25.1



More information about the Kernelnewbies mailing list