pid start time and new display field in proc pid stat [ race in task->start_boottime ,start_time]
Navin P
navinp0304 at gmail.com
Thu Apr 1 10:50:04 EDT 2021
On Fri, Mar 26, 2021 at 8:31 AM Navin P <navinp0304 at gmail.com> wrote:
>
> On Thu, Mar 25, 2021 at 8:51 PM <jim.cromie at gmail.com> wrote:
> >
> >
> >
> > On Thu, Mar 25, 2021 at 6:53 AM Navin P <navinp0304 at gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> As of 5.11 kernel (pid,pid_start_time) is not unique /monotonic even
> >> though the underlying counters are .
> >> I chose start_boottime because i wanted the counter to increase
> >> during suspend as well.
> >>
> My patch doesn't create a new field, it uses the existing field and
> prints it in /proc/[pid]/stat. Stat files are created
> on first lookup. These are used by ps or top.
>
> + seq_put_decimal_ull(m, " ", task->start_boottime);
After printing the task->start_boottime as 53rd field, i found
multiple process containing same value due to a race. The race exists
for p->start_time as well.
Working on the above i found a race in the kernel/fork.c at
p->start_boottime = ktime_get_boottime_ns(); i logged the output and
compared them to find identical values for many process.
I made a patch and this fixes the issue but i'm not happy with the
change because it locks the tasklist_lock which is a big lock.
What are the other ways that can be used to fix this problem ?
I was thinking of changing start_boottime as struct start_boottime {
u64 boottime, rw_lock };
Now all my access to boottime go through the rw_lock . Another rwlock
has to be created for start_time as well.
>From a048ac81b468ec94cb10ca41416cfe9641be3c5b Mon Sep 17 00:00:00 2001
From: Navin P <navinp0304 at gmail.com>
Date: Thu, 1 Apr 2021 20:04:24 +0530
Subject: [PATCH] Wrap task->start_boottime in write_lock_irq during creation
and filling leader->task_boottime in fs/exec.c . Also wrap read_lock in
/proc/fs/array.c for reading task->start_boottime to remove the race.
Signed-off-by: Navin P <navinp0304 at gmail.com>
---
fs/exec.c | 2 ++
fs/proc/array.c | 8 ++++++--
kernel/fork.c | 6 +++++-
3 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index 18594f11c31f..6e29698bdd90 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1109,7 +1109,9 @@ static int de_thread(struct task_struct *tsk)
* also take its birthdate (always earlier than our own).
*/
tsk->start_time = leader->start_time;
+ write_lock_irq(&tasklist_lock);
tsk->start_boottime = leader->start_boottime;
+ write_unlock_irq(&tasklist_lock);
BUG_ON(!same_thread_group(leader, tsk));
/*
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 74389aaefa9c..dae7e973b9de 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -469,7 +469,7 @@ static int do_task_stat(struct seq_file *m, struct
pid_namespace *ns,
int num_threads = 0;
int permitted;
struct mm_struct *mm;
- unsigned long long start_time;
+ unsigned long long start_time,curr_boottime;
unsigned long cmin_flt = 0, cmaj_flt = 0;
unsigned long min_flt = 0, maj_flt = 0;
u64 cutime, cstime, utime, stime;
@@ -563,8 +563,12 @@ static int do_task_stat(struct seq_file *m,
struct pid_namespace *ns,
nice = task_nice(task);
/* apply timens offset for boottime and convert nsec -> ticks */
+
+ read_lock(&tasklist_lock);
start_time =
nsec_to_clock_t(timens_add_boottime_ns(task->start_boottime));
+ curr_boottime = task->start_boottime;
+ read_unlock(&tasklist_lock);
seq_put_decimal_ull(m, "", pid_nr_ns(pid, ns));
seq_puts(m, " (");
@@ -645,7 +649,7 @@ static int do_task_stat(struct seq_file *m, struct
pid_namespace *ns,
else
seq_puts(m, " 0");
- seq_put_decimal_ull(m, " ", task->start_boottime);
+ seq_put_decimal_ull(m, " ", curr_boottime);
seq_putc(m, '\n');
if (mm)
mmput(mm);
diff --git a/kernel/fork.c b/kernel/fork.c
index 426cd0c51f9e..41728b9bb76d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2219,11 +2219,15 @@ static __latent_entropy struct task_struct
*copy_process(
* communication until we take the tasklist-lock. In particular, we do
* not want user-space to be able to predict the process start-time by
* stalling fork(2) after we recorded the start_time but before it is
- * visible to the system.
+ * visible to the system. Quickly take write lock and fill in details
+ * preventing race.
*/
p->start_time = ktime_get_ns();
+ write_lock_irq(&tasklist_lock);
p->start_boottime = ktime_get_boottime_ns();
+ write_unlock_irq(&tasklist_lock);
+
/*
* Make it visible to the rest of the system, but dont wake it up yet.
--
2.25.1
More information about the Kernelnewbies
mailing list