[PATCH 1/3] dyndbg: WIP replace __dyndbg section with a zs-pool copy.

Jim Cromie jim.cromie at gmail.com
Sat Jul 25 12:36:00 EDT 2020


Consider the bloat in the __dyndbg linker section:
 266 modules, 2542 entries and 10640 bytes in ddebug tables,
      142352 bytes in __dyndbg section
 2275 module repeats, 1045 func-repeats 2139 file-repeats

The linker populates the __dyndbg section for us, limiting
our ability to eliminate those repetitions structurally.

So instead, let's copy the section into zram (maybe zswap later?), let
a compressor handle the repetition, and map individual records in
on-demand.  Once the 3 users of struct _ddebug's members are converted
to fetch the record from the zpool, we can retire the entire __dyndbg
section.

Since dynamic_debug is nice to have but rarely used, paying more for
active logging to save in-kernel memory might be a worthwhile tradeoff.

This is how:

- At late_initcall time, ddebug_zpool_init() creates the zpool.
  This should play well with __initdata auto-reclaim.
  I tried postcore_initcall and others but got NULL ptrs; maybe revisit.

- ddebug_add_module() now also calls ddebug_zpool_add() to copy the
  module's _ddebug records into the zpool, once it's available.

- ddebug_zpool_init() inits the zpool, and also calls
  ddebug_zpool_add() on behalf of all the builtin modules which
  couldn't be added to the pool earlier because it wasn't ready.

- add zhandle into struct _ddebug, via 2 anonymous structs
  (white-space abusing, todo).  The outer struct needs to become a
  union to actually save any RAM.

The 3 struct _ddebug users, ddebug_proc_show(), dynamic_emit_prefix(),
and ddebug_change(), are updated to use 2 new helpers:

- ddebug_zrec_get(handle) - map the record in
- ddebug_zrec_put(handle) - unmap

ddebug_change() also gets s/continue/goto skipsite/g so each record is
cleanly unmapped.  I could abuse the for-loop instead, but this is cleaner.

ATM, the code works, at least well enough to try some more design.  It
does not get any actual compression yet, and is just bigger right now.

Next steps:

s1- convert struct to union.  This is close, but has broken every attempt so far.

This is complicated by the need for the union to work properly for
both early and post-late-init accesses.  So we also need a flag bit of
some sort to tell which kind of record is stored, so we can use it
(more cleverly than the current zhandle checks).  I briefly tried
+1/-1 on zhandle to make it odd (%2), which broke things; time to re-try.

s2- split struct _ddebug, separate out flags and keys.

We must split out at least the flags, because non-JUMP_LABEL builds
must check the 'p' flag at every callsite encountered (there is no
NOOPed patch-site), so flags cannot be expensive to access, i.e. they
cannot live in zram.

It would probably look a bit like:

 DYNAMIC_DEBUG_METADATA( .. )
   static struct _dd_callsite __section(__dyndbg_callsite) name##_calls ..
   static struct _dd_flags    __section(__dyndbg_flags)    name##_flags ..
   static struct _dd_keys     __section(__dyndbg_keys)     name##_keys  ..
   // composing struct _ddebug
   static struct _dd          __section(__dyndbg)          name
     .calls = &name##_calls,
     .keys  = &name##_keys,
     .flags = &name##_flags,

Once we have this, the __dyndbg_callsite section can go to zram, and
will perhaps be more readily compressible (we get no compression atm).

JUMP_LABEL builds will use keys only when changing the 'p' flag and
hot-patching that callsite; since this is rare, the keys could go into
zram.  Non-JUMP_LABEL builds won't use keys at all.

s3- linker question

The DYNAMIC_DEBUG_METADATA macro outlined above would fill 4 sections
simultaneously.  If we can rely on the linker to keep those sections
in the same order, we can "invert" the struct _ddebug vector into
separate vectors of members, each indexed by [N], and that "composing
struct _ddebug" above can be dropped, fixing the space-cost, and
leaving only the indexing costs.

OTOH, I recently saw a CONFIG item that helps prevent cross-section
references, so some interference is possible there.

s4- fix the lack of compression

Since the __dyndbg section is well sorted, a simple run-length
encoding on 3 RO members (modname, file, function) could reclaim most
of the (90%, 84%, 45%) repeated values.  But this sounds like
premature optimization, at least until standard compression is shown
incapable.  Also, the s2- split might fix the problem, by only
applying compression to the callsite data.

OTOH, pushing the current zpool (50 pages) out to swap makes
non-compression arguably moot.

Potential benefits:
- compression
- convert in-kernel mem to zram/etc
- eventually swap it out entirely
- map in the enabled callsites only

TLDR

Note also that the format pointer is kept twice in dynamic-debug: once
inside the struct _ddebug, and again as the fmt parameter in the
*_dbg() functions and macro stack.  I don't see how 'fixing' this is
worthwhile.

Signed-off-by: Jim Cromie <jim.cromie at gmail.com>
---
 include/linux/dynamic_debug.h |   4 ++
 lib/dynamic_debug.c           | 131 +++++++++++++++++++++++++++++++---
 2 files changed, 126 insertions(+), 9 deletions(-)

diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h
index aa9ff9e1c0b3..d23f283fff70 100644
--- a/include/linux/dynamic_debug.h
+++ b/include/linux/dynamic_debug.h
@@ -12,6 +12,9 @@
  * the special section is treated as an array of these.
  */
 struct _ddebug {
+	struct {
+		unsigned long zhandle;
+		struct {
 	/*
 	 * These fields are used to drive the user interface
 	 * for selecting and displaying debug callsites.
@@ -44,6 +47,7 @@ struct _ddebug {
 		struct static_key_false dd_key_false;
 	} key;
 #endif
+	}; };	/* nested structs, to become a union */
 } __attribute__((aligned(8)));
 
 
diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index 1d012e597cc3..bb23d5d56116 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -36,6 +36,7 @@
 #include <linux/sched.h>
 #include <linux/device.h>
 #include <linux/netdevice.h>
+#include <linux/zsmalloc.h>
 
 #include <rdma/ib_verbs.h>
 
@@ -72,6 +73,8 @@ static LIST_HEAD(ddebug_tables);
 static int verbose;
 module_param(verbose, int, 0644);
 
+static struct zs_pool *dd_callsite_zpool;
+
 /* Return the path relative to source root */
 static inline const char *trim_prefix(const char *path)
 {
@@ -118,6 +121,7 @@ do {								\
 
 #define vpr_info(fmt, ...)	vnpr_info(1, fmt, ##__VA_ARGS__)
 #define v2pr_info(fmt, ...)	vnpr_info(2, fmt, ##__VA_ARGS__)
+#define v3pr_info(fmt, ...)	vnpr_info(3, fmt, ##__VA_ARGS__)
 
 static void vpr_info_dq(const struct ddebug_query *query, const char *msg)
 {
@@ -139,6 +143,17 @@ static void vpr_info_dq(const struct ddebug_query *query, const char *msg)
 		 query->first_lineno, query->last_lineno);
 }
 
+static inline struct _ddebug *ddebug_zrec_get(unsigned long handle)
+{
+	return (struct _ddebug *)
+		zs_map_object(dd_callsite_zpool, handle, ZS_MM_RW);
+}
+
+static inline void ddebug_zrec_put(unsigned long handle)
+{
+	zs_unmap_object(dd_callsite_zpool, handle);
+}
+
 /*
  * Search the tables for _ddebug's which match the given `query' and
  * apply the `flags' and `mask' to them.  Returns number of matching
@@ -164,7 +179,8 @@ static int ddebug_change(const struct ddebug_query *query,
 			continue;
 
 		for (i = 0; i < dt->num_ddebugs; i++) {
-			struct _ddebug *dp = &dt->ddebugs[i];
+			struct _ddebug *dp;
+			dp = ddebug_zrec_get(dt->ddebugs[i].zhandle);
 
 			/* match against the source filename */
 			if (query->filename &&
@@ -173,12 +189,12 @@ static int ddebug_change(const struct ddebug_query *query,
 					   kbasename(dp->filename)) &&
 			    !match_wildcard(query->filename,
 					   trim_prefix(dp->filename)))
-				continue;
+				goto skipsite;
 
 			/* match against the function */
 			if (query->function &&
 			    !match_wildcard(query->function, dp->function))
-				continue;
+				goto skipsite;
 
 			/* match against the format */
 			if (query->format) {
@@ -187,24 +203,24 @@ static int ddebug_change(const struct ddebug_query *query,
 					/* anchored search. match must be at beginning */
 					p = strstr(dp->format, query->format+1);
 					if (p != dp->format)
-						continue;
+						goto skipsite;
 				} else if (!strstr(dp->format, query->format))
-					continue;
+					goto skipsite;
 			}
 
 			/* match against the line number range */
 			if (query->first_lineno &&
 			    dp->lineno < query->first_lineno)
-				continue;
+				goto skipsite;
 			if (query->last_lineno &&
 			    dp->lineno > query->last_lineno)
-				continue;
+				goto skipsite;
 
 			nfound++;
 
 			newflags = (dp->flags & modifiers->mask) | modifiers->flags;
 			if (newflags == dp->flags)
-				continue;
+				goto skipsite;
 #ifdef CONFIG_JUMP_LABEL
 			if (dp->flags & _DPRINTK_FLAGS_PRINT) {
 				if (!(modifiers->flags & _DPRINTK_FLAGS_PRINT))
@@ -217,6 +233,9 @@ static int ddebug_change(const struct ddebug_query *query,
 				 trim_prefix(dp->filename), dp->lineno,
 				 dt->mod_name, dp->function,
 				 ddebug_describe_flags(dp->flags, &fbuf));
+
+		skipsite:
+			ddebug_zrec_put(dt->ddebugs[i].zhandle);
 		}
 	}
 	mutex_unlock(&ddebug_lock);
@@ -568,13 +587,23 @@ static int remaining(int wrote)
 	return 0;
 }
 
-static char *dynamic_emit_prefix(const struct _ddebug *desc, char *buf)
+static char *dynamic_emit_prefix(const struct _ddebug *dp, char *buf)
 {
 	int pos_after_tid;
 	int pos = 0;
+	const struct _ddebug *desc;
 
 	*buf = '\0';
 
+	if (!dp->zhandle) {
+		pr_err("Nul zhandle: %s.%s\n", dp->modname, dp->function);
+		desc = dp;
+	} else {
+		desc = ddebug_zrec_get(dp->zhandle);
+		v3pr_info("good zhandle: %s.%s\n",
+			  desc->modname, desc->function);
+	}
+
 	if (desc->flags & _DPRINTK_FLAGS_INCL_TID) {
 		if (in_interrupt())
 			pos += snprintf(buf + pos, remaining(pos), "<intr> ");
@@ -597,6 +626,13 @@ static char *dynamic_emit_prefix(const struct _ddebug *desc, char *buf)
 	if (pos >= PREFIX_SIZE)
 		buf[PREFIX_SIZE - 1] = '\0';
 
+	if (!dp->zhandle) {
+		pr_err("Nul zhandle\n");
+	} else {
+		v3pr_info("got zhandle\n");
+		ddebug_zrec_put(dp->zhandle);
+	}
+
 	return buf;
 }
 
@@ -862,6 +898,7 @@ static int ddebug_proc_show(struct seq_file *m, void *p)
 	struct ddebug_iter *iter = m->private;
 	struct _ddebug *dp = p;
 	struct flagsbuf flags;
+	unsigned long handle;
 
 	if (p == SEQ_START_TOKEN) {
 		seq_puts(m,
@@ -869,6 +906,17 @@ static int ddebug_proc_show(struct seq_file *m, void *p)
 		return 0;
 	}
 
+	BUG_ON(!dp->zhandle);
+	handle = dp->zhandle;
+
+	dp = ddebug_zrec_get(handle);
+
+	if (!dp) {
+		pr_err("zs-map failed on %lu\n", handle);
+		return 0; /* bail */
+	}
+	v3pr_info("mapped h:%lu %s\n", handle, dp->function);
+
 	seq_printf(m, "%s:%u [%s]%s =%s \"",
 		   trim_prefix(dp->filename), dp->lineno,
 		   iter->table->mod_name, dp->function,
@@ -876,6 +924,8 @@ static int ddebug_proc_show(struct seq_file *m, void *p)
 	seq_escape(m, dp->format, "\t\r\n\"");
 	seq_puts(m, "\"\n");
 
+	ddebug_zrec_put(handle);
+
 	return 0;
 }
 
@@ -919,6 +969,38 @@ static const struct proc_ops proc_fops = {
 	.proc_write = ddebug_proc_write
 };
 
+/*
+ * Copy struct _ddebug records into the zpool, and remember the
+ * zhandle in the struct.  This is close to what we'll want when we
+ * unionize the handle and the struct.
+ */
+static void ddebug_zpool_add(struct _ddebug *dp)
+{
+	unsigned long handle;
+	struct _ddebug *cursor;
+
+	handle = zs_malloc(dd_callsite_zpool, sizeof(struct _ddebug),
+			   GFP_KERNEL);
+	if (!handle) {
+		pr_err("pool malloc failed on %s\n", dp->function);
+		return;
+	}
+	dp->zhandle = handle;
+
+	cursor = (struct _ddebug *)
+		zs_map_object(dd_callsite_zpool, handle, ZS_MM_WO);
+
+	if (!cursor) {
+		pr_err("zs-map failed\n");
+		return;
+	}
+	memcpy(cursor, dp, sizeof(struct _ddebug));
+
+	v3pr_info("h:%lu %s\n", handle, cursor->function);
+
+	zs_unmap_object(dd_callsite_zpool, handle);
+}
+
 /*
  * Allocate a new ddebug_table for the given module
  * and add it to the global list.
@@ -943,6 +1025,11 @@ int ddebug_add_module(struct _ddebug *tab, unsigned int n,
 	dt->num_ddebugs = n;
 	dt->ddebugs = tab;
 
+	if (dd_callsite_zpool)
+		/* zpool is ready for filling late */
+		for (; n; n--, tab++)
+			ddebug_zpool_add(tab);
+
 	mutex_lock(&ddebug_lock);
 	list_add(&dt->link, &ddebug_tables);
 	mutex_unlock(&ddebug_lock);
@@ -1057,6 +1144,31 @@ static int __init dynamic_debug_init_control(void)
 	return 0;
 }
 
+/*
+ * initialize the zpool, and fill it with copies of struct _ddebug
+ * records in the __verbose/__dyndbg section.
+ */
+static void __init ddebug_zpool_init(void)
+{
+	struct _ddebug *iter;
+
+	/* tbd- no corresponding destroy */
+	dd_callsite_zpool = zs_create_pool("dyndbg_callsites");
+	if (!dd_callsite_zpool) {
+		pr_err("create pool failed\n");
+		return;
+	}
+
+	/* add-module normally does this, but not in time for builtins */
+	for (iter = __start___dyndbg; iter < __stop___dyndbg; iter++)
+		ddebug_zpool_add(iter);
+
+	v2pr_info("total pages: %lu compaction: %lu\n",
+		  zs_get_total_pages(dd_callsite_zpool),
+		  zs_compact(dd_callsite_zpool));
+}
+late_initcall(ddebug_zpool_init);
+
 static int __init dynamic_debug_init(void)
 {
 	struct _ddebug *iter, *iter_start;
@@ -1132,3 +1244,4 @@ early_initcall(dynamic_debug_init);
 
 /* Debugfs setup must be done later */
 fs_initcall(dynamic_debug_init_control);
+
-- 
2.26.2



