[PATCH 5/8] dyndbg: WIP replace __dyndbg_callsite section with a zs-pool copy.

Jim Cromie jim.cromie at gmail.com
Wed Aug 5 14:30:20 EDT 2020


HEAD~1 split struct _ddebug into heads & bodies, linked across 2 ELF
sections.  Let's now store copies of the bodies in a zs_pool, and
relink each head to its new body.  This should allow recycling the
section soon.

The strategy is to let a compression algo handle the repetition, and
map individual records in only when needed.  If that works, we need
just one record at a time, so the 3:1 zram compression applies to
99.9% of callsites, i.e. to the 98120 bytes below:

 dyndbg: 259 modules, 2453 entries and 10360 bytes in ddebug tables, 98120 bytes in __dyndbg section, 98120 bytes in __dyndbg_callsites section
 dyndbg: 2453 entries. repeated entries: 2193 module 2058 file 981 func

Since dynamic_debug is nice to have but rarely used, paying more for
active logging to save in-kernel memory might be a worthwhile tradeoff.

This is how:

ddebug_zpool_init(), at late_initcall time, creates the zpool.

Existing ddebug_add_module() now also calls ddebug_zpool_add(), which
copies the module's _ddebug records into the zpool, once the pool is
ready.  For builtins, which are added early, the pool is not ready yet.

So ddebug_zpool_init() also calls ddebug_zpool_add() for all those
builtin modules already _add()ed.

For each _ddebug, ddebug_zpool_add():
 - zs_malloc()s a record and saves the handle to .zhandle,
 - zs_map()s it and copies *.site into the mapping,
 - zs_unmap()s it, triggering the write to zram,
 - sets .site = NULL, detaching __dyndbg_callsites[item];
   the NULL later triggers a zs_map in _get()

So we leave late-init with a full zram copy,
and __dyndbg_callsites ready to reclaim.

Run-time access for the 3 existing users goes entirely through
helpers: ddebug_callsite_get() and ddebug_callsite_put().  All 3 uses
are with ddebug_lock held.

_get() returns .site if already set, or maps the zpool record into it
and returns that.
_put() always unmaps, minimizing online RAM (see 1. below).
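
For reference, the pattern all 3 callers follow (a sketch; the real
call sites are in the diff below):

	struct _ddebug_callsite *dc;

	dc = ddebug_callsite_get(dp);	/* use .site, mapping if needed */
	pr_info("%s:%u %s\n", dc->filename, dc->lineno, dc->function);
	ddebug_callsite_put(dp);	/* unmap & forget .site */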

ddebug_change() also gets s/continue/goto skipsite/g, so each record
is cleanly unmapped at the bottom of the loop.

Next steps:

1. don't always unmap in _put(); leave enabled pr_debugs mapped.

This would make the set of zs_mapped recs work like a cache, reducing
zs_mappings to a trickle, needed only for <>control accesses.
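
A sketch of that _put(), assuming _DPRINTK_FLAGS_PRINT is the right
'enabled' test to key the cache behavior on:

	static void ddebug_callsite_put(struct _ddebug *dp)
	{
		if (!dp->site || !dp->zhandle)
			return;
		if (dp->flags & _DPRINTK_FLAGS_PRINT)
			return;	/* enabled: stay mapped, cache-like */
		zs_unmap_object(dd_callsite_zpool, dp->zhandle);
		dp->site = NULL;
	}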

2. adapt struct _ddebug to unionize .site and .zhandle

We currently use !!.site and !!.zhandle tests to track state; bits
are available in flags to replace those tests, making the union
possible.
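
A sketch of the unionized struct; _DPRINTK_FLAGS_IN_ZPOOL is a
hypothetical name for such a spare bit, standing in for the !!.zhandle
test:

	struct _ddebug {
		union {
			struct _ddebug_callsite *site;	/* mapped or static */
			unsigned long zhandle;		/* parked in zpool */
		};
		/* hypothetical spare bit: set while union holds a zhandle */
	#define _DPRINTK_FLAGS_IN_ZPOOL (1 << 7)
		unsigned int flags:8;
	};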

A crazy alternative is to create a "restricted" zpool, which limits
pool size to 2**N handles, and the returned handles to an (N+ck)-bit
number, which would fit beside our 8-bit flags inside an 8-byte
pointer.  This would be cool, but isn't necessary, since the union
looks practical.

3. move key from struct _ddebug into _ddebug_callsite

key doesn't exactly fit semantically (and it's not RO), but its use
is super rare, and it is never needed without also needing all the
other callsite data.  Getting it out of the head saves online RAM.
But this broke in HEAD~1 when I tried it, with asm goto errors.

4. linker question

The DECLARE_DYNAMIC_DEBUG_METADATA macro fills 2 sections
simultaneously.  If we can rely on the linker to keep those records in
identical order, we can 'index' both __dyndbg & __dyndbg_callsites (&
keys) with the same opaque index [N] and drop the .site pointer.
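
A sketch of that indexing; __start___dyndbg_callsites is an assumed
start symbol for the callsite section, mirroring __start___dyndbg:

	static struct _ddebug_callsite *ddebug_callsite_by_index(struct _ddebug *dp)
	{
		size_t n = dp - __start___dyndbg;	/* this record's ordinal */

		return &__start___dyndbg_callsites[n];	/* assumed symbol */
	}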

5. improve compression?

I'm seeing 3:1 pages_per_zspage:

 # cut -c1-12,40-77,85-90 $SKD/zsmalloc/dyndbg_callsites/classes | head -n3
 class  sizebj_allocated   obj_used pages_used pagzspage
     0    32           0          0          0         1
     1    48        2816       2605         33         3

Since the __dyndbg_callsite section is well sorted, a simple
run-length encoding on 3 RO members (modname, file, function) could
reclaim most of the (90%, 84%, 45%) repeated values.  That said, a
standard compressor should be able to compete, and there are several
to try.
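
To gauge the RLE win, a sketch that counts runs of one RO member,
assuming .modname lives in _ddebug_callsite after the HEAD~1 split;
~260 runs out of 2453 records would match the numbers above:

	/* walk the static section, before .site pointers are detached */
	static unsigned int ddebug_count_modname_runs(void)
	{
		struct _ddebug *dp;
		const char *prev = NULL;
		unsigned int runs = 0;

		for (dp = __start___dyndbg; dp < __stop___dyndbg; dp++) {
			if (!prev || strcmp(dp->site->modname, prev)) {
				runs++;	/* a new run begins */
				prev = dp->site->modname;
			}
		}
		return runs;
	}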

Lastly, this whole zram block is RO after late-init (or could be, if
we split into 2 zpools, one for builtins and one for loadable
modules).  So compression can be biased towards costly but excellent
compression (especially for the RO zpool) and good small-block
decompression for random-ish access.

6. we could add a ram_reclaim() to walk the dt-list, check each
_ddebug, and unmap the mapped ones.  This presumes 1. is done, and
that there are sometimes memory pressures to respond to.
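
A sketch of that reclaim, built only from pieces the diff below
already has:

	static void ddebug_ram_reclaim(void)
	{
		struct ddebug_table *dt;
		struct _ddebug *dp;
		int i;

		mutex_lock(&ddebug_lock);
		list_for_each_entry(dt, &ddebug_tables, link) {
			for (i = 0; i < dt->num_ddebugs; i++) {
				dp = &dt->ddebugs[i];
				if (!dp->site || !dp->zhandle)
					continue;	/* not a mapped zpool rec */
				zs_unmap_object(dd_callsite_zpool, dp->zhandle);
				dp->site = NULL;
			}
		}
		mutex_unlock(&ddebug_lock);
	}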

Potential benefits:
- convert in-kernel mem to zram/etc
- zram compression
- eventually swap it out entirely
- map in the enabled callsites only

Signed-off-by: Jim Cromie <jim.cromie at gmail.com>
---
 include/linux/dynamic_debug.h |   1 +
 lib/dynamic_debug.c           | 140 +++++++++++++++++++++++++++++++---
 2 files changed, 129 insertions(+), 12 deletions(-)

diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h
index 1bf52952df69..693911920a32 100644
--- a/include/linux/dynamic_debug.h
+++ b/include/linux/dynamic_debug.h
@@ -29,6 +29,7 @@ struct _ddebug_callsite {
 
 struct _ddebug {
 	struct _ddebug_callsite *site;
+	unsigned long zhandle;
 
 	/*
 	 * The flags field controls the behaviour at the callsite.
diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index 9161f345e3c9..049299027fb3 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -36,6 +36,7 @@
 #include <linux/sched.h>
 #include <linux/device.h>
 #include <linux/netdevice.h>
+#include <linux/zsmalloc.h>
 
 #include <rdma/ib_verbs.h>
 
@@ -72,6 +73,8 @@ static LIST_HEAD(ddebug_tables);
 static int verbose;
 module_param(verbose, int, 0644);
 
+static struct zs_pool *dd_callsite_zpool;
+
 /* Return the path relative to source root */
 static inline const char *trim_prefix(const char *path)
 {
@@ -118,6 +121,8 @@ do {								\
 
 #define vpr_info(fmt, ...)	vnpr_info(1, fmt, ##__VA_ARGS__)
 #define v2pr_info(fmt, ...)	vnpr_info(2, fmt, ##__VA_ARGS__)
+#define v3pr_info(fmt, ...)	vnpr_info(3, fmt, ##__VA_ARGS__)
+#define v4pr_info(fmt, ...)	vnpr_info(4, fmt, ##__VA_ARGS__)
 
 static void vpr_info_dq(const struct ddebug_query *query, const char *msg)
 {
@@ -139,6 +144,41 @@ static void vpr_info_dq(const struct ddebug_query *query, const char *msg)
 		 query->first_lineno, query->last_lineno);
 }
 
+/*
+ * Lifecycle of struct _ddebug's .site and .zhandle:
+ *
+ * At early-init, .site points into __dyndbg_callsite[], .zhandle=0
+ *
+ * At late-init, ddebug_zpool_add() copies each callsite into zram,
+ * saves the ref in .zhandle, and forgets .site, allowing
+ * __dyndbg_callsite[] to be reclaimed.  After this, we restore .site
+ * from .zhandle when needed, defaulting to minimal mapping, load on demand.
+ *
+ * _get() returns .site preferentially, and updates it when needed.
+ * This works for early prints and the minimal-mapped-records strategy.
+ */
+static struct _ddebug_callsite *ddebug_callsite_get(struct _ddebug *dp)
+{
+	if (dp->site)
+		return dp->site;
+
+	dp->site = (struct _ddebug_callsite *)
+		zs_map_object(dd_callsite_zpool, dp->zhandle, ZS_MM_RO);
+
+	return dp->site;
+}
+
+static void ddebug_callsite_put(struct _ddebug *dp)
+{
+	if (!dp->site || !dp->zhandle)
+		/* no site to unmap, or no means to restore */
+		return;
+
+	/* always unmap for now; leaving enabled pr_debugs mapped is a TODO */
+	zs_unmap_object(dd_callsite_zpool, dp->zhandle);
+	dp->site = NULL;
+}
+
 /*
  * Search the tables for _ddebug's which match the given `query' and
  * apply the `flags' and `mask' to them.  Returns number of matching
@@ -165,7 +205,8 @@ static int ddebug_change(const struct ddebug_query *query,
 
 		for (i = 0; i < dt->num_ddebugs; i++) {
 			struct _ddebug *dp = &dt->ddebugs[i];
-			struct _ddebug_callsite *dc = dp->site;
+			struct _ddebug_callsite *dc;
+			dc = ddebug_callsite_get(dp);
 
 			/* match against the source filename */
 			if (query->filename &&
@@ -174,12 +215,12 @@ static int ddebug_change(const struct ddebug_query *query,
 					   kbasename(dc->filename)) &&
 			    !match_wildcard(query->filename,
 					   trim_prefix(dc->filename)))
-				continue;
+				goto skipsite;
 
 			/* match against the function */
 			if (query->function &&
 			    !match_wildcard(query->function, dc->function))
-				continue;
+				goto skipsite;
 
 			/* match against the format */
 			if (query->format) {
@@ -188,24 +229,24 @@ static int ddebug_change(const struct ddebug_query *query,
 					/* anchored search. match must be at beginning */
 					p = strstr(dc->format, query->format+1);
 					if (p != dc->format)
-						continue;
+						goto skipsite;
 				} else if (!strstr(dc->format, query->format))
-					continue;
+					goto skipsite;
 			}
 
 			/* match against the line number range */
 			if (query->first_lineno &&
 			    dc->lineno < query->first_lineno)
-				continue;
+				goto skipsite;
 			if (query->last_lineno &&
 			    dc->lineno > query->last_lineno)
-				continue;
+				goto skipsite;
 
 			nfound++;
 
 			newflags = (dp->flags & modifiers->mask) | modifiers->flags;
 			if (newflags == dp->flags)
-				continue;
+				goto skipsite;
 #ifdef CONFIG_JUMP_LABEL
 			if (dp->flags & _DPRINTK_FLAGS_PRINT) {
 				if (!(modifiers->flags & _DPRINTK_FLAGS_PRINT))
@@ -218,6 +259,9 @@ static int ddebug_change(const struct ddebug_query *query,
 				 trim_prefix(dc->filename), dc->lineno,
 				 dt->mod_name, dc->function,
 				 ddebug_describe_flags(dp->flags, &fbuf));
+
+		skipsite:
+			ddebug_callsite_put(dp);
 		}
 	}
 	mutex_unlock(&ddebug_lock);
@@ -569,14 +613,16 @@ static int remaining(int wrote)
 	return 0;
 }
 
-static char *dynamic_emit_prefix(const struct _ddebug *dp, char *buf)
+static char *dynamic_emit_prefix(struct _ddebug *dp, char *buf)
 {
 	int pos_after_tid;
 	int pos = 0;
-	const struct _ddebug_callsite *desc = dp->site;
+	const struct _ddebug_callsite *desc;
 
 	*buf = '\0';
 
+	desc = ddebug_callsite_get(dp);
+
 	if (dp->flags & _DPRINTK_FLAGS_INCL_TID) {
 		if (in_interrupt())
 			pos += snprintf(buf + pos, remaining(pos), "<intr> ");
@@ -599,6 +645,8 @@ static char *dynamic_emit_prefix(const struct _ddebug *dp, char *buf)
 	if (pos >= PREFIX_SIZE)
 		buf[PREFIX_SIZE - 1] = '\0';
 
+	ddebug_callsite_put(dp);
+
 	return buf;
 }
 
@@ -863,7 +911,7 @@ static int ddebug_proc_show(struct seq_file *m, void *p)
 {
 	struct ddebug_iter *iter = m->private;
 	struct _ddebug *dp = p;
-	struct _ddebug_callsite *dc = dp->site;
+	struct _ddebug_callsite *dc;
 	struct flagsbuf flags;
 
 	if (p == SEQ_START_TOKEN) {
@@ -872,6 +920,8 @@ static int ddebug_proc_show(struct seq_file *m, void *p)
 		return 0;
 	}
 
+	dc = ddebug_callsite_get(dp);
+
 	seq_printf(m, "%s:%u [%s]%s =%s \"",
 		   trim_prefix(dc->filename), dc->lineno,
 		   iter->table->mod_name, dc->function,
@@ -879,6 +929,8 @@ static int ddebug_proc_show(struct seq_file *m, void *p)
 	seq_escape(m, dc->format, "\t\r\n\"");
 	seq_puts(m, "\"\n");
 
+	ddebug_callsite_put(dp);
+
 	return 0;
 }
 
@@ -922,6 +974,39 @@ static const struct proc_ops proc_fops = {
 	.proc_write = ddebug_proc_write
 };
 
+/*
+ * Copy a struct _ddebug's callsite record into the zpool, and save
+ * the zhandle in the struct.  This is close to what we'll want when
+ * we unionize the handle and the .site pointer.
+ */
+static void ddebug_zpool_add(struct _ddebug *dp)
+{
+	unsigned long handle;
+	struct _ddebug_callsite *cursor;
+
+	handle = zs_malloc(dd_callsite_zpool, sizeof(struct _ddebug_callsite),
+			   GFP_KERNEL);
+	if (!handle) {
+		pr_err("pool malloc failed on %s\n", dp->site->function);
+		return;
+	}
+	BUG_ON(handle % 2); /* zs_malloc() handles are even */
+	dp->zhandle = handle;
+
+	cursor = (struct _ddebug_callsite *)
+		zs_map_object(dd_callsite_zpool, handle, ZS_MM_WO);
+
+	if (!cursor) {
+		pr_err("zs-map failed\n");
+		return;
+	}
+	memcpy(cursor, dp->site, sizeof(struct _ddebug_callsite));
+
+	/* forget __dyndbg_callsites[] so we can reclaim it */
+	dp->site = NULL;
+	zs_unmap_object(dd_callsite_zpool, handle);
+}
+
 /*
  * Allocate a new ddebug_table for the given module
  * and add it to the global list.
@@ -947,10 +1032,15 @@ int ddebug_add_module(struct _ddebug *tab, unsigned int n,
 	dt->ddebugs = tab;
 
 	mutex_lock(&ddebug_lock);
+	if (dd_callsite_zpool)
+		/* pool is ready; fill it as modules are added */
+		for (; n; n--, tab++)
+			ddebug_zpool_add(tab);
+
 	list_add(&dt->link, &ddebug_tables);
 	mutex_unlock(&ddebug_lock);
 
-	v2pr_info("%3u debug prints in module %s\n", n, dt->mod_name);
+	v2pr_info("%3u debug prints in module %s\n", dt->num_ddebugs, dt->mod_name);
 	return 0;
 }
 
@@ -1060,6 +1150,31 @@ static int __init dynamic_debug_init_control(void)
 	return 0;
 }
 
+/*
+ * Initialize the zpool, and fill it with copies of the callsite
+ * records attached to struct _ddebugs in the __dyndbg section.
+ */
+static void __init ddebug_zpool_init(void)
+{
+	struct _ddebug *iter;
+
+	/* TBD: no corresponding destroy */
+	dd_callsite_zpool = zs_create_pool("dyndbg_callsites");
+	if (!dd_callsite_zpool) {
+		pr_err("create pool failed\n");
+		return;
+	}
+
+	/* add-module normally does this, but not in time for builtins */
+	for (iter = __start___dyndbg; iter < __stop___dyndbg; iter++)
+		ddebug_zpool_add(iter);
+
+	v2pr_info("total pages: %lu compaction: %lu\n",
+		  zs_get_total_pages(dd_callsite_zpool),
+		  zs_compact(dd_callsite_zpool));
+}
+late_initcall(ddebug_zpool_init);
+
 static int __init dynamic_debug_init(void)
 {
 	struct _ddebug *iter, *iter_start, *prev = 0;
@@ -1149,3 +1264,4 @@ early_initcall(dynamic_debug_init);
 
 /* Debugfs setup must be done later */
 fs_initcall(dynamic_debug_init_control);
+
-- 
2.26.2