Kernel bug tracker

Thomas Schmitt scdbackup at gmx.net
Sun Sep 5 07:57:01 EDT 2021


Hi,

second patch proposal for isofs because Adverg Ebashinskii wrote:
> If you could share your patch here to understand the problem better I would
> gladly dig into it.

========================================================================
0000-cover-letter.patch

From 3d484405f0ad8d10ef490281da157bfdd7450cb6 Mon Sep 17 00:00:00 2001
From: Thomas Schmitt <scdbackup at gmx.net>
Date: Tue, 22 Sep 2020 12:35:52 +0200
Subject: [PATCH 0/1] isofs: truncate oversized Rock Ridge names to 255 bytes

Currently Rock Ridge names of length >= 254 are coarsely truncated by
discarding the whole NM entry where the overflow happened. This yields
name lengths of much less than the permissible 255 bytes.
There is no reason to see why to exclude length 254 and 255 and especially
to truncate by possibly a hundred or more bytes than necessary.

So i propose to raise the length of permissible names to 255 and to let
truncation yield exactly a string length of 255 bytes. Truncation shall
take care to invalidate UTF-8 debris at the end of the resulting string
(sorry ISO-8859).

---------------------------------------------------------------------------
Tests made:

Create an ISO 9660 image with file names of length 255, using file
/bin/true as input for both files:

  victim1=12345678901234567890123456789012345678901234567890
  victim1="$victim1"12345678901234567890123456789012345678901234567890
  victim1="$victim1"12345678901234567890123456789012345678901234567890
  victim1="$victim1"12345678901234567890123456789012345678901234567890
  victim1="$victim1"12345678901234567890123456789012345678901234567890
  victim1="$victim1"1234E
  victim2=äääääääääääääääääääääääää
  victim2="$victim2"äääääääääääääääääääääääää
  victim2="$victim2"äääääääääääääääääääääääää
  victim2="$victim2"äääääääääääääääääääääääää
  victim2="$victim2"äääääääääääääääääääääääää
  victim2="$victim2"xää
  xorriso -outdev /tmp/test_rr_name.iso \
          -blank as_needed \
          -map /bin/true /"$victim1" \
          -map /bin/true /"$victim2"

Currently the names get truncated to byte lengths 93 and 95:

  mount /tmp/test_rr_name.iso /mnt/iso
  /bin/ls /mnt/iso

yields in xterm with bash

   12345678901234567890...60.more.bytes...1234567890123
  'ääääääääääääääääääääääääääääääääääääääääääääää'$'\303'

Note the leading blank with the plain ASCII name and the shell characters
with the name that has 2-byte UTF-8 characters.
But

  /bin/ls /mnt/iso | cat

yields

  12345678901234567890...60.more.bytes...1234567890123
  ääääääääääääääääääääääääääääääääääääääääääääää�

The extra characters in xterm seem to be triggered by the presence of the
half UTF-8 'ä' at the end. Its byte 0xc3 is there, byte 0xa4 is missing.
(xterm and /bin/ls are from Debian 10.)
If i make the UTF-8 name shorter to avoid truncation or if i move the 'x'
to the start to cause truncation between complete UTF-8 'ä', the extra
characters do not show up in ls to xterm.

After my change in fs/isofs i get from /bin/ls /mnt/iso

  1234567890...230.more.bytes...12345678901234E
  ääääääääää...210.more.bytes...ääääääääääxää

Both strings have 255 bytes.

xorriso cannot be talked into writing longer Rock Ridge names. So i rather
set the new macro RR_NAME_LEN in rock.h to 33 to force truncation.
The result with /bin/ls -1 /mnt/iso is:

  123456789012345678901234567890123
  ääääääääääääääää_

Note the half 'ä' at the end being mapped to '_'.
So all characters are valid UTF-8 and no oddities of ls or xterm are to
see.

---------------------------------------------------------------------------
Remaining checkpatch.pl warning:

scripts/checkpatch.pl complains about the string
  'ääääääääääääääääääääääääääääääääääääääääääääää'$'\303'
in this text by:
  WARNING: Possible unwrapped commit description (prefer a maximum 75
  chars per line)

Maybe it should talk about "bytes" rather than "chars" or learn about
multi-byte characters in UTF-8.

I think it is beneficial if i show the whole mangled name, rather than
describing it by some ASCII-only text.

---------------------------------------------------------------------------

Have a nice day :)

Thomas

Thomas Schmitt (1):
  isofs: truncate oversized Rock Ridge names to 255 bytes

 fs/isofs/rock.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++---
 fs/isofs/rock.h |  2 ++
 2 files changed, 71 insertions(+), 4 deletions(-)

--
2.20.1

========================================================================
0001-isofs-truncate-oversized-Rock-Ridge-names-to-255-byt.patch

From 3d484405f0ad8d10ef490281da157bfdd7450cb6 Mon Sep 17 00:00:00 2001
From: Thomas Schmitt <scdbackup at gmx.net>
Date: Tue, 22 Sep 2020 12:34:50 +0200
Subject: [PATCH 1/1] isofs: truncate oversized Rock Ridge names to 255 bytes

Enlarge the limit for name bytes from 253 to 255.
Do not discard all bytes of the NM field where the overflow occurs, but
rather append them to the accumulated name before truncating it to exactly
255 bytes.
Map trailing incomplete UTF-8 bytes to '_'.

Signed-off-by: Thomas Schmitt <scdbackup at gmx.net>
---
 fs/isofs/rock.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++---
 fs/isofs/rock.h |  2 ++
 2 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/fs/isofs/rock.c b/fs/isofs/rock.c
index 94ef92fe806c..e1db8776b67e 100644
--- a/fs/isofs/rock.c
+++ b/fs/isofs/rock.c
@@ -192,6 +192,63 @@ static int rock_check_overflow(struct rock_state *rs, int sig)
 	return 0;
 }

+/*
+ * Find backward from idx the start byte of a possible UTF-8 character.
+ *   https://en.wikipedia.org/wiki/UTF-8#Description
+ * Return -1 if no incomplete UTF-8 sequence is found at the name end.
+ */
+static int rock_find_utf8_start(char *name, int idx)
+{
+	unsigned char *uname, uch;
+	int i;
+
+	uname = (unsigned char *)name;
+	/* Up to 4-byte codes */
+	for (i = 0; i < 4 && idx - i >= 0; i++) {
+		uch = uname[idx - i];
+		if ((uch & 0xe0) == 0xc0) {
+			/* UTF-8 start byte for 2-byte codes */
+			if (i >= 1)
+				return -1;		/* tail is complete */
+			else
+				return (idx - i);	/* tail not complete */
+		} else if ((uch & 0xf0) == 0xe0) {
+			/* UTF-8 start byte for 3-byte codes */
+			if (i >= 2)
+				return -1;
+			else
+				return (idx - i);
+		} else if ((uch & 0xf8) == 0xf0) {
+			/* UTF-8 start byte for 4-byte codes */
+			if (i >= 3)
+				return -1;
+			else
+				return (idx - i);
+		} else if ((uch & 0xc0) != 0x80) {
+			/* not an UTF-8 tail byte, so no UTF-8 */
+			return -1;
+		}
+	}
+	/* That would be an oversized UTF-8 code or no UTF-8 at all */
+	return -1;
+}
+
+/*
+ * Replace trailing incomplete UTF-8 sequence by '_' characters.
+ */
+static void rock_erase_incomplete_utf8(char *name, int len)
+{
+	int i;
+
+	if (len <= 0)
+		return;
+	i = rock_find_utf8_start(name, len - 1);
+	if (i < 0)
+		return;
+	for (; i < len; i++)
+		name[i] = '_';
+}
+
 /*
  * return length of name field; 0: not found, -1: to be ignored
  */
@@ -271,16 +328,24 @@ int get_rock_ridge_filename(struct iso_directory_record *de,
 				break;
 			}
 			len = rr->len - 5;
-			if (retnamlen + len >= 254) {
-				truncate = 1;
-				break;
-			}
 			p = memchr(rr->u.NM.name, '\0', len);
 			if (unlikely(p))
 				len = p - rr->u.NM.name;
+			if (retnamlen + len > RR_NAME_LEN) {
+				truncate = 1;
+				/* Take as many characters as possible */
+				len = RR_NAME_LEN - retnamlen;
+				if (len <= 0) {
+					rock_erase_incomplete_utf8(retname,
+								   retnamlen);
+					break;
+				}
+			}
 			memcpy(retname + retnamlen, rr->u.NM.name, len);
 			retnamlen += len;
 			retname[retnamlen] = '\0';
+			if (truncate == 1)
+				rock_erase_incomplete_utf8(retname, retnamlen);
 			break;
 		case SIG('R', 'E'):
 			kfree(rs.buffer);
diff --git a/fs/isofs/rock.h b/fs/isofs/rock.h
index 1558cf22ef8a..0938fc11ced4 100644
--- a/fs/isofs/rock.h
+++ b/fs/isofs/rock.h
@@ -121,3 +121,5 @@ struct rock_ridge {
 #define RR_PL 32		/* Parent link */
 #define RR_RE 64		/* Relocation directory */
 #define RR_TF 128		/* Timestamps */
+
+#define RR_NAME_LEN 255		/* Maximum length in bytes of a file name */
--
2.20.1




More information about the Kernelnewbies mailing list