Work (really slow directory access on ext4)

Nick Krause xerofoify at gmail.com
Thu Jul 31 21:47:18 EDT 2014


On Thu, Jul 31, 2014 at 7:41 PM, Henry Hallam <henry at pericynthion.org> wrote:
> Try redirecting the ls output to /dev/null or a file, thus disabling
> its color highlighting and thus removing a bunch of syscalls.  See if
> it's now the same no matter what choice of 'time'.
>
> On Thu, Jul 31, 2014 at 4:36 PM, Arlie Stephens <arlie at worldash.org> wrote:
>> Hi Nick,
>>
>> [Context - directory ls taking 4-15 seconds; directory large, with
>> long filenames, but nowhere near as huge as Valdis' mail directory.]
>>
>> I've now discovered a really bizarre pattern, and I'm inclined to stop
>> blaming the file system until some clarity develops. If I ever get it
>> to the point where I can produce a high quality bug report - with or
>> without patch - I will do so - but what I have now is anything but
>> clear and high quality.
>>
>> On Jul 30 2014, Nick Krause wrote:
>>> On Wed, Jul 30, 2014 at 3:48 PM,  <Valdis.Kletnieks at vt.edu> wrote:
>>> > On Wed, 30 Jul 2014 10:38:13 -0700, Arlie Stephens said:
>>> >
>>> >> On the good side, Valdis' observations of his mail directory have been
>>> >> a great help.
>>> >
>>> > And remember, that's on a single laptop-class hard drive, no fancy raid or
>>> > anything. (Though it *is* a hybrid, with 32G of flash cache on the front end).
>>> >
>>> > You throw some *real* hardware at it, it of course would go even faster.
>>>
>>> Just send me the logs and anything else you think may help me.
>>> Please also cc the ext4 mailing list, as this will let the other
>>> ext4 developers and maintainers know about your problem.
>>> Cheers Nick
>>
>> I'm now in a state of complete bafflement.
>>
>> It turns out we have a whole collection of misbehaving directories,
>> making this testable without waiting for caches to clear.
>>
>> I have a couple of strace's of fast ls's, and a function ftrace that
>> captured about half of a 7 second ls. (The latter is huge, and
>> probably not suitable for posting.)
>>
>> I also have a really bizarre observation, the kind that makes you
>> wonder whether you are actually dreaming. It appears that the
>> misbehaviour is strongly influenced by the choice of "time" function.
>> The problem only occurs when using the shell built-in. /usr/bin/time
>> always produces a fast response.
>>
>> Stranger still - flat out impossible, I'd have said before seeing it -
>> a "fast" ls, run with /usr/bin/time can be followed *immediately*
>> by a slow "ls", run with bash's time. It's as if the first one doesn't
>> warm the cache, which is completely absurd - except I've been able to
>> make this happen 5 times in a row, first with strace and then
>> without.
>>
>> # with /usr/bin/time the ls is fast
>> $ /usr/bin/time -p ls bad_dir
>> ...
>> real 0.21
>> user 0.00
>> sys 0.00
>>
>>
>> # with the builtin time, right *after* the strace run, the time can be
>> # horrible.
>> $ time -p ls bad_dir
>> ...
>> real 5.60
>> user 0.00
>> sys 0.17
>>
>> # run it again, and the directory is in cache as expected.
>> $ time -p ls bad_dir
>> ...
>> real 0.11
>> user 0.00
>> sys 0.02
>>
>>
>> This is not an artefact of one or other time reporting incorrectly -
>> I'm noticing a long pause before output occurs, but only on the middle
>> test of the three.
>>
>> I can't imagine any sane way for this to be happening, short of
>> coincidence or user error - and I've now seen this sequence 5 times in
>> a row, on 5 different directories created and populated by the same
>> app. (Three times with strace, twice without.)
>>
>>
>> --
>> Arlie
>>
>> (Arlie Stephens                                 arlie at worldash.org)
>>
>> _______________________________________________
>> Kernelnewbies mailing list
>> Kernelnewbies at kernelnewbies.org
>> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

I agree with Hugo; it seems right to send me the output in a file to read,
to see whether this actually is a bug in ext4.
Regards Nick
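
For anyone who wants to try the comparison Henry describes, here is a rough
sketch of how it could be scripted. It times ls on one directory with stdout
sent to /dev/null, once with bash's time keyword and once with an external
time binary. The function name compare_ls_timing and its arguments are made up
for illustration, and /usr/bin/time is only an assumed default path; treat
this as a starting point, not a diagnosis.

```shell
#!/bin/sh
# Sketch only: compare_ls_timing is a hypothetical helper, not an existing tool.
# Usage: compare_ls_timing DIR [PATH_TO_EXTERNAL_TIME]
compare_ls_timing() {
    dir=$1
    ext_time=${2:-/usr/bin/time}   # assumed location of the external binary

    echo "== bash builtin time =="
    # Running through bash -c makes sure bash's time keyword is used,
    # whatever shell is executing this script. ls output is discarded,
    # so ls is not writing to a terminal.
    bash -c 'time -p ls "$1" > /dev/null' _ "$dir"

    echo "== external time: $ext_time =="
    if [ -x "$ext_time" ]; then
        "$ext_time" -p ls "$dir" > /dev/null
    else
        echo "skipped: $ext_time is not installed"
    fi
}
```

Calling it as "compare_ls_timing bad_dir" prints the two headers on stdout and
both timing reports on stderr. If the two reports are now about the same, that
would point at the terminal output path (colour and the extra per-file
syscalls) instead of the file system.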


