Work (really slow directory access on ext4)

Henry Hallam henry at pericynthion.org
Thu Jul 31 19:41:57 EDT 2014


Try redirecting the ls output to /dev/null or a file. That disables
its color highlighting and so removes a bunch of syscalls. See whether
the timing is then the same no matter which 'time' you use.
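
Something along these lines is what I have in mind. It is only a rough
sketch: "dir" is a placeholder for one of the slow directories, and the
variable names are mine. The first run is wrapped in "bash -c" so that
bash's own time keyword does the timing whatever shell runs the script;
the second uses /usr/bin/time if it is installed. In both, ls writes to
/dev/null, so its stdout is not a terminal and --color=auto stays off.

```shell
#!/bin/sh
# Placeholder: pass one of the slow directories as the first argument.
dir="${1:-.}"

# 1. bash's time keyword. ls output is discarded; the timing report goes
#    to bash's stderr, which is captured here.
builtin_report=$(bash -c 'time -p ls "$1" > /dev/null' _ "$dir" 2>&1)

# 2. The external time binary, same redirect. Skipped if not installed.
external_report="(/usr/bin/time not installed)"
if [ -x /usr/bin/time ]; then
    external_report=$(/usr/bin/time -p ls "$dir" 2>&1 > /dev/null)
fi

printf 'builtin time:\n%s\n\n/usr/bin/time:\n%s\n' \
    "$builtin_report" "$external_report"
```

If the two "real" figures agree once the output is redirected, the
difference is more likely in what ls does for a terminal than in the
filesystem.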

On Thu, Jul 31, 2014 at 4:36 PM, Arlie Stephens <arlie at worldash.org> wrote:
> Hi Nick,
>
> [Context - directory ls taking 4-15 seconds; directory large, with
> long filenames, but nowhere near as huge as Valdis' mail directory.]
>
> I've now discovered a really bizarre pattern, and I'm inclined to stop
> blaming the file system until some clarity develops. If I ever get it
> to the point where I can produce a high quality bug report - with or
> without patch - I will do so - but what I have now is anything but
> clear and high quality.
>
> On Jul 30 2014, Nick Krause wrote:
>> On Wed, Jul 30, 2014 at 3:48 PM,  <Valdis.Kletnieks at vt.edu> wrote:
>> > On Wed, 30 Jul 2014 10:38:13 -0700, Arlie Stephens said:
>> >
>> >> On the good side, Valdis' observations of his mail directory have been
>> >> a great help.
>> >
>> > And remember, that's on a single laptop-class hard drive, no fancy raid or
>> > anything. (Though it *is* a hybrid, with 32G of flash cache on the front end).
>> >
>> > You throw some *real* hardware at it, it of course would go even faster.
>>
>> Just send me the logs and anything else you think may help me.
>> Please also cc the ext4 mailing list, as this will let the other
>> ext4 developers and maintainers know about your problem.
>> Cheers Nick
>
> I'm now in a state of complete bafflement.
>
> It turns out we have a whole collection of misbehaving directories,
> making this testable without waiting for caches to clear.
>
> I have a couple of strace's of fast ls's, and a function ftrace that
> captured about half of a 7 second ls. (The latter is huge, and
> probably not suitable for posting.)
>
> I also have a really bizarre observation, the kind that makes you
> wonder whether you are actually dreaming. It appears that the
> misbehaviour is strongly influenced by the choice of "time" function.
> The problem only occurs when using the shell built-in. /usr/bin/time
> always produces a fast response.
>
> Stranger still - flat out impossible, I'd have said before seeing it -
> a "fast" ls, run with /usr/bin/time can be followed *immediately*
> by a slow "ls", run with bash's time. It's as if the first one doesn't
> warm the cache, which is completely absurd - except I've been able to
> make this happen 5 times in a row, first with strace and then
> without.
>
> # with /usr/bin/time the ls is fast
> $ /usr/bin/time -p ls bad_dir
> ...
> real 0.21
> user 0.00
> sys 0.00
>
>
> # with the builtin time, right *after* the strace run, the time can be
> # horrible.
> $ time -p ls bad_dir
> ...
> real 5.60
> user 0.00
> sys 0.17
>
> # run it again, and the directory is in cache as expected.
> $ time -p ls bad_dir
> ...
> real 0.11
> user 0.00
> sys 0.02
>
>
> This is not an artefact of one or other time reporting incorrectly -
> I'm noticing a long pause before output occurs, but only on the middle
> test of the three.
>
> I can't imagine any sane way for this to be happening, short of
> coincidence or user error - and I've now seen this sequence 5 times in
> a row, on 5 different directories created and populated by the same
> app. (Three times with strace, twice without.)
>
>
> --
> Arlie
>
> (Arlie Stephens                                 arlie at worldash.org)
>
> _______________________________________________
> Kernelnewbies mailing list
> Kernelnewbies at kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


