Work (really slow directory access on ext4)

Nick Krause xerofoify at gmail.com
Wed Aug 6 15:29:14 EDT 2014


On Wed, Aug 6, 2014 at 2:26 PM, Arlie Stephens <arlie at worldash.org> wrote:
> On Aug 06 2014, Theodore Ts'o wrote:
>>
>> I don't subscribe to kernelnewbies, but I came across this thread in
>> the mail archive while researching an unrelated issue.
>>
>> Valdis' observations are on the mark here.  It's almost certain that
>> you are getting overwhelmed with other disk traffic, because your
>> directory isn't *that* big.
>
> Thank you very much. As the user in question, I'm afraid this one
> turns out to be a clear case of "user is an idiot."
>
> I made a dumb mistake in the way I was measuring things. The situation
> on this server is not as bad as it looked.
>
>> That being said, there are certainly issues with really really big
>> directories, and solving this is certainly not going to be a newbie
>> project (if it was easy to solve, it would have been addressed a long
>> time ago).   See:
>>
>> http://en.it-usenet.org/thread/11916/10367/
>
> However, this response is precious. Suddenly a whole bunch of things
> make sense from that posting alone. Last time I looked seriously at
> file system code, it was the Berkeley Fast File System, also known as
> UFS. I've never had time and inclination to look at a modern file
> system. That article managed to straighten out multiple misconceptions
> for me, and point me in good directions.
>
>> for the background.  It's a little bit dated, in that we do use a
>> 64-bit hash on 64-bit systems, but the fundamental issues are still
>> there.
>
> And that's in addition to what you covered here - which includes what
> might be a useful workaround for the application which may or may not
> be hitting a problem that the ls test was intended to simplify. I'm
> passing that on to the app. developer.
>
> Many, many thanks.
>
>> If you sort the readdir files by inode order, this can help
>> significantly.  Some userspace programs, such as mutt, do this.
>> Unfortunately "ls" does not.  (That might be a good newbie project,
>> since it's a userspace-only project.  However, I'm pretty sure the
>> coreutils maintainers will also react negatively if they are sent
>> patches which don't compile.  :-)
>>
>> A proof of concept of how this can be a win can be found here:
>>
>> http://git.kernel.org/cgit/fs/ext2/e2fsprogs.git/tree/contrib/spd_readdir.c
>>
>> LD_PRELOAD hacks aren't guaranteed to work on all programs, so this is much
>> more of a hack than something I'd recommend for extended production
>> use.  But it shows that if you have a readdir+stat workload, sorting
>> by inode makes a huge difference.
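
A minimal sketch of the inode-sorting idea above, in Python rather than
the C of spd_readdir.c. It is not Ted's code; the function name
stat_in_inode_order is hypothetical, and it assumes a platform where the
directory entry already carries the inode number (as d_ino does on
Linux), so that collecting the numbers costs no extra stat() calls.

```python
import os


def stat_in_inode_order(path):
    """Return (inode, name, stat_result) triples, stat()ing in inode order.

    Hypothetical helper for illustration only.
    """
    # First pass: read the whole directory. os.scandir() yields entries
    # in whatever order the file system returns them (hash order for an
    # ext4 directory with dir_index enabled).
    with os.scandir(path) as it:
        entries = [(entry.inode(), entry.name) for entry in it]

    # Sort by inode number so that the stat() calls below visit the
    # inode table roughly sequentially instead of jumping around.
    entries.sort()

    # Second pass: stat each entry in the sorted order.
    return [
        (ino, name, os.lstat(os.path.join(path, name)))
        for ino, name in entries
    ]
```

The two-pass shape (read everything, sort, then stat) is the point; any
gain would only show up on a large directory with a cold cache.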
>>
>> As far as getting traces to better understand problems, I strongly
>> suggest that you try things like vmstat, iostat, and blktrace; system
>> call traces like strace aren't going to get you very far.  (See
>> http://brooker.co.za/blog/2013/07/14/io-performance.html for a nice
>> introduction to blktrace).  Use the scientific method; collect
>> baseline statistics using vmstat, iostat, sar, before you run your
>> test workload, so you know how much I/O is going on before you start
>> your test.  If you can run your test on a quiescent system, that's a
>> really good idea.  Then collect statistics as you run your workload,
>> and then only tweak one variable at a time, and record everything in a
>> systematic way.
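
A rough sketch of the "baseline first, then measure" approach, assuming
a Linux system where /proc/diskstats is readable. It is a stand-in for
iostat or sar, not a replacement for them; the names DiskCounters,
parse_diskstats and diskstats_delta are hypothetical.

```python
from collections import namedtuple

# A subset of the per-device counters in /proc/diskstats.
DiskCounters = namedtuple(
    "DiskCounters", "reads sectors_read writes sectors_written"
)


def parse_diskstats(text):
    """Parse the contents of /proc/diskstats into {device: DiskCounters}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or malformed lines
        # Layout: major minor name reads_completed reads_merged
        # sectors_read ms_reading writes_completed writes_merged
        # sectors_written ...
        stats[fields[2]] = DiskCounters(
            reads=int(fields[3]),
            sectors_read=int(fields[5]),
            writes=int(fields[7]),
            sectors_written=int(fields[9]),
        )
    return stats


def diskstats_delta(before, after):
    """Return per-device counter differences between two snapshots."""
    return {
        dev: DiskCounters(*(a - b for a, b in zip(after[dev], before[dev])))
        for dev in after
        if dev in before
    }
```

Typical use would be to snapshot open("/proc/diskstats").read() on the
idle system, again just before the test, and again after it, so that the
background I/O rate can be compared with the I/O during the workload.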
>
> Another tool I didn't know about. Thank you very much.
>>
>> Finally, if you have more problems of a technical nature with respect
>> to ext4, there is the ext3-users at redhat.com list, or the
>> developer's list at linux-ext4 at vger.kernel.org.  It would be nice if
>> you tried the ext3-users or the kernel-newbies or tried googling to
>> see if anyone else has come across the problem and figured out the
>> solution already, but if you can't figure things out any other way, do
>> feel free to ask the linux-ext4 list.  We won't bite.  :-)
>
> Thank you. I'll make sure to do my homework properly in future - and
> never never believe things senior members of my team tell me without
> verifying them first, at least not if I'm going to post about them :-(
>
>>
>> Cheers,
>>
>>                                               - Ted
>>
>> P.S.  If you have a large number of directories which are much larger
>> than you expect, and you don't want to do the "mkdir foo.new; mv foo/*
>> foo.new ; rmdir foo; mv foo.new foo" trick on a large number of
>> directories, you can also schedule downtime and while the file system
>> is unmounted, use "e2fsck -fD".  See the man page for more details.
>> It won't solve all of your problems, and it might not solve any of
>> your problems, but it will probably make the performance of large
>> directories somewhat better.
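
For reference, the "mkdir foo.new; mv ..." trick from the P.S. could be
scripted roughly as below. This is only a sketch under assumptions: the
name rebuild_directory is hypothetical, nothing else may be using the
directory while it runs, and the new directory gets default ownership
and permissions rather than copies of the old ones. Unlike the shell
glob "foo/*", it also moves dot-files.

```python
import os


def rebuild_directory(path):
    """Recreate `path` as a fresh directory holding the same entries.

    Hypothetical helper; not atomic, so `path` briefly does not exist.
    """
    path = os.path.normpath(path)
    new_path = path + ".new"

    # mkdir foo.new
    os.mkdir(new_path)

    # mv foo/* foo.new  (renames within one file system, so no file
    # data is copied)
    for name in os.listdir(path):
        os.rename(os.path.join(path, name), os.path.join(new_path, name))

    # rmdir foo; mv foo.new foo
    os.rmdir(path)
    os.rename(new_path, path)
```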
>
> Another hint of substantially more value than everything I posted
> about this topic.
>
> Thank you again.
>
> --
> Arlie
>
> (Arlie Stephens                                 arlie at worldash.org)
>
> _______________________________________________
> Kernelnewbies mailing list
> Kernelnewbies at kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

Thanks, Ted, for clearing this up for me. It seems the issue was not
in ext4. Would you mind CCing me on this conversation? I'd like to
follow it as a learning read.
Regards and Thanks,
Nick