Work (really slow directory access on ext4)

Arlie Stephens arlie at worldash.org
Wed Aug 6 14:26:16 EDT 2014


On Aug 06 2014, Theodore Ts'o wrote:
> 
> I don't subscribe to kernelnewbies, but I came across this thread in
> the mail archive while researching an unrelated issue.
> 
> Valdis' observations are on the mark here.  It's almost certain that
> you are getting overwhelmed with other disk traffic, because your
> directory isn't *that* big.

Thank you very much. As the user in question, I'm afraid this one
turns out to be a clear case of "user is an idiot." 

I made a dumb mistake in the way I was measuring things. The situation
on this server is not as bad as it looked. 

> That being said, there are certainly issues with really really big
> directories, and solving this is certainly not going to be a newbie
> project (if it was easy to solve, it would have been addressed a long
> time ago).   See:
> 
> http://en.it-usenet.org/thread/11916/10367/

However, this response is precious. Suddenly a whole bunch of things
make sense from that posting alone. Last time I looked seriously at
file system code, it was the Berkeley Fast File System, also known as
UFS. I've never had time and inclination to look at a modern file
system. That article managed to straighten out multiple misconceptions
for me, and point me in good directions. 

> for the background.  It's a little bit dated, in that we do use a
> 64-bit hash on 64-bit systems, but the fundamental issues are still
> there.

And that's in addition to what you covered here - which includes what
might be a useful workaround for the application which may or may not
be hitting a problem that the ls test was intended to simplify. I'm
passing that on to the app. developer. 

Many, many thanks.  

> If you sort the readdir files by inode order, this can help
> significantly.  Some userspace programs, such as mutt, do this.
> Unfortunately "ls" does not.  (That might be a good newbie project,
> since it's a userspace-only project.  However, I'm pretty sure the
> coreutils maintainers will also react negatively if they are sent
> patches which don't compile.  :-)
> 
> A proof of concept of how this can be a win can be found here:
> 
> http://git.kernel.org/cgit/fs/ext2/e2fsprogs.git/tree/contrib/spd_readdir.c
> 
> LD_PRELOAD hacks aren't guaranteed to work on all programs, so this is much
> more of a hack than something I'd recommend for extended production
> use.  But it shows that if you have a readdir+stat workload, sorting
> by inode makes a huge difference.
> 
> As far as getting traces to better understand problems, I strongly
> suggest that you try things like vmstat, iostat, and blktrace; system
> call traces like strace aren't going to get you very far.  (See
> http://brooker.co.za/blog/2013/07/14/io-performance.html for a nice
> introduction to blktrace).  Use the scientific method; collect
> baseline statistics using vmstat, iostat, sar, before you run your
> test workload, so you know how much I/O is going on before you start
> your test.  If you can run your test on a quiescent system, that's a
> really good idea.  Then collect statistics as you run your workload,
> and then only tweak one variable at a time, and record everything in a
> systematic way.

Another tool I didn't know about. Thank you very much. 
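
For anyone who wants to try the "sort the readdir files by inode order"
idea quoted above without the LD_PRELOAD library, here is a minimal
sketch in Python. It is only an illustration of the technique, not what
mutt or spd_readdir.c actually do. The function name
stat_in_inode_order is made up for this example. It reads the whole
directory first, sorts the entries by inode number, and only then calls
lstat() on each one.

```python
import os
import sys


def stat_in_inode_order(directory):
    """Return (name, stat_result) pairs, calling lstat() in ascending inode order.

    Hypothetical helper for illustration; not taken from any existing tool.
    """
    # Read every directory entry first. On Linux, entry.inode() comes from
    # the directory entry itself, so no per-file stat() call happens here.
    with os.scandir(directory) as it:
        entries = [(entry.inode(), entry.name) for entry in it]

    # Sort by inode number instead of using the order readdir returned.
    entries.sort()

    # Now stat the files in inode order.
    return [(name, os.lstat(os.path.join(directory, name)))
            for _, name in entries]


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, st in stat_in_inode_order(target):
        print(st.st_ino, st.st_size, name)
```

Whether this helps on a given system depends on how cold the cache is
and how big the directory is, so it is worth measuring both ways.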
> 
> Finally, if you have more problems of a technical nature with respect
> to ext4, there is the ext3-users at redhat.com list, or the
> developer's list at linux-ext4 at vger.kernel.org.  It would be nice if
> you tried the ext3-users or the kernel-newbies or tried googling to
> see if anyone else has come across the problem and figured out the
> solution already, but if you can't figure things out any other way, do
> feel free to ask the linux-ext4 list.  We won't bite.  :-)

Thank you. I'll make sure to do my homework properly in future - and
never never believe things senior members of my team tell me without
verifying them first, at least not if I'm going to post about them :-( 

> 
> Cheers,
> 
> 						- Ted
> 
> P.S.  If you have a large number of directories which are much larger
> than you expect, and you don't want to do the "mkdir foo.new; mv foo/*
> foo.new ; rmdir foo; mv foo.new foo" trick on a large number of
> directories, you can also schedule downtime and while the file system
> is unmounted, use "e2fsck -fD".  See the man page for more details.
> It won't solve all of your problems, and it might not solve any of
> your problems, but it will probably make the performance of large
> directories somewhat better.

Another hint of substantially more value than everything I posted
about this topic. 

Thank you again.
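
The "mkdir foo.new; mv foo/* foo.new; rmdir foo; mv foo.new foo" trick
from the P.S. can be sketched in Python roughly as follows, in case it
needs to be run over many directories. This is a sketch under several
assumptions, not a tested production tool: the name rebuild_directory
is made up, nothing else may be using the directory while it runs, and
the ".new" name must not already exist. Unlike the shell glob, it also
moves dotfiles. Like the shell version, it does not copy the old
directory's ownership or permissions.

```python
import os


def rebuild_directory(path):
    """Move every entry of `path` into a fresh directory and swap it in.

    Hypothetical helper for illustration. Returns the number of entries moved.
    """
    path = os.path.abspath(path)      # also drops any trailing slash
    new_path = path + ".new"          # temporary name, as in "foo.new"

    os.mkdir(new_path)                # raises if foo.new already exists

    # Collect the names first so we are not renaming while iterating.
    with os.scandir(path) as it:
        names = [entry.name for entry in it]

    # Same parent directory, hence same file system: rename() only moves
    # directory entries, no file data is copied.
    for name in names:
        os.rename(os.path.join(path, name), os.path.join(new_path, name))

    os.rmdir(path)                    # the old directory is empty now
    os.rename(new_path, path)         # put the fresh directory in its place
    return len(names)
```

The steps are not atomic, so a crash partway through leaves files split
between foo and foo.new.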

-- 
Arlie

(Arlie Stephens					arlie at worldash.org)


