Overwriting copy functionality in filesystem

Valdis Klētnieks valdis.kletnieks at vt.edu
Thu Mar 28 16:07:02 EDT 2019


On Fri, 29 Mar 2019 00:00:17 +0530, Bharath Vedartham said:

> I was thinking of a use case where we are copying a huge file (say 100
> GB), if we do copy-on-write we can speed up /bin/cp for such files i
> feel. Any comments on this?

Hmm.. wait a minute.  What definition of "copy on write" are you using?

Hint - if you're copying an *entire* 100GB file, the *fastest* way is to simply
make a second hard link to the file. If you're determined to make an entire
second copy, you're going to be reading 100GB and writing 100GB, and the
exact details aren't going to matter all that much.
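
Just to make the hard-link point concrete, here's a rough sketch of "copying"
via a second hard link with link(2) - nothing is read or written, both names
just end up pointing at the same inode (the file names are made up for
illustration):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
        /* Both names refer to the same inode afterwards; no data moves,
         * so this finishes instantly even for a 100GB file. */
        if (link("huge-100GB.img", "huge-100GB-copy.img") < 0) {
                perror("link");
                return 1;
        }
        return 0;
}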

Now, where you can get clever is if you create your 100GB file, and then
somebody only changes 8K of the file.  There's no need to copy all 100GB into a
new file if you are able to record "oh, and this 8K got changed". You only need
to write the 8K of changes, and some metadata.
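
If you do want a separate file but don't want to read and write the whole
100GB, this is roughly the reflink trick that cp --reflink=always uses on
filesystems that support it (Btrfs, XFS): a sketch using the FICLONE ioctl,
where the filesystem shares the source's extents and only copies blocks
later, when somebody writes to them. The paths and error handling are only
for illustration, and it fails with EOPNOTSUPP on filesystems like ext4:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>           /* FICLONE */

int main(int argc, char **argv)
{
        if (argc != 3) {
                fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
                return 1;
        }

        int src = open(argv[1], O_RDONLY);
        int dst = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (src < 0 || dst < 0) {
                perror("open");
                return 1;
        }

        /* Share src's extents with dst; no data is copied until one of
         * the files is written to (copy-on-write at the extent level). */
        if (ioctl(dst, FICLONE, src) < 0) {
                perror("ioctl(FICLONE)");
                return 1;
        }

        close(src);
        close(dst);
        return 0;
}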

(Similar tricks are used for shared libraries and pre-zeroed storage.  Everybody
gets a reference to the same copy of the page(s) in memory - until somebody
scribbles on a page.

So say you have a 30MB shared object in memory, with 5 users.  That's 5 references
to the same data.  Now one user writes to it.  The system catches that write (usually
via a page fault), copies just the one page to a new page, and then lets the write to the new
page complete.  Now we have 5 users that all have references to the same (30M-4K)
of data, 4 users that have a reference to the old copy of that 4K, and one user that
has a reference to the modified copy of that 4K.)
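
If you want to watch that page-level copy-on-write happen from userspace,
here's a rough sketch (the file name is made up, and it assumes the file is
at least one page long): map the file MAP_PRIVATE, fork, and dirty one byte
in the child. The child's write faults, the kernel copies just that one page
for the child, and the parent keeps seeing the original data:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
        long pagesz = sysconf(_SC_PAGESIZE);
        int fd = open("shared.dat", O_RDONLY);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* Private mapping: reads share the page cache with everyone else;
         * the first write to a page faults and gives this process its own
         * copy of just that page. */
        char *p = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        if (fork() == 0) {
                p[0] = 'X';     /* COW: kernel copies only this one page */
                printf("child  sees: %c\n", p[0]);
                _exit(0);
        }

        wait(NULL);
        printf("parent sees: %c\n", p[0]);  /* still the original byte */
        return 0;
}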

https://en.wikipedia.org/wiki/Copy-on-write


