Deleting old files is slow and 'kills' IO performance

I'm using find to prune old files, and there are a lot of them. This takes minutes or hours to run, and other server processes encounter IO performance issues while it does.



find -mtime +100 -delete -print


I tried ionice but it didn't appear to help.



ionice -c 3 


What can one do to 1. speed up the find operation and 2. avoid impacting other processes?
The FS is ext4. Is ext4 just bad at this kind of workload?
Kernel is 3.16.
Storage is 2x 1TB 7200rpm HDDs in RAID 1.
There's 93GB in 610228 files now, so about 152KB per file on average.



Maybe I just shouldn't store so many files in a single directory?
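Following up on that last thought: if the files can be grouped by date at write time, pruning gets much cheaper, because one subdirectory per day lets the prune pass remove a handful of directories instead of walking 600k entries in one flat directory. A minimal sketch of the idea (the paths are hypothetical, and `touch -d` is GNU coreutils, used here only to backdate for the demo):

```shell
base=$(mktemp -d)                 # stand-in for the real data directory
mkdir -p "$base/2016-01-01" "$base/today"
touch "$base/2016-01-01/sample.log"
# Backdate the day-directory so the demo has something "old" to prune:
touch -d "200 days ago" "$base/2016-01-01"
# Prune whole day-directories older than 100 days in one pass:
find "$base" -mindepth 1 -maxdepth 1 -type d -mtime +100 -exec rm -r {} +
ls "$base"    # only "today" is left
```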


































  • Add to the post, how many files, and disk technology, please.

    – Rui F Ribeiro
    Nov 23 '16 at 16:48

















linux debian ext4






asked Nov 23 '16 at 15:38 by XTF, edited Nov 23 '16 at 17:41






















1 Answer
































When you run the find command as you posted it, it deletes each file individually as it is found. In terms of performance, that is not a good way to do it.

To improve this, you can use find's -exec option to hand the matched files to an rm command:

find -mtime +100 -exec rm {} +

It's important to use the + terminator instead of the alternative ;. With +, find runs rm once with as many file names as fit into a single invocation. With the ; terminator, find runs one rm command per file, so you would have the same problem.
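The difference between the two terminators is easy to see with `echo` standing in for `rm` (a quick demo in a throwaway directory):

```shell
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c"
# With ';', find runs the command once per file:
per_file=$(find "$dir" -type f -exec echo run {} \; | wc -l)
# With '+', find batches as many names as fit into one invocation:
batched=$(find "$dir" -type f -exec echo run {} + | wc -l)
echo "$per_file vs $batched"   # 3 vs 1
```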



For better performance, you can combine this with the ionice command, as you mentioned. If you don't notice an improvement in system performance, the process is most likely consuming resources other than I/O, such as CPU. In that case, you can use the renice command to lower the CPU priority of the process.



I would use the following:



ionice -c 3 find -mtime +100 -exec rm {} +


Now, in another shell, you need to find the PID of the find command: ps -ef | grep find



And finally run the renice command: renice +19 -p <PID_find_command>
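The two-step approach above (start under ionice, then renice from another shell) can also be collapsed into one invocation by launching find under nice as well. A sketch against a temporary directory standing in for the real data path (`touch -d` is GNU coreutils, used only to backdate a file for the demo):

```shell
data=$(mktemp -d)                 # stand-in for the real data directory
touch "$data/new.log" "$data/old.log"
touch -d "200 days ago" "$data/old.log"   # backdate one file for the demo
# Run the whole pruning pass at the lowest CPU priority (nice 19) and in
# the idle I/O scheduling class (ionice -c 3) from the start:
nice -n 19 ionice -c 3 find "$data" -type f -mtime +100 -exec rm {} +
ls "$data"    # only new.log is left
```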































  • why not use xargs so that each spawn of a rm process deletes, say, 50 files at a time ?

    – steve
    Nov 23 '16 at 23:00











  • Using xargs will not work with files that have blank spaces in the filename. You need to build the command with the -print0 option, which handles the blank spaces, something like find -mtime +100 -print0 | xargs -0 rm. This complicates the command and does not combine the I/O priority tweak into one single invocation, the way the ionice example with find and -exec does. Also, xargs doesn't offer better performance than find with -exec and the + terminator, so I prefer the latter.

    – Rubén Alemán
    Nov 24 '16 at 23:01
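The `-print0`/`-0` pairing mentioned in that comment can be checked quickly; a minimal demo with a space in a filename (`touch -d` is GNU coreutils, used only to backdate for the demo):

```shell
dir=$(mktemp -d)
touch "$dir/file with spaces.log"
touch -d "200 days ago" "$dir/file with spaces.log"   # backdate for the demo
# NUL-delimited names survive whitespace (and even newlines) in filenames:
find "$dir" -type f -mtime +100 -print0 | xargs -0 rm --
ls -A "$dir"    # prints nothing: the file is gone
```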











  • Does -exec take care of spaces and other weird stuff? As we're IO bound, how does using rm improve performance?

    – XTF
    Nov 29 '16 at 17:56












answered Nov 23 '16 at 21:25 by Rubén Alemán, edited by Rui F Ribeiro











