Deleting old files is slow and 'kills' IO performance

I'm using find to prune old files, lots of them. This takes minutes to hours to run, and other server processes encounter I/O performance issues while it runs.

find -mtime +100 -delete -print

I tried ionice, but it didn't appear to help:

ionice -c 3

What can one do to 1. speed up the find operation and 2. avoid impacting other processes?
The filesystem is ext4. Is ext4 just bad at this kind of workload?
The kernel is 3.16.
Storage is 2x 1TB 7200 rpm HDDs in RAID 1.
There are 93 GB in 610,228 files now, so about 152 KB per file on average.

Maybe I just shouldn't store so many files in a single directory?










  • Add to the post, how many files, and disk technology, please.

    – Rui F Ribeiro
    Nov 23 '16 at 16:48

















linux debian ext4






asked Nov 23 '16 at 15:38 by XTF on Unix & Linux Stack Exchange (edited Nov 23 '16 at 17:41)
1 Answer
The find command you posted unlinks each file individually, one at a time, as it walks the tree. In performance terms that is not ideal.

To improve this, you can use find's -exec action to batch the files into rm commands:

find -mtime +100 -exec rm {} +

(Note the {} placeholder, which -exec requires.) Using the + terminator instead of the alternative ; is important: with +, find runs rm with as many file names as fit on a single command line. With the ; terminator, find would run one rm command per file, and you would be back to the same problem.
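You can make the batching visible by substituting echo for rm in a throwaway temp directory (a sketch, not run against any real data); each line of output corresponds to one invocation of the command:

```shell
# Compare invocation counts for ';' (one per file) vs '+' (batched).
# Everything happens in a disposable temp directory. (bash)
dir=$(mktemp -d)
touch "$dir"/file{1..500}

# ';' runs the command once per file: 500 output lines.
per_file=$(find "$dir" -type f -exec echo {} \; | wc -l)

# '+' packs as many names as fit into each invocation:
# here that is typically a single line.
batched=$(find "$dir" -type f -exec echo {} + | wc -l)

echo "per-file: $per_file invocations, batched: $batched"
rm -r "$dir"
```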



For better performance you can combine this with the ionice command you mentioned. If that doesn't noticeably improve things, the job is most likely consuming resources other than I/O, such as CPU. In that case you can use the renice command to lower the process's CPU priority.



I would use the following:

ionice -c 3 find -mtime +100 -exec rm {} +

Then, in another shell, find the PID of the find command:

ps -ef | grep find

And finally lower its CPU priority with renice:

renice +19 -p <PID_find_command>
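If you'd rather not hunt for the PID afterwards, both priorities can also be set up front, since nice (unlike renice) applies before the process starts. A sketch, with /srv/data as a hypothetical stand-in for the real directory:

```shell
# Set lowest CPU priority (nice 19) and idle I/O class (ionice -c 3)
# in one command at launch; no separate ps/renice step needed.
# /srv/data is a placeholder for the directory actually being pruned.
nice -n 19 ionice -c 3 find /srv/data -mtime +100 -exec rm {} +
```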






  • why not use xargs so that each spawn of an rm process deletes, say, 50 files at a time?

    – steve
    Nov 23 '16 at 23:00











  • Plain xargs will not work with file names containing blanks. You need to build the command with the -print0 option, which handles such names: find -mtime +100 -print0 | xargs -0 rm. This complicates the command, and it doesn't combine with the I/O priority tweak into one single command the way ionice with find and -exec does. Also, xargs offers no better performance than find with -exec and the + terminator, so I prefer the latter.

    – Rubén Alemán
    Nov 24 '16 at 23:01











  • Does -exec take care of spaces and other weird stuff? As we're IO bound, how does using rm improve performance?

    – XTF
    Nov 29 '16 at 17:56
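On the spaces question: -exec passes every path to rm as a single argument, so names with spaces or even newlines are handled safely without any -print0 plumbing. A minimal sketch in a temp directory:

```shell
# -exec hands each path to the command as one argv entry,
# so whitespace in file names is not a problem.
dir=$(mktemp -d)
touch "$dir/name with spaces"
find "$dir" -type f -exec rm {} +
ls -A "$dir"   # directory is now empty
# The xargs equivalent needs NUL delimiters to be equally safe:
#   find "$dir" -type f -print0 | xargs -0 rm
rm -r "$dir"
```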












answered Nov 23 '16 at 21:25 by Rubén Alemán (edited by Rui F Ribeiro)











