Huge performance difference of the command find with and without using %M option to show permissions
On my CentOS 7.6 system, I created a directory (called many_files) containing 3,000,000 files by running:
for i in {1..3000000}; do echo $i>$i; done;
I am using find to write information about the files in this directory into a file. This works surprisingly fast:
$ time find many_files -printf '%i %y %p\n'>info_file
real 0m6.970s
user 0m3.812s
sys 0m0.904s
Now if I add %M to get the permissions:
$ time find many_files -printf '%i %y %M %p\n'>info_file
real 2m30.677s
user 0m5.148s
sys 0m37.338s
The command takes much longer. This is very surprising to me, since in a C program we can use struct stat to get both the inode and the permission information of a file, and in the kernel struct inode stores both pieces of information.
My Questions:
- What causes this behavior?
- Is there a faster way to get file permissions for so many files?
linux files permissions find performance
The second question is the wrong question to ask. The real question is what you are doing with the output. If you are piping it somewhere for later processing of files based on the permissions, then you are probably doing it in a roundabout way. Instead you may want to use -perm with find to pick out the files with the permissions you're looking for.
– Kusalananda♦
yesterday
@Kusalananda, Why is it wrong to ask that? If you're faced with an unexpected 20x slowdown, then surely you want to know if it can be avoided? find -perm will still need to look at the permissions, even if not output them, so would using it affect the slowdown in any way?
– ilkkachu
16 hours ago
@ilkkachu You are correct. I assumed that the slowdown was due to the extra data produced, just like 0xSheepdog initially thought (which seems to not be the case). I would still not want to get the permissions as text like that if the intention is to process the files based on the permissions though.
– Kusalananda♦
16 hours ago
edited yesterday
Jeff Schaller♦
asked yesterday
Bahram
1 Answer
The first version only needs to readdir(3)/getdents(2) the directory, when run on a filesystem that supports storing the file type in directory entries (ext4: the filetype feature, displayed with tune2fs -l /dev/xxx; xfs: ftype=1, displayed with xfs_info /mount/point; ...).
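As a rough sketch of how the two checks above could be scripted: the helper names has_filetype_ext and has_ftype_xfs are hypothetical, and they only inspect text piped into them, on the assumption that tune2fs -l prints a "Filesystem features:" line and that xfs_info prints ftype=1 on its naming line. The device and mount point are the placeholders used above and have to be filled in by hand.

```shell
#!/bin/sh
# Sketch only: both helpers read the tool output on stdin and do not
# run tune2fs or xfs_info themselves. The function names are hypothetical.

has_filetype_ext() {
    # stdin: output of `tune2fs -l /dev/xxx`
    # Succeeds if the "Filesystem features:" line lists "filetype".
    grep '^Filesystem features:' | grep -qw 'filetype'
}

has_ftype_xfs() {
    # stdin: output of `xfs_info /mount/point`
    # Succeeds if the output reports ftype=1.
    grep -q 'ftype=1'
}

# Possible usage (placeholders as in the text above, usually needs root):
#   tune2fs -l /dev/xxx | has_filetype_ext && echo "ext4: filetype enabled"
#   xfs_info /mount/point | has_ftype_xfs && echo "xfs: ftype=1"
```

If either check fails, find would presumably have to fall back to a stat call per file even for %y, so the fast case described here would not apply.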
The second version additionally needs to stat(2) each file, which requires an extra inode lookup and therefore more seeks on the filesystem and device. This can be considerably slower on a rotating disk when the inodes are not already cached. The stat is not needed when only the name, inode number and file type are requested, because the directory entry is enough:
The linux_dirent structure is declared as follows:
struct linux_dirent {
    unsigned long  d_ino;     /* Inode number */
    unsigned long  d_off;     /* Offset to next linux_dirent */
    unsigned short d_reclen;  /* Length of this linux_dirent */
    char           d_name[];  /* Filename (null-terminated) */
                      /* length is actually (d_reclen - 2 -
                         offsetof(struct linux_dirent, d_name)) */
    /*
    char           pad;       // Zero padding byte
    char           d_type;    // File type (only since Linux
                              // 2.6.4); offset is (d_reclen - 1)
    */
}
The same information is available to readdir(3):
struct dirent {
    ino_t          d_ino;       /* Inode number */
    off_t          d_off;       /* Not an offset; see below */
    unsigned short d_reclen;    /* Length of this record */
    unsigned char  d_type;      /* Type of file; not supported
                                   by all filesystem types */
    char           d_name[256]; /* Null-terminated filename */
};
I suspected this, and confirmed it by comparing (on a smaller sample...) the two outputs of:
strace -o v1 find many_files -printf '%i %y %p\n'>info_file
strace -o v2 find many_files -printf '%i %y %M %p\n'>info_file
On my Linux amd64 kernel 5.0.x, the main difference is just:
[...]
getdents(4, /* 0 entries */, 32768) = 0
close(4) = 0
fcntl(5, F_DUPFD_CLOEXEC, 0) = 4
-write(1, "25499894 d many_files\n25502410 f"..., 4096) = 4096
-write(1, "iles/844\n25502253 f many_files/8"..., 4096) = 4096
-write(1, "096 f many_files/686\n25502095 f "..., 4096) = 4096
-write(1, "es/529\n25501938 f many_files/528"..., 4096) = 4096
-write(1, "1 f many_files/371\n25501780 f ma"..., 4096) = 4096
-write(1, "/214\n25497527 f many_files/213\n2"..., 4096) = 4096
-brk(0x55b29a933000) = 0x55b29a933000
+newfstatat(5, "1000", {st_mode=S_IFREG|0644, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "999", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "998", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "997", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "996", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "995", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "994", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "993", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "992", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "991", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "990", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
[...]
+newfstatat(5, "891", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+write(1, "25499894 d drwxr-xr-x many_files"..., 4096) = 4096
+newfstatat(5, "890", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
[...]
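To repeat this comparison without creating 3,000,000 files, a minimal sketch along the following lines may help. It assumes GNU find (-printf is a GNU extension) and a scratch directory from mktemp. The function names and the sample size of 1000 are illustrative choices, not part of the original test.

```shell
#!/bin/sh
# Sketch: build a small sample directory and run both find variants on it.
# Assumes GNU find. All function names here are hypothetical.

make_sample() {
    # $1 = existing directory to fill, $2 = number of files to create
    dir=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        echo "$i" > "$dir/$i"
        i=$((i + 1))
    done
}

list_without_mode() {
    # inode number, file type and path only
    find "$1" -printf '%i %y %p\n'
}

list_with_mode() {
    # same listing plus %M, the symbolic permission string
    find "$1" -printf '%i %y %M %p\n'
}

sample=$(mktemp -d)
make_sample "$sample" 1000

# Each listing has one line per entry: 1000 files plus the directory itself.
list_without_mode "$sample" | wc -l
list_with_mode "$sample" | wc -l

rm -rf "$sample"
```

Running each of the two listing commands under strace -o, as above, against such a sample directory should make the per-file stat calls of the second variant visible without waiting minutes for the full run.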
Unfortunately, the d_type field of a dir entry is a non-standard feature, only present on Linux and BSD, as mentioned in the readdir(3) manpage. (Though on Linux it is implemented on most filesystems that matter).
– mosvy
yesterday
@mosvy That's ok, the question is tagged CentOS. But yes I understand that on other *nix, results may differ
– A.B
yesterday
Hum actually xfs (CentOS' default) support isn't quite clear...
– A.B
yesterday
added how to check if the filetype feature is present on xfs, in case xfs is in use.
– A.B
yesterday
I think it's supported on xfs -- when I was making a testcase for a glibc glob(3) that only triggered when the d_type field was absent, I had to use either minixfs or use the GLOB_ALTDIRFUNC.
– mosvy
yesterday
edited yesterday
answered yesterday
A.B
Unfortunately, the d_type field of a dir entry is a non-standard feature, only present on Linux and BSD, as mentioned in the readdir(3) manpage (though on Linux it is implemented on most filesystems that matter).
– mosvy
yesterday
@mosvy That's ok, the question is tagged CentOS. But yes I understand that on other *nix, results may differ
– A.B
yesterday
Hmm, actually, xfs (CentOS' default) support isn't quite clear...
– A.B
yesterday
Added how to check whether the filetype feature is present on xfs, in case xfs is in use.
– A.B
yesterday
I think it's supported on xfs -- when I was making a testcase for a glibc glob(3) bug that only triggered when the d_type field was absent, I had to use either minixfs or GLOB_ALTDIRFUNC.
– mosvy
yesterday
-files, find, linux, performance, permissions
The second question is the wrong question to ask. The real question is what you are doing with the output. If you are piping it somewhere for later processing of files based on the permissions, then you are probably doing it in a roundabout way. Instead you may want to use -perm with find to pick out the files with the permissions you're looking for.
– Kusalananda♦
yesterday
@Kusalananda, why is it wrong to ask that? If you're faced with an unexpected 20x slowdown, then surely you want to know if it can be avoided? find -perm will still need to look at the permissions, even if it does not output them, so would using it affect the slowdown in any way?
– ilkkachu
16 hours ago
@ilkkachu You are correct. I assumed that the slowdown was due to the extra data produced, just like 0xSheepdog initially thought (which seems to not be the case). I would still not want to get the permissions as text like that if the intention is to process the files based on the permissions though.
– Kusalananda♦
16 hours ago
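To illustrate the point made in these comments, here is a minimal sketch in Python (an illustration only, not part of the original thread; `files_matching_mode` and `make_sample_tree` are hypothetical helpers). It selects files by their permission bits directly instead of parsing a textual mode string, roughly what find -perm with an exact mode does. Note that it still calls `lstat` on every candidate, so the per-file cost discussed in the answer is not avoided.

```python
import os
import stat
import tempfile


def files_matching_mode(root, mode):
    """Return sorted paths of regular files under root whose permission bits equal mode.

    Every candidate is lstat-ed, so the per-file stat cost is still paid.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode) and stat.S_IMODE(st.st_mode) == mode:
                matches.append(path)
    return sorted(matches)


def make_sample_tree():
    """Create a throwaway directory with one 0644 file and one 0600 file."""
    root = tempfile.mkdtemp()
    for name, mode in (("public", 0o644), ("private", 0o600)):
        path = os.path.join(root, name)
        with open(path, "w") as handle:
            handle.write("x")
        os.chmod(path, mode)
    return root


if __name__ == "__main__":
    sample = make_sample_tree()
    print(files_matching_mode(sample, 0o600))
```

Comparing integer mode bits avoids formatting and re-parsing strings such as "-rw-------", which is the roundabout step the first comment warns about.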