


What's the POSIX way to read an exact number of bytes from a file?


















I just hit this problem and learned a lot from the chosen answer to: Create random data with dd and get "partial read warning". Is the data after the warning now really random?



Unfortunately, the suggested solution, head -c, is not portable.



For folks who insist that dd is the answer, please carefully read the linked answer, which explains in great detail why dd cannot be the answer. Also, please observe this:



$ dd bs=1000000 count=10 if=/dev/random of=random
dd: warning: partial read (89 bytes); suggest iflag=fullblock
0+10 records in
0+10 records out
143 bytes (143 B) copied, 99.3918 s, 0.0 kB/s
$ ls -l random ; du -kP random
-rw-rw-r-- 1 me me 143 Apr 22 19:19 random
4 random
$ pwd
/tmp















Tags: dd, posix, binary, head














asked Apr 22 '16 at 21:23 by Low Powah (edited Apr 22 '16 at 23:28)












  • dd is portable. If you don't mind the warning, or adjust your blocksize, is there a problem with using dd?

    – muru
    Apr 22 '16 at 21:35











  • @muru OP referred to head -c not being portable.

    – Guido
    Apr 22 '16 at 22:30











  • @Guido yes, but dd is.

    – muru
    Apr 22 '16 at 22:30











  • @muru dd doesn't do the job, for reasons explained in the linked answer. In my experiments, requesting 10 * 2^20 bytes with dd yields less than 200 bytes. If you don't understand or believe that, I urge you to read the linked answer which clearly explains how it can be so.

    – Low Powah
    Apr 22 '16 at 22:55











  • @LowPowah I did read the linked post and I understand it, but I wonder why you can't adjust your blocksize.

    – muru
    Apr 22 '16 at 23:01



























3 Answers














Unfortunately, to manipulate the content of a binary file, dd is pretty much the only tool in POSIX. Although most modern implementations of text processing tools (cat, sed, awk, …) can manipulate binary files, this is not required by POSIX: some older implementations do choke on null bytes, input not terminated by a newline, or invalid byte sequences in the ambient character encoding.



It is possible, but difficult, to use dd safely. The reason I spend a lot of energy steering people away from it is that there's a lot of advice out there that promotes dd in situations where it is neither useful nor safe.



The problem with dd is its notion of blocks: it assumes that a call to read returns one block; if read returns less data, you get a partial block, which throws things like skip and count off. Here's an example that illustrates the problem, where dd is reading from a pipe that delivers data relatively slowly:



yes hello | while read line; do echo $line; done | dd ibs=4 count=1000 | wc -c


On a bog-standard Linux (Debian jessie, Linux kernel 3.16, dd from GNU coreutils 8.23), I get a highly variable number of bytes, ranging from about 3000 to almost 4000. Change the input block size to a divisor of 6 (1, 2, 3 or 6), and dd consistently outputs the full ibs × count bytes, as one would naively expect: the input to dd arrives in bursts of 6 bytes ("hello" plus a newline), and as long as a block doesn't span multiple bursts, dd gets to read a complete block.
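The short-read effect can also be provoked with a small, slow input. This is only a sketch: it assumes a dd that counts a short read as one block (as the GNU dd described here does), and the one-second pause is an arbitrary value chosen so that the first read returns before the rest of the data is written.

```shell
# Write two bytes, pause, then write two more, and ask dd for one 4-byte block.
# The first read() is expected to return only "ab", and dd counts that
# partial block as the single block requested by count=1.
partial=$({ printf 'ab'; sleep 1; printf 'cd'; } |
    dd ibs=4 count=1 2>/dev/null | wc -c | tr -d ' ')
echo "bytes copied with ibs=4 count=1: $partial"
```

Under those assumptions the pipeline reports 2 bytes rather than the 4 that ibs=4 count=1 appears to ask for.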



This suggests a solution: use an input block size of 1. No matter how the input is produced, there's no way for dd to read a partial block if the input block size is 1. (This is not completely obvious: dd could read a block of size 0 if it's interrupted by a signal — but if it's interrupted by a signal, the read system call returns -1. A read returning 0 is only possible if the file is opened in non-blocking mode, and in that case a read had better not be considered to have been performed at all. In blocking mode, read only returns 0 at the end of the file.)



dd ibs=1 count="$number_of_bytes"


The problem with this approach is that it can be slow (but not shockingly slow: only about 4 times slower than head -c in my quick benchmark).
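As a sketch, the ibs=1 approach can be wrapped in a small shell function. The name read_exact is hypothetical, and discarding stderr is a choice made here to hide dd's transfer statistics; it also hides genuine error messages, so a real script may want to keep them.

```shell
# read_exact N: copy the first N bytes of standard input to standard output.
# (read_exact is a hypothetical helper name, not a standard utility.)
read_exact() {
    # With ibs=1 a read cannot come up short, so count=N means N bytes.
    # dd reports its statistics on stderr; they are discarded here.
    dd ibs=1 count="$1" 2>/dev/null
}

# Bursty input: two bytes, a pause, then four more. Five bytes are requested.
got=$({ printf 'ab'; sleep 1; printf 'cdef'; } | read_exact 5 | wc -c | tr -d ' ')
echo "requested 5 bytes, got $got"
```

If the input ends before N bytes have arrived, the function simply copies what was available, so a caller that needs exactly N bytes should check the size of the result.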



POSIX defines other tools that read binary data and convert it to a text format: uuencode (outputs in historical uuencode format or in Base64), od (outputs an octal or hexadecimal dump). Neither is well-suited to the task at hand. uuencode can be undone by uudecode, but counting bytes in the output is awkward because the number of bytes per line of output is not standardized. It's possible to get well-defined output from od, but unfortunately there's no POSIX tool to go the other way round (it can be done but only through slow loops in sh or awk, which defeats the purpose here).
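To illustrate the od round trip and why it is slow, here is a minimal sketch. The helper names to_octal and from_octal are hypothetical; the reverse direction relies on the octal escapes that printf accepts in its format string and runs one printf per byte.

```shell
# to_octal: dump stdin as octal byte values, no offsets, no "*" abbreviation.
# (to_octal and from_octal are hypothetical helper names.)
to_octal() {
    od -A n -v -t o1
}

# from_octal: turn the octal words back into bytes, one printf per byte.
from_octal() {
    while read -r line; do
        for o in $line; do
            # $o is a three-digit octal value such as 101, so this prints \101
            printf "\\$o"
        done
    done
}

# Round trip of four bytes, including a null byte, shown as hex.
printf 'A\0B\n' | to_octal | from_octal | od -A n -v -t x1
```

Because the text form is one word per byte, the byte count can be controlled exactly in between the two steps, but the per-byte loop is what makes this impractical for large inputs.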






answered Apr 22 '16 at 23:36 by Gilles























  • Thank you for a very comprehensive answer. It seems like there is no simple and safe way that is also portable. Maybe the answer is to write a C program if one wants to work with arbitrary bytes in units smaller than lines. I am intrigued by the possibility of a uuencode/uudecode solution. Can you please explain a little more why such a solution would not be safe or portable? (I'm defining safe to mean guaranteed not to lose data, given that everything else works perfectly.)

    – Low Powah
    Apr 23 '16 at 0:27











  • @LowPowah uuencode won't lose data, the problem is counting the input bytes. You can easily count the number of lines, but the number of bytes per line is not standardized. You can pipe into awk and do the counting there, but if you do that I think you'll lose any speed advantage. Furthermore the output of uuencode (in either format) can't easily be split according to input bytes, since it processes bytes by blocks. The output of od is easy to work with but difficult to convert back to binary afterwards.

    – Gilles
    Apr 23 '16 at 0:34
































Newer versions of dd have a count_bytes iflag, e.g.:



cat /dev/zero | dd count=1234 iflag=count_bytes | wc -c


will output something like



2+1 records in
2+1 records out
1234 bytes (1.2 kB, 1.2 KiB) copied, 0.000161684 s, 7.6 MB/s
1234
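Since count_bytes is an extension rather than part of POSIX dd, a script could probe for it and fall back to the one-byte block size otherwise. The sketch below is written under assumptions: the helper name read_n_bytes is hypothetical, the probe assumes that a dd without these flags exits with a non-zero status, and fullblock is added on the assumption that count_bytes alone would still treat a short read from a pipe as a completed read.

```shell
# read_n_bytes N: copy the first N bytes of stdin to stdout.
# (read_n_bytes is a hypothetical helper name.)
read_n_bytes() {
    # Probe: does this dd accept the non-POSIX count_bytes and fullblock flags?
    # The probe reads from /dev/null, so it does not consume the real input.
    if dd iflag=count_bytes,fullblock count=0 </dev/null >/dev/null 2>&1; then
        # Extension path: count is a byte count and short reads are retried.
        dd iflag=count_bytes,fullblock count="$1" 2>/dev/null
    else
        # Portable path: one byte per read, so count is a byte count as well.
        dd ibs=1 count="$1" 2>/dev/null
    fi
}

# Bursty input: two bytes, a pause, then four more. Five bytes are requested.
n=$({ printf 'ab'; sleep 1; printf 'cdef'; } | read_n_bytes 5 | wc -c | tr -d ' ')
echo "requested 5 bytes, got $n"
```

The fallback keeps the function usable on systems whose dd rejects iflag, at the cost of the slower byte-at-a-time reads.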





answered by jdizzle









































Part of the point of using dd at all is that the user gets to pick the block size it uses. If dd fails when the block size is too large, then in my opinion it's the user's responsibility to try smaller block sizes. I could ask for a TB from dd in one block, but that doesn't mean I'll get it.



If you want an exact number of bytes, this will be horrendously slow, but should work:



dd bs=1 count=1000000


If even a block size of 1 results in partial reads, …






    share|improve this answer






















      Your Answer








      StackExchange.ready(function()
      var channelOptions =
      tags: "".split(" "),
      id: "106"
      ;
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function()
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled)
      StackExchange.using("snippets", function()
      createEditor();
      );

      else
      createEditor();

      );

      function createEditor()
      StackExchange.prepareEditor(
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: false,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: null,
      bindNavPrevention: true,
      postfix: "",
      imageUploader:
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      ,
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      );



      );













      draft saved

      draft discarded


















      StackExchange.ready(
      function ()
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f278443%2fwhats-the-posix-way-to-read-an-exact-number-of-bytes-from-a-file%23new-answer', 'question_page');

      );

      Post as a guest















      Required, but never shown

























      3 Answers
      3






      active

      oldest

      votes








      3 Answers
      3






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      6














      Unfortunately, to manipulate the content of a binary file, dd is pretty much the only tool in POSIX. Although most modern implementations of text processing tools (cat, sed, awk, …) can manipulate binary files, this is not required by POSIX: some older implementations do choke on null bytes, input not terminated by a newline, or invalid byte sequences in the ambient character encoding.



      It is possible, but difficult, to use dd safely. The reason I spend a lot of energy steering people away from it is that there's a lot of advice out there that promotes dd in situations where it is neither useful nor safe.



      The problem with dd is its notion of blocks: it assumes that a call to read returns one block; if read returns less data, you get a partial block, which throws things like skip and count off. Here's an example that illustrates the problem, where dd is reading from a pipe that delivers data relatively slowly:



      yes hello | while read line; do echo $line; done | dd ibs=4 count=1000 | wc -c


      On a bog-standard Linux (Debian jessie, Linux kernel 3.16, dd from GNU coreutils 8.23), I get a highly variable number of bytes, ranging from about 3000 to almost 4000. Change the input block size to a divisor of 6, and the output is consistently 4000 bytes as one would naively expect — the input to dd arrives in bursts of 6 bytes, and as long as a block doesn't span multiple bursts, dd gets to read a complete block.



      This suggests a solution: use an input block size of 1. No matter how the input is produced, there's no way for dd to read a partial block if the input block size is 1. (This is not completely obvious: dd could read a block of size 0 if it's interrupted by a signal — but if it's interrupted by a signal, the read system call returns -1. A read returning 0 is only possible if the file is opened in non-blocking mode, and in that case a read had better not be considered to have been performed at all. In blocking mode, read only returns 0 at the end of the file.)



      dd ibs=1 count="$number_of_bytes"


      The problem with this approach is that it can be slow (but not shockingly slow: only about 4 times slower than head -c in my quick benchmark).



      POSIX defines other tools that read binary data and convert it to a text format: uuencode (outputs in historical uuencode format or in Base64), od (outputs an octal or hexadecimal dump). Neither is well-suited to the task at hand. uuencode can be undone by uudecode, but counting bytes in the output is awkward because the number of bytes per line of output is not standardized. It's possible to get well-defined output from od, but unfortunately there's no POSIX tool to go the other way round (it can be done but only through slow loops in sh or awk, which defeats the purpose here).






      share|improve this answer























      • Thank you for a very comprehensive answer. It seems like there is no simple, and safe way which is also portable. Maybe the answer is to write a C program if one wants to work with arbitrary bytes in units smaller than lines. I am intrigued by the possibility of a uuencode/uudecode solution. Can you please explain a little more why such a solution would not be safe or portable? (I'm defining safe to mean guaranteed not to lose data on given that everything else works perfectly.)

        – Low Powah
        Apr 23 '16 at 0:27











      • @LowPowah uuencode won't lose data, the problem is counting the input bytes. You can easily count the number of lines, but the number of bytes per line is not standardized. You can pipe into awk and do the counting there, but if you do that I think you'll lose any speed advantage. Furthermore the output of uuencode (in either format) can't easily be split according to input bytes, since it processes bytes by blocks. The output of od is easy to work with but difficult to convert back to binary afterwards.

        – Gilles
        Apr 23 '16 at 0:34















      6














      Unfortunately, to manipulate the content of a binary file, dd is pretty much the only tool in POSIX. Although most modern implementations of text processing tools (cat, sed, awk, …) can manipulate binary files, this is not required by POSIX: some older implementations do choke on null bytes, input not terminated by a newline, or invalid byte sequences in the ambient character encoding.



      It is possible, but difficult, to use dd safely. The reason I spend a lot of energy steering people away from it is that there's a lot of advice out there that promotes dd in situations where it is neither useful nor safe.



      The problem with dd is its notion of blocks: it assumes that a call to read returns one block; if read returns less data, you get a partial block, which throws things like skip and count off. Here's an example that illustrates the problem, where dd is reading from a pipe that delivers data relatively slowly:



      yes hello | while read line; do echo $line; done | dd ibs=4 count=1000 | wc -c


      On a bog-standard Linux (Debian jessie, Linux kernel 3.16, dd from GNU coreutils 8.23), I get a highly variable number of bytes, ranging from about 3000 to almost 4000. Change the input block size to a divisor of 6, and the output is consistently 4000 bytes as one would naively expect — the input to dd arrives in bursts of 6 bytes, and as long as a block doesn't span multiple bursts, dd gets to read a complete block.



      This suggests a solution: use an input block size of 1. No matter how the input is produced, there's no way for dd to read a partial block if the input block size is 1. (This is not completely obvious: dd could read a block of size 0 if it's interrupted by a signal — but if it's interrupted by a signal, the read system call returns -1. A read returning 0 is only possible if the file is opened in non-blocking mode, and in that case a read had better not be considered to have been performed at all. In blocking mode, read only returns 0 at the end of the file.)



      dd ibs=1 count="$number_of_bytes"


      The problem with this approach is that it can be slow (but not shockingly slow: only about 4 times slower than head -c in my quick benchmark).



      POSIX defines other tools that read binary data and convert it to a text format: uuencode (outputs in historical uuencode format or in Base64), od (outputs an octal or hexadecimal dump). Neither is well-suited to the task at hand. uuencode can be undone by uudecode, but counting bytes in the output is awkward because the number of bytes per line of output is not standardized. It's possible to get well-defined output from od, but unfortunately there's no POSIX tool to go the other way round (it can be done but only through slow loops in sh or awk, which defeats the purpose here).






      share|improve this answer























      • Thank you for a very comprehensive answer. It seems like there is no simple, and safe way which is also portable. Maybe the answer is to write a C program if one wants to work with arbitrary bytes in units smaller than lines. I am intrigued by the possibility of a uuencode/uudecode solution. Can you please explain a little more why such a solution would not be safe or portable? (I'm defining safe to mean guaranteed not to lose data on given that everything else works perfectly.)

        – Low Powah
        Apr 23 '16 at 0:27











      • @LowPowah uuencode won't lose data, the problem is counting the input bytes. You can easily count the number of lines, but the number of bytes per line is not standardized. You can pipe into awk and do the counting there, but if you do that I think you'll lose any speed advantage. Furthermore the output of uuencode (in either format) can't easily be split according to input bytes, since it processes bytes by blocks. The output of od is easy to work with but difficult to convert back to binary afterwards.

        – Gilles
        Apr 23 '16 at 0:34













      6












      6








      6







      Unfortunately, to manipulate the content of a binary file, dd is pretty much the only tool in POSIX. Although most modern implementations of text processing tools (cat, sed, awk, …) can manipulate binary files, this is not required by POSIX: some older implementations do choke on null bytes, input not terminated by a newline, or invalid byte sequences in the ambient character encoding.



      It is possible, but difficult, to use dd safely. The reason I spend a lot of energy steering people away from it is that there's a lot of advice out there that promotes dd in situations where it is neither useful nor safe.



      The problem with dd is its notion of blocks: it assumes that a call to read returns one block; if read returns less data, you get a partial block, which throws things like skip and count off. Here's an example that illustrates the problem, where dd is reading from a pipe that delivers data relatively slowly:



      yes hello | while read line; do echo $line; done | dd ibs=4 count=1000 | wc -c


      On a bog-standard Linux (Debian jessie, Linux kernel 3.16, dd from GNU coreutils 8.23), I get a highly variable number of bytes, ranging from about 3000 to almost 4000. Change the input block size to a divisor of 6, and the output is consistently 4000 bytes as one would naively expect — the input to dd arrives in bursts of 6 bytes, and as long as a block doesn't span multiple bursts, dd gets to read a complete block.



      This suggests a solution: use an input block size of 1. No matter how the input is produced, there's no way for dd to read a partial block if the input block size is 1. (This is not completely obvious: dd could read a block of size 0 if it's interrupted by a signal — but if it's interrupted by a signal, the read system call returns -1. A read returning 0 is only possible if the file is opened in non-blocking mode, and in that case a read had better not be considered to have been performed at all. In blocking mode, read only returns 0 at the end of the file.)



      dd ibs=1 count="$number_of_bytes"


      The problem with this approach is that it can be slow (but not shockingly slow: only about 4 times slower than head -c in my quick benchmark).



      POSIX defines other tools that read binary data and convert it to a text format: uuencode (outputs in historical uuencode format or in Base64), od (outputs an octal or hexadecimal dump). Neither is well-suited to the task at hand. uuencode can be undone by uudecode, but counting bytes in the output is awkward because the number of bytes per line of output is not standardized. It's possible to get well-defined output from od, but unfortunately there's no POSIX tool to go the other way round (it can be done but only through slow loops in sh or awk, which defeats the purpose here).






      share|improve this answer













      Unfortunately, to manipulate the content of a binary file, dd is pretty much the only tool in POSIX. Although most modern implementations of text processing tools (cat, sed, awk, …) can manipulate binary files, this is not required by POSIX: some older implementations do choke on null bytes, input not terminated by a newline, or invalid byte sequences in the ambient character encoding.



      It is possible, but difficult, to use dd safely. The reason I spend a lot of energy steering people away from it is that there's a lot of advice out there that promotes dd in situations where it is neither useful nor safe.



      The problem with dd is its notion of blocks: it assumes that a call to read returns one block; if read returns less data, you get a partial block, which throws things like skip and count off. Here's an example that illustrates the problem, where dd is reading from a pipe that delivers data relatively slowly:



      yes hello | while read line; do echo $line; done | dd ibs=4 count=1000 | wc -c


      On a bog-standard Linux (Debian jessie, Linux kernel 3.16, dd from GNU coreutils 8.23), I get a highly variable number of bytes, ranging from about 3000 to almost 4000. Change the input block size to a divisor of 6, and the output is consistently 4000 bytes as one would naively expect — the input to dd arrives in bursts of 6 bytes, and as long as a block doesn't span multiple bursts, dd gets to read a complete block.



      This suggests a solution: use an input block size of 1. No matter how the input is produced, there's no way for dd to read a partial block if the input block size is 1. (This is not completely obvious: dd could read a block of size 0 if it's interrupted by a signal — but if it's interrupted by a signal, the read system call returns -1. A read returning 0 is only possible if the file is opened in non-blocking mode, and in that case a read had better not be considered to have been performed at all. In blocking mode, read only returns 0 at the end of the file.)



      dd ibs=1 count="$number_of_bytes"


      The problem with this approach is that it can be slow (but not shockingly slow: only about 4 times slower than head -c in my quick benchmark).



      POSIX defines other tools that read binary data and convert it to a text format: uuencode (outputs in historical uuencode format or in Base64), od (outputs an octal or hexadecimal dump). Neither is well-suited to the task at hand. uuencode can be undone by uudecode, but counting bytes in the output is awkward because the number of bytes per line of output is not standardized. It's possible to get well-defined output from od, but unfortunately there's no POSIX tool to go the other way round (it can be done but only through slow loops in sh or awk, which defeats the purpose here).







      answered Apr 22 '16 at 23:36









      Gilles













      • Thank you for a very comprehensive answer. It seems like there is no simple and safe way which is also portable. Maybe the answer is to write a C program if one wants to work with arbitrary bytes in units smaller than lines. I am intrigued by the possibility of a uuencode/uudecode solution. Can you please explain a little more why such a solution would not be safe or portable? (I'm defining safe to mean guaranteed not to lose data, given that everything else works perfectly.)

        – Low Powah
        Apr 23 '16 at 0:27











      • @LowPowah uuencode won't lose data, the problem is counting the input bytes. You can easily count the number of lines, but the number of bytes per line is not standardized. You can pipe into awk and do the counting there, but if you do that I think you'll lose any speed advantage. Furthermore the output of uuencode (in either format) can't easily be split according to input bytes, since it processes bytes by blocks. The output of od is easy to work with but difficult to convert back to binary afterwards.

        – Gilles
        Apr 23 '16 at 0:34












































      Newer versions of GNU dd have a count_bytes iflag, e.g.:



      cat /dev/zero | dd count=1234 iflag=count_bytes | wc -c


      will output something like



      2+1 records in
      2+1 records out
      1234 bytes (1.2 kB, 1.2 KiB) copied, 0.000161684 s, 7.6 MB/s
      1234
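Both count_bytes and fullblock are GNU extensions rather than POSIX. As far as I understand it, count_bytes only changes the unit of count=, while iflag=fullblock is what the GNU manual describes for making dd keep reading after a short read from a pipe, so combining the two seems prudent. The sketch below uses them when dd accepts them and otherwise falls back to ibs=1; read_exact is a made-up name.

```shell
# Hypothetical helper: copy exactly $1 bytes from stdin to stdout.
read_exact() {
    # Probe with count=0 on /dev/null so the real input is not touched.
    if dd iflag=count_bytes,fullblock count=0 </dev/null >/dev/null 2>&1; then
        # GNU extension: count= is in bytes, and short reads are retried.
        dd iflag=count_bytes,fullblock count="$1" 2>/dev/null
    else
        # Portable fallback: one-byte input blocks cannot be partial.
        dd ibs=1 count="$1" 2>/dev/null
    fi
}

# A pipe that delivers its data in two bursts, one second apart.
{ printf 'abc'; sleep 1; printf 'def'; } | read_exact 5
echo
```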





          answered Mar 22 at 18:43









          jdizzle




































              Part of the point of using dd at all is that the user gets to pick the block size it uses. If dd fails for too large block sizes, IMO it's the user's responsibility to try smaller block sizes. I could ask for a TB from dd in one block, but that doesn't mean I'll get it.



              If you want an exact number of bytes, this will be horrendously slow, but should work:



              dd bs=1 count=1000000


              If even a block size of 1 results in partial reads, …






                  answered Apr 22 '16 at 23:29









                  muru








                      Tags: binary, dd, head, posix
