sed anchor characters in [^]
Why does sed, if we use the negation expression [^ ], treat anchor characters like \b or \B as real characters? E.g. one would expect the following expressions to yield the same result, but they don't:
$ echo 'apple pear melon banana cherry papaya' | sed 's/[^\b]a[^\b]/u/g'
apple pu melon baua cherry uaya
$ echo 'apple pear melon banana cherry papaya' | sed 's/\Ba\B/u/g'
apple peur melon bununa cherry pupuya
If there were no \B, how could we negate \b?
sed
asked Mar 28 at 12:08
Amaterasu
1 Answer
Neither \b nor \B is a character. Both are zero-width patterns that match between characters.
The \b pattern matches at a word boundary, i.e. between a character that is a "word character" and a character that is not a "word character" (in GNU sed, the start and end of the line count as non-word positions).
The \B pattern matches at a non-word boundary, i.e. between two characters that are either both "word characters" or both not.
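To see where the two zero-width patterns match, you can substitute a visible marker at every match position. This is a small sketch assuming GNU sed (\b and \B are GNU extensions); the X marker and the sample string are arbitrary choices for illustration.

```shell
# Insert an X at every position where \b (word boundary) matches.
echo 'abc %-= def.' | sed 's/\b/X/g'
# -> XabcX %-= XdefX.

# Insert an X at every position where \B (non-word boundary) matches.
echo 'abc %-= def.' | sed 's/\B/X/g'
# -> aXbXc X%X-X=X dXeXf.X
```

Note that \B also matches between two non-word characters (for example between the space and the %), not only inside words.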
The pattern [^\b] matches one character. This is why pear is transformed into pu: you replace ear (the a and the surrounding characters).
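Wrapping each match in <...> (using &, the whole matched text, in the replacement) makes the difference in what each expression consumes visible. This is a sketch assuming GNU sed, run on the input from the question:

```shell
# [^\b]a[^\b] consumes three characters per match: the a plus one on each side.
echo 'apple pear melon banana cherry papaya' | sed 's/[^\b]a[^\b]/<&>/g'
# -> apple p<ear> melon ba<nan>a cherry <pap>aya

# \Ba\B consumes only the a itself; the anchors are zero-width.
echo 'apple pear melon banana cherry papaya' | sed 's/\Ba\B/<&>/g'
# -> apple pe<a>r melon b<a>n<a>na cherry p<a>p<a>ya
```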
With GNU sed, [^\b] matches a character that is neither a \ nor a b, since a backslash is not special inside a bracket expression.
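A minimal check of this bracket-expression behaviour, assuming GNU sed: the sample input deliberately contains a literal backslash, and printf '%s\n' is used so that the shell passes it through uninterpreted.

```shell
# Replace every character that [^\b] matches with an underscore.
# Only the backslash and the b characters should survive.
printf '%s\n' 'a\b b' | sed 's/[^\b]/_/g'
# -> _\b_b
```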
There is no way to use a character class to replace the use of \B that I'm aware of.
The \b and \B patterns are supported by GNU sed. Both GNU sed and BSD sed also have \< and \> for explicitly matching at the start and end of a word, and BSD sed additionally supports [[:<:]] and [[:>:]] (but GNU sed does not). Despite their bracket syntax, these are not character classes and are not specified by POSIX, and they can't be negated ([^[:>:]] does not work).
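For example, the word-start and word-end anchors can be used to bracket each word. This sketch assumes GNU sed; on BSD sed, [[:<:]] and [[:>:]] would presumably take the place of \< and \>, but that variant is not shown or tested here.

```shell
# First put [ at every word start (\<), then ] at every word end (\>).
echo 'apple pear' | sed 's/\</[/g; s/\>/]/g'
# -> [apple] [pear]
```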
To get a similar effect without using \B, you could use something like
$ echo 'apple pear melon banana cherry papaya' | sed 's/\([[:alnum:]]\)a\([[:alnum:]]\)/\1u\2/g'
apple peur melon bunana cherry pupaya
That is, match an alphanumeric character on either side of the a, and then include these two flanking characters in the replacement. Note that since the replacement only happens for non-overlapping matches, this would not properly substitute the a characters in a string containing multiple consecutive a's (or a's in every second position): a character consumed as the right-hand neighbour of one match cannot also serve as the left-hand neighbour of the next. See how banana does not come out as bununa due to this.
To sort that out, you could introduce a loop in the sed program:
sed -e :top -e 's/\([[:alnum:]]\)a\([[:alnum:]]\)/\1u\2/g' -e ttop
The t command branches back to the :top label whenever the substitution succeeded, so the replacement is repeated over the input line as many times as needed until all overlapping pattern matches have been handled.
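As a sanity check, the looping version can be compared with the \Ba\B command from the question. This sketch assumes GNU sed, since the comparison command relies on the GNU-only \B; the input variable name is just for illustration.

```shell
input='apple pear melon banana cherry papaya'

# Loop version: t branches back to :top if the s/// command made a
# substitution since the last t, so it repeats until nothing changes.
printf '%s\n' "$input" |
  sed -e :top -e 's/\([[:alnum:]]\)a\([[:alnum:]]\)/\1u\2/g' -e ttop
# -> apple peur melon bununa cherry pupuya

# GNU-only \B version from the question, for comparison.
printf '%s\n' "$input" | sed 's/\Ba\B/u/g'
# -> apple peur melon bununa cherry pupuya
```

The two results agree on this input, but the patterns are not equivalent in general: GNU sed's word characters include the underscore, which [[:alnum:]] does not, so an input such as x_a_y is changed by \Ba\B but left alone by the loop.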
edited Mar 28 at 13:14
answered Mar 28 at 12:21
Kusalananda♦
Note that POSIX doesn't specify [[:<:]]. While it's shaped like a POSIX character class, it's not a character class at all.
– Stéphane Chazelas
Mar 28 at 13:01
For ast-open's sed, \b means a backspace character (like in many other utilities including echo, printf and awk and $'...').
– Stéphane Chazelas
Mar 28 at 13:02
The section about "The additional word delimiters \< and \> are provided to ease compatibility with traditional SVR4 systems but are not portable and should be avoided" in the re_format man page of OpenBSD makes little sense. \< is more portable than [[:<:]] and comes from vi (so from BSD) long before SVR4. AFAIK, \b comes from perl.
– Stéphane Chazelas
Mar 28 at 13:18
@StéphaneChazelas Thanks. I might submit a bug report/patch to the OpenBSD lists when I find the time. It's been on my mind.
– Kusalananda♦
Mar 28 at 13:20
[^\b] matches a collating element, so it could match more than one character. printf '%s\n' 'abdz' | LC_ALL=hu_HU.UTF-8 sed 's/[^\b]/<&>/g' outputs <a>b<dz> with GNU sed, for instance.
– Stéphane Chazelas
Mar 28 at 13:37