Find the pair of words that appear together most often
I have 10 text files, each containing one chapter of a book. For each file, I want to find the pair of words that appear together on a line most often. For example:
chapter1:
hello world good boy green sun
good green boy sun world hello
chapter2:
chapter3:
.....etc
Desired output for chapter1:
hello world (the two words in alphabetical order)
linux text-processing awk sed grep
asked 9 hours ago by John B
edited 7 hours ago by mosvy
The output should have "boy green" also, right?
– Guru
9 hours ago
Yes, "boy green" as well; I missed that.
– John B
9 hours ago
What if a line looks like "Hello world Hello"? Should it appear in the output, and how: as "Hello world" or as "world Hello"?
– αғsнιη
9 hours ago
As "hello world", and that line counts 2 toward the pair "hello world". Pairs are always in alphabetical order.
– John B
9 hours ago
How about "Hello\nworld\nHello", where \n is an actual newline character? Please edit your question to address the comments asking for clarification.
– αғsнιη
8 hours ago
2 Answers
Try this:
- Use awk to print each pair of adjacent words.
- Use perl to sort the two words within each pair.
- Use sort and uniq -c to count the occurrences of each pair.
awk '{ for (i=1; i<NF; i++) print tolower($i)" "tolower($(i+1)) }' file \
| perl -ane '$,=" "; print sort @F; print "\n";' \
| sort | uniq -c | sort -b -k1nr -k2
Output:
2 boy green
2 hello world
1 boy good
1 boy sun
1 good green
1 good world
1 green sun
1 sun world
answered 8 hours ago by RoVo (edited 8 hours ago)
I cannot use perl or a pipeline.
– John B
8 hours ago
Why can't you, @John? Those are standard utilities on most Linux systems.
– Jeff Schaller♦
8 hours ago
Yes, this kind of information should be in your question.
– RoVo
8 hours ago
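As a side note on the perl restriction raised in the comments: the perl step only puts the two words of each pair in order, so awk can do that in the same pass. The following is a minimal sketch, not part of the original answer. It assumes POSIX awk, sort and uniq; the function name count_pairs is hypothetical. It still uses a pipeline for the counting.

```shell
# count_pairs FILE...: list every adjacent word pair with its count,
# most frequent first. (count_pairs is a hypothetical helper name.)
count_pairs() {
  awk '{
    for (i = 1; i < NF; i++) {
      a = tolower($i)
      b = tolower($(i+1))
      # tolower returns strings, so this is a string comparison;
      # swap so that the alphabetically smaller word comes first
      if (a > b) { t = a; a = b; b = t }
      print a, b
    }
  }' "$@" | sort | uniq -c | sort -b -k1,1nr -k2
}
```

Calling count_pairs chapter1 on the sample from the question should list "boy green" and "hello world" first, each with a count of 2.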
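awk '
  {
    # lowercase the line; assigning $0 re-splits the fields
    $0 = tolower($0)
    for (i = 1; i < NF; i++) {
      # build the pair with its two words in alphabetical order
      pair = $i"" < $(i+1) ? $i" "$(i+1) : $(i+1)" "$i
      c = ++count[pair]
      if (c > max) max = c
    }
  }
  END {
    # print every pair that reached the highest count
    for (pair in count)
      if (count[pair] == max)
        print pair
  }
'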
answered 7 hours ago by Stéphane Chazelas
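The question asks for a result per chapter, while the awk program above reads standard input and would pool everything it is given. One way to get a separate result for each file is to run it once per file. This is a sketch under assumptions: the function names top_pairs and report_chapters are hypothetical, and the file names chapter1, chapter2, ... are taken from the question's example.

```shell
# top_pairs FILE: print the most frequent adjacent word pair(s) in FILE,
# each with its two words in alphabetical order. (hypothetical name)
top_pairs() {
  awk '
    {
      $0 = tolower($0)
      for (i = 1; i < NF; i++) {
        pair = $i"" < $(i+1) ? $i" "$(i+1) : $(i+1)" "$i
        c = ++count[pair]
        if (c > max) max = c
      }
    }
    END {
      for (pair in count)
        if (count[pair] == max)
          print pair
    }
  ' "$1"
}

# report_chapters FILE...: print a "name:" header and the top pair(s)
# for each file. (hypothetical name)
report_chapters() {
  for f in "$@"; do
    [ -f "$f" ] || continue      # skip an unmatched glob or a missing file
    printf '%s:\n' "$f"
    top_pairs "$f" | sort        # awk array order is unspecified, so sort ties
  done
}
```

Usage would be report_chapters chapter* in the directory holding the files. For the chapter1 sample in the question this prints "chapter1:" followed by "boy green" and "hello world", since both pairs occur twice.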