


Find the pair of words that appear together most often















I have 10 text files, each containing one chapter of a book. I want to find the pair of words that appear together most often on a line, for example:



chapter1:



hello world good boy green sun

good green boy sun world hello


chapter2:



chapter3:



.....etc



Desired output for chapter1:

hello world (in alphabetical order)






linux text-processing awk sed grep














edited 10 hours ago by mosvy















asked 12 hours ago by John B











  • The output should include "boy green" as well, right?
    – Guru
    11 hours ago

  • Yes, "boy green" as well; I didn't notice that.
    – John B
    11 hours ago

  • What if a line looks like "Hello world Hello"? Should it appear in the output, and how: as "Hello world" or "world Hello"?
    – αғsнιη
    11 hours ago

  • As "hello world", and it counts 2 towards the pair "hello world". Pairs are always in alphabetical order.
    – John B
    11 hours ago

  • What about Hello\nworld\nHello, where \n is an actual newline character? Please edit your question to answer the comments asking for clarification.
    – αғsнιη
    10 hours ago
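
The counting rule settled in the comments above can be illustrated with a small awk sketch. It assumes that a "pair" means two adjacent words on the same line (the example data suggests this, but the asker does not state it outright), and the function name `pair_counts` is made up for this illustration.

```shell
# pair_counts: hypothetical helper, not taken from the question or answers.
# Prints "<count> <pair>" for every pair of adjacent words on a line.
pair_counts() {
  awk '
    {
      $0 = tolower($0)                  # ignore case; also re-splits the fields
      for (i = 1; i < NF; i++) {
        a = $i; b = $(i + 1)
        # appending "" forces a string comparison; swap so that a sorts first
        if (a "" > b "") { t = a; a = b; b = t }
        count[a " " b]++
      }
    }
    END { for (pair in count) print count[pair], pair }
  ' "$@"
}

# "Hello world" and "world Hello" both count towards "hello world"
printf 'Hello world Hello\n' | pair_counts   # prints: 2 hello world
```

When there is more than one distinct pair, the order of the output lines is unspecified, because awk does not define the iteration order of `for (pair in count)`.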
























2 Answers















Try this:

  1. Use awk to print each pair of adjacent words.

  2. Use perl to sort the words within each pair.

  3. Use sort and uniq -c to count the occurrences of each pair.

    awk '{ for (i=1; i<NF; i++) print tolower($i)" "tolower($(i+1)) }' file \
      | perl -ane '$,=" "; print sort @F; print "\n";' \
      | sort | uniq -c | sort -b -k1nr -k2

Output:

          2 boy green
          2 hello world
          1 boy good
          1 boy sun
          1 good green
          1 good world
          1 green sun
          1 sun world














answered 11 hours ago by RoVo (edited 10 hours ago)












  • I cannot use perl or a pipeline.
    – John B
    10 hours ago

  • Why can't you, @John? Those are standard utilities on most Linux systems.
    – Jeff Schaller
    10 hours ago

  • Yes, this kind of information should be in your question.
    – RoVo
    10 hours ago






























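    awk '
      {
        $0 = tolower($0)                # compare case-insensitively
        for (i = 1; i < NF; i++) {
          # $i"" forces a string comparison, so each pair is stored alphabetically
          pair = $i"" < $(i+1) ? $i" "$(i+1) : $(i+1)" "$i
          c = ++count[pair]
          if (c > max) max = c
        }
      }
      END {
        # print every pair that reaches the highest count
        for (pair in count)
          if (count[pair] == max)
            print pair
      }
    '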
















answered 10 hours ago by Stéphane Chazelas
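
Since the question asks for a result per chapter, the awk program above could be extended to report each input file separately in a single awk invocation, with no pipeline. The following is only a sketch under assumptions: the function name `top_pairs_per_file`, the `file: pair` output format, and the file names `chapter1`, `chapter2` are all made up for illustration.

```shell
# top_pairs_per_file: hypothetical wrapper; prints "<file>: <pair>" for the
# most frequent adjacent word pair(s) of each file given as an argument.
top_pairs_per_file() {
  awk '
    # report: print the top pair(s) of the file just read, then reset state
    function report(   pair) {
      for (pair in count)
        if (count[pair] == max) print fname ": " pair
      split("", count)                  # portable way to empty an array
      max = 0
    }
    FNR == 1 && NR > 1 { report() }     # a new file starts: flush the previous one
    FNR == 1 { fname = FILENAME }
    {
      $0 = tolower($0)
      for (i = 1; i < NF; i++) {
        # "" forces string comparison; the smaller word goes first
        pair = ($i "" < $(i+1) "") ? $i " " $(i+1) : $(i+1) " " $i
        c = ++count[pair]
        if (c > max) max = c
      }
    }
    END { if (NR > 0) report() }        # flush the last file
  ' "$@"
}

# Usage (assumed file names): top_pairs_per_file chapter1 chapter2
```

Tied pairs within one file come out in no particular order, and empty files produce no output.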



















