Find the pair of words that appear together most often
















I have 10 text files, each containing one chapter of a book. For each file, I want to find the pair of words that appear together on a line most often. For example:

chapter1:

    hello world good boy green sun
    good green boy sun world hello

chapter2:

chapter3:

...etc.

Desired output for chapter1:

    hello world

(the two words of the pair in alphabetical order)
























  • The output should have "boy green" also, right?
    – Guru, 9 hours ago

  • Yes, "boy green" also; I didn't see it.
    – John B, 9 hours ago

  • What if you had a line like Hello world Hello? Should it appear in the output, and how: Hello world or world Hello?
    – αғsнιη, 9 hours ago

  • hello world, and that line counts 2 for the pair "hello world". The pair is always in alphabetical order.
    – John B, 9 hours ago

  • How about Hello\nworld\nHello, where \n is an actual newline character? Please edit your question to address the comments asking for clarification.
    – αғsнιη, 8 hours ago
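The clarification in the comments (a line like Hello world Hello counts twice for the pair "hello world") can be checked with a short awk sketch. It assumes that "together" means adjacent words on the same line and that case is ignored; the function name count_adjacent_pairs is made up for illustration.

```shell
# count_adjacent_pairs is a hypothetical helper name, not something from the thread.
count_adjacent_pairs() {
  awk '
    {
      for (i = 1; i < NF; i++) {
        # tolower() returns strings, so < compares them as text
        a = tolower($i); b = tolower($(i + 1))
        pair = (a < b) ? a " " b : b " " a   # alphabetical order inside the pair
        count[pair]++
      }
    }
    END { for (p in count) print count[p], p }
  ' "$@"
}

printf 'Hello world Hello\n' | count_adjacent_pairs   # prints: 2 hello world
```

Both "Hello world" and "world Hello" are normalized to the same key, which is why the single line contributes a count of 2.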
























Tags: linux, text-processing, awk, sed, grep














Asked 9 hours ago by John B (new contributor); edited 7 hours ago by mosvy.















2 Answers














Try this:

  1. Use awk to print each pair of adjacent words, lowercased.

  2. Use perl to sort the two words within each pair.

  3. Use sort and uniq -c to count the occurrences of each pair.

    awk '{ for (i=1;i<NF;i++) { print tolower($i)" "tolower($(i+1)) } }' file \
    | perl -ane '$,=" "; print sort @F; print "\n";' \
    | sort | uniq -c | sort -b -k1nr -k2

Output:

    2 boy green
    2 hello world
    1 boy good
    1 boy sun
    1 good green
    1 good world
    1 green sun
    1 sun world
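If perl is not available, the pair-sorting step could plausibly be folded into the awk stage. The following is only a sketch under that assumption, not part of the answer as posted; the function name pair_counts is hypothetical, and it is still a pipeline (sort and uniq do the counting).

```shell
# pair_counts is a hypothetical helper name; files are passed as arguments,
# or the input is read from stdin when no file is given.
pair_counts() {
  awk '
    {
      for (i = 1; i < NF; i++) {
        a = tolower($i); b = tolower($(i + 1))
        # swap so that the two words come out in alphabetical order
        if (a > b) { t = a; a = b; b = t }
        print a, b
      }
    }
  ' "$@" | sort | uniq -c | sort -b -k1nr -k2
}
```

Called as pair_counts file, this should give the same counts as the output listed above, although the exact padding of the count column depends on the uniq implementation.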






























  • I cannot use perl or a pipeline.
    – John B, 8 hours ago

  • Why can't you, @John? Those are standard utilities on most Linux systems.
    – Jeff Schaller, 8 hours ago

  • Yes, this kind of information should be in your question.
    – RoVo, 8 hours ago
































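    awk '
      {
        # Lowercase the line; assigning to $0 re-splits the fields
        $0 = tolower($0)
        for (i = 1; i < NF; i++) {
          # Appending "" forces a string comparison even for numeric-looking words
          pair = $i"" < $(i+1) ? $i" "$(i+1) : $(i+1)" "$i
          c = ++count[pair]
          if (c > max) max = c
        }
      }
      END {
        # Print every pair that reached the highest count
        for (pair in count)
          if (count[pair] == max)
            print pair
      }
    '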
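The question mentions one file per chapter, so a per-file variant may be useful. The following is a rough sketch of how the same counting idea could be reset at each file boundary using only awk; the function name top_pairs_per_file and the "file: pair" output format are assumptions made for illustration, and empty files are silently skipped.

```shell
# top_pairs_per_file is a hypothetical helper name; pass the chapter files as arguments.
top_pairs_per_file() {
  awk '
    function report(name,    p) {
      for (p in count)
        if (count[p] == max) print name ": " p
      split("", count)   # portable way to empty the array
      max = 0
    }
    FNR == 1 && NR > 1 { report(prev) }   # a new file starts: flush the previous one
    {
      prev = FILENAME
      for (i = 1; i < NF; i++) {
        a = tolower($i); b = tolower($(i + 1))
        pair = (a < b) ? a " " b : b " " a
        if (++count[pair] > max) max = count[pair]
      }
    }
    END { if (NR > 0) report(prev) }
  ' "$@"
}
```

It could be invoked as top_pairs_per_file chapter1 chapter2 chapter3, assuming the files are named as in the question. Pairs that tie for the highest count are all printed, in no particular order, so piping the result through sort may be desirable.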













The awk, perl, sort and uniq pipeline answer above was posted by RoVo: answered 8 hours ago, edited 8 hours ago.






















The awk-only answer above was posted by Stéphane Chazelas: answered 7 hours ago.



















