Count lines containing word

I have a file with multiple lines. I want to know, for each word that appears anywhere in the file, how many lines contain that word. For example:



0 hello world the man is world
1 this is the world
2 a different man is the possible one


The result I'm expecting is:



0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2


Note that the count for "world" is 2, not 3, since the word appears on 2 lines. Because of this, translating blanks to newline chars wouldn't be the exact solution.
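To see why, here is the naive blank-to-newline pipeline (a sketch, assuming the sample above is saved as file): it counts occurrences rather than lines, so it reports 3 for "world":

$ tr ' ' '\n' < file | sort | uniq -c | grep world
      3 world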










text-processing


asked Jan 4 at 15:16 by Netzsooc, edited Jan 4 at 18:33 by Jeff Schaller


  • What have you tried so far?

    – Romeo Ninov
    Jan 4 at 15:28


  • This seems highly relevant: unix.stackexchange.com/a/332890/224077

    – Panki
    Jan 4 at 15:41
8 Answers
































Another Perl variant, using List::Util



$ perl -MList::Util=uniq -alne '
map { $h{$_}++ } uniq @F }{ for $k (sort keys %h) {print "$k: $h{$k}"}
' file
0: 1
1: 1
2: 1
a: 1
different: 1
hello: 1
is: 3
man: 2
one: 1
possible: 1
the: 3
this: 1
world: 2
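For readers puzzled by the }{ trick: under -n the code runs once per input line, and }{ closes that implicit loop and opens a final block. An assumed-equivalent spelling of the same one-liner with an explicit END block:

$ perl -MList::Util=uniq -alne '
    $h{$_}++ for uniq @F;                        # count each distinct word once per line
    END { print "$_: $h{$_}" for sort keys %h }  # report after the last line
  ' file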





answered Jan 4 at 16:11 by steeldriver














Straightforward-ish in bash:


declare -A wordcount
while read -ra words; do
    # unique words on this line
    declare -A uniq
    for word in "${words[@]}"; do
        uniq[$word]=1
    done
    # accumulate the words
    for word in "${!uniq[@]}"; do
        ((wordcount[$word]++))
    done
    unset uniq
done < file


Looking at the data:


$ declare -p wordcount
declare -A wordcount='([possible]="1" [one]="1" [different]="1" [this]="1" [a]="1" [hello]="1" [world]="2" [man]="2" [0]="1" [1]="1" [2]="1" [is]="3" [the]="3" )'


and formatting as you want:


$ printf "%s\n" "${!wordcount[@]}" | sort | while read key; do echo "$key:${wordcount[$key]}"; done
0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2





answered Jan 4 at 16:42 by glenn jackman














It's a pretty straightforward perl script:


#!/usr/bin/perl -w
use strict;

my %words = ();
while (<>) {
    chomp;
    my %linewords = ();
    map { $linewords{$_}=1 } split / /;
    foreach my $word (keys %linewords) {
        $words{$word}++;
    }
}

foreach my $word (sort keys %words) {
    print "$word:$words{$word}\n";
}


The basic idea is to loop over the input; for each line, split it into words, then save those words into a hash (associative array) in order to remove any duplicates, then loop over those de-duplicated words and add one to an overall counter for each. At the end, report on the words and their counts.
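As a quick check, saving the script under a hypothetical name such as count.pl and running it over the sample input should reproduce the list from the question:

$ perl count.pl file
0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2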






answered Jan 4 at 15:59 by Jeff Schaller


  • A slight problem with this, in my opinion, is that it does not respect the usual definition of a word, since it splits on a single space character. If two spaces occur somewhere, the empty string between them is counted as a word as well, if I'm not mistaken. Let alone if words are separated by other punctuation characters. Of course, the question does not specify whether "word" means the programmer's concept of a "word" or a word of a natural language.

    – Larry
    Jan 4 at 16:38














A solution that calls several programs from a shell:


fmt -1 words.txt | sort -u | xargs -Ipattern sh -c 'echo "pattern:$(grep -cw pattern words.txt)"'


A little explanation:


fmt -1 words.txt prints all the words, one per line, and sort -u sorts this output and keeps only the unique words.



In order to count the lines that contain a word, one can use grep (a tool meant to search files for patterns). By passing the -cw options, grep prints the number of lines on which it finds a whole-word match, not the total number of matches, which is exactly what the question asks for. So you can find the number of lines containing pattern using grep -cw pattern words.txt, as in the example below.
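For example, against the sample input (assuming it is saved as words.txt), "world" appears on two lines:

$ grep -cw world words.txt
2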



The tool xargs allows us to do this for each and every word output by sort. The -Ipattern option means that it will execute the following command multiple times, replacing each occurrence of pattern with a word it reads from standard input, which is what it gets from sort.



The indirection with sh is needed because xargs only knows how to execute a single program, given its name, passing everything else as arguments to it; xargs does not handle things like command substitution. The $(...) in the above snippet is command substitution: it substitutes the output from grep into echo, allowing it to be formatted correctly. Since we need the command substitution, we must use sh -c, which runs whatever it receives as an argument in its own shell.
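Putting it together, for each word xargs effectively runs a command like the following one, shown here for "world":

$ sh -c 'echo "world:$(grep -cw world words.txt)"'
world:2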






answered Jan 4 at 17:33 by Larry, edited Jan 4 at 21:13 by vikarjramun
      • An optimisation to this approach: fmt -1 words.txt | sort | uniq -c | awk '{ print $2 ":" $1 }'

        – matja
        Jan 5 at 0:14













      • @matja is sort | uniq -c more efficient than sort -u?

        – vikarjramun
        Jan 5 at 3:31











      • @vikarjramun no, but uniq -c gives you the counts of each word in one pass, so you don't have to use xargs to make a separate pass over the input file for each word.

        – matja
        Jan 5 at 10:11


      • @matja: I actually wrote the answer you suggest before the current one, but it does not do what the OP asked for. I misread the question at first as well, and was corrected by glenn jackman. What you suggest counts every occurrence of each word; the OP asked to count the number of lines each word occurs in at least once.

        – Larry
        Jan 5 at 10:17
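Larry's point is easy to see on the sample input: the occurrence-counting pipeline reports 3 for "world" rather than the 2 lines the question asks for (again assuming the sample is saved as words.txt):

$ fmt -1 words.txt | sort | uniq -c | awk '{ print $2 ":" $1 }' | grep world
world:3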



































Another simple alternative would be to use Python (3.6+, for the f-strings). This solution has the same problem as the one mentioned by @Larry in his comment.



from collections import Counter

with open("words.txt") as f:
    c = Counter(word for line in [line.strip().split() for line in f] for word in set(line))
    for word, occurrence in sorted(c.items()):
        print(f'{word}:{occurrence}')
        # for Python 2.7.x compatibility you can replace the above line with
        # the following one:
        # print('{}:{}'.format(word, occurrence))


A more explicit version of the above:



from collections import Counter


FILENAME = "words.txt"


def find_unique_words():
    with open(FILENAME) as f:
        lines = [line.strip().split() for line in f]

    unique_words = Counter(word for line in lines for word in set(line))
    return sorted(unique_words.items())


def print_unique_words():
    unique_words = find_unique_words()
    for word, occurrence in unique_words:
        print(f'{word}:{occurrence}')


def main():
    print_unique_words()


if __name__ == '__main__':
    main()


Output:


0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2


The above also assumes that words.txt is in the same directory as script.py. Note that this is not much different from the other solutions provided here, but perhaps somebody will find it useful.






answered Jan 4 at 20:57 by яүυк, edited Jan 5 at 12:37 by David Foerster














Trying to do it with awk:


count.awk:


#!/usr/bin/awk -f
# count lines containing each word

{
    for (i = 1; i <= NF; i++) {
        word_in_a_line[$i]++
        if (word_in_a_line[$i] == 1) {
            word_line_count[$i]++
        }
    }

    delete word_in_a_line
}

END {
    for (word in word_line_count) {
        printf "%s:%d\n", word, word_line_count[word]
    }
}


Run it with:


$ awk -f count.awk ./test.data | sort



















A pure bash answer


echo "0 hello world the man is world
1 this is the world
2 a different man is the possible one" | while IFS=$'\n' read -r line; do echo $line | tr ' ' '\n' | sort -u; done | sort | uniq -c


      1 0
      1 1
      1 2
      1 a
      1 different
      1 hello
      3 is
      2 man
      1 one
      1 possible
      3 the
      1 this
      2 world


I loop over the lines, reduce each line to its unique words, and pass them all to uniq -c; a variant that matches the requested output format exactly is sketched below.
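As a sketch, the same idea can emit the exact word:count format from the question by appending one awk stage (the rewrite matja suggested in a comment above), reading the sample from a file instead of echo:

$ while IFS=$'\n' read -r line; do echo $line | tr ' ' '\n' | sort -u; done < file |
    sort | uniq -c | awk '{ print $2 ":" $1 }'
0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2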



Edit: I did not see glenn's answer at first. I found it strange not to see a bash answer.




















Simple, though it doesn't mind reading the file many times:


sed 's/ /\n/g' file.txt | sort | uniq | while read -r word; do
    printf "%s:%d\n" "$word" "$(grep -Fw "$word" file.txt | wc -l)"
done


EDIT: Despite converting spaces to newlines, this does count the lines that have an occurrence of each word and not the occurrences of the words themselves. It gives the result:



0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2


which is character-by-character identical to the OP's example result.






  • Read the question again. It literally says translating blanks to newline chars wouldn't be the exact solution.

    – Sparhawk
    Jan 5 at 9:59











  • @Sparhawk Read the answer again. This does give the answer he gave as an example, including the result of 2 instead of 3 for world. He meant that doing something like sed 's/ /\n/g' | sort | uniq -c would not work, because it would give 3 for world, but that's not what this answer does. It correctly counts the lines where the words occur and not the occurrences themselves, just like the OP wanted.

    – JoL
    Jan 6 at 7:03













  • Ah right, apologies! I would recommend putting in an explanation of your code, which both helps the questioner and clarifies what it does. Also, as a minor point, you probably want read -r here.

    – Sparhawk
    Jan 6 at 9:38











            Your Answer








            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "106"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: false,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: null,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f492501%2fcount-lines-containing-word%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            8 Answers
            8






            active

            oldest

            votes








            8 Answers
            8






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            5














            Another Perl variant, using List::Util



            $ perl -MList::Util=uniq -alne '
            map { $h{$_}++ } uniq @F }{ for $k (sort keys %h) {print "$k: $h{$k}"}
            ' file
            0: 1
            1: 1
            2: 1
            a: 1
            different: 1
            hello: 1
            is: 3
            man: 2
            one: 1
            possible: 1
            the: 3
            this: 1
            world: 2





            share|improve this answer




























              5














              Another Perl variant, using List::Util



              $ perl -MList::Util=uniq -alne '
              map { $h{$_}++ } uniq @F }{ for $k (sort keys %h) {print "$k: $h{$k}"}
              ' file
              0: 1
              1: 1
              2: 1
              a: 1
              different: 1
              hello: 1
              is: 3
              man: 2
              one: 1
              possible: 1
              the: 3
              this: 1
              world: 2





              share|improve this answer


























                5












                5








                5







                Another Perl variant, using List::Util



                $ perl -MList::Util=uniq -alne '
                map { $h{$_}++ } uniq @F }{ for $k (sort keys %h) {print "$k: $h{$k}"}
                ' file
                0: 1
                1: 1
                2: 1
                a: 1
                different: 1
                hello: 1
                is: 3
                man: 2
                one: 1
                possible: 1
                the: 3
                this: 1
                world: 2





                share|improve this answer













                Another Perl variant, using List::Util



                $ perl -MList::Util=uniq -alne '
                map { $h{$_}++ } uniq @F }{ for $k (sort keys %h) {print "$k: $h{$k}"}
                ' file
                0: 1
                1: 1
                2: 1
                a: 1
                different: 1
                hello: 1
                is: 3
                man: 2
                one: 1
                possible: 1
                the: 3
                this: 1
                world: 2






                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Jan 4 at 16:11









                steeldriversteeldriver

                37k45287




                37k45287

























                    5














                    Straightfoward-ish in bash:



                    declare -A wordcount
                    while read -ra words; do
                    # unique words on this line
                    declare -A uniq
                    for word in "${words[@]}"; do
                    uniq[$word]=1
                    done
                    # accumulate the words
                    for word in "${!uniq[@]}"; do
                    ((wordcount[$word]++))
                    done
                    unset uniq
                    done < file


                    Looking at the data:



                    $ declare -p wordcount
                    declare -A wordcount='([possible]="1" [one]="1" [different]="1" [this]="1" [a]="1" [hello]="1" [world]="2" [man]="2" [0]="1" [1]="1" [2]="1" [is]="3" [the]="3" )'


                    and formatting as you want:



                    $ printf "%sn" "${!wordcount[@]}" | sort | while read key; do echo "$key:${wordcount[$key]}"; done
                    0:1
                    1:1
                    2:1
                    a:1
                    different:1
                    hello:1
                    is:3
                    man:2
                    one:1
                    possible:1
                    the:3
                    this:1
                    world:2





                    share|improve this answer




























                      5














                      Straightfoward-ish in bash:



                      declare -A wordcount
                      while read -ra words; do
                      # unique words on this line
                      declare -A uniq
                      for word in "${words[@]}"; do
                      uniq[$word]=1
                      done
                      # accumulate the words
                      for word in "${!uniq[@]}"; do
                      ((wordcount[$word]++))
                      done
                      unset uniq
                      done < file


                      Looking at the data:



                      $ declare -p wordcount
                      declare -A wordcount='([possible]="1" [one]="1" [different]="1" [this]="1" [a]="1" [hello]="1" [world]="2" [man]="2" [0]="1" [1]="1" [2]="1" [is]="3" [the]="3" )'


                      and formatting as you want:



                      $ printf "%sn" "${!wordcount[@]}" | sort | while read key; do echo "$key:${wordcount[$key]}"; done
                      0:1
                      1:1
                      2:1
                      a:1
                      different:1
                      hello:1
                      is:3
                      man:2
                      one:1
                      possible:1
                      the:3
                      this:1
                      world:2





                      share|improve this answer


























                        5












                        5








                        5







                        Straightfoward-ish in bash:



                        declare -A wordcount
                        while read -ra words; do
                        # unique words on this line
                        declare -A uniq
                        for word in "${words[@]}"; do
                        uniq[$word]=1
                        done
                        # accumulate the words
                        for word in "${!uniq[@]}"; do
                        ((wordcount[$word]++))
                        done
                        unset uniq
                        done < file


                        Looking at the data:



                        $ declare -p wordcount
                        declare -A wordcount='([possible]="1" [one]="1" [different]="1" [this]="1" [a]="1" [hello]="1" [world]="2" [man]="2" [0]="1" [1]="1" [2]="1" [is]="3" [the]="3" )'


                        and formatting as you want:



                        $ printf "%sn" "${!wordcount[@]}" | sort | while read key; do echo "$key:${wordcount[$key]}"; done
                        0:1
                        1:1
                        2:1
                        a:1
                        different:1
                        hello:1
                        is:3
                        man:2
                        one:1
                        possible:1
                        the:3
                        this:1
                        world:2





                        share|improve this answer













                        Straightfoward-ish in bash:



                        declare -A wordcount
                        while read -ra words; do
                        # unique words on this line
                        declare -A uniq
                        for word in "${words[@]}"; do
                        uniq[$word]=1
                        done
                        # accumulate the words
                        for word in "${!uniq[@]}"; do
                        ((wordcount[$word]++))
                        done
                        unset uniq
                        done < file


                        Looking at the data:



                        $ declare -p wordcount
                        declare -A wordcount='([possible]="1" [one]="1" [different]="1" [this]="1" [a]="1" [hello]="1" [world]="2" [man]="2" [0]="1" [1]="1" [2]="1" [is]="3" [the]="3" )'


                        and formatting as you want:



                        $ printf "%sn" "${!wordcount[@]}" | sort | while read key; do echo "$key:${wordcount[$key]}"; done
                        0:1
                        1:1
                        2:1
                        a:1
                        different:1
                        hello:1
                        is:3
                        man:2
                        one:1
                        possible:1
                        the:3
                        this:1
                        world:2






                        share|improve this answer












                        share|improve this answer



                        share|improve this answer










                        answered Jan 4 at 16:42









                        glenn jackmanglenn jackman

                        52.3k572113




                        52.3k572113























                            4














                            It's a pretty straight-forward perl script:



                            #!/usr/bin/perl -w
                            use strict;

                            my %words = ();
                            while (<>) {
                            chomp;
                            my %linewords = ();
                            map { $linewords{$_}=1 } split / /;
                            foreach my $word (keys %linewords) {
                            $words{$word}++;
                            }
                            }

                            foreach my $word (sort keys %words) {
                            print "$word:$words{$word}n";
                            }


                            The basic idea is to loop over the input; for each line, split it into words, then save those words into a hash (associative array) in order to remove any duplicates, then loop over that array of words and add one to an overall counter for that word. At the end, report on the words and their counts.






                            share|improve this answer



















                            • 1





                              A slight problem with this is in my opinion that it does not respect what the usual definition of a word is, since it splits on a single space character. If two spaces were found somewhere, an empty string inbetween would be considered a word as well if I'm not mistaken. Let alone if words were separated by other punctuation characters. Of course, it was not specified in the question whether "word" is understood as the programmer's concept of a "word", or as a word of a natural language.

                              – Larry
                              Jan 4 at 16:38
















                            4














                            It's a pretty straight-forward perl script:



                            #!/usr/bin/perl -w
                            use strict;

                            my %words = ();
                            while (<>) {
                            chomp;
                            my %linewords = ();
                            map { $linewords{$_}=1 } split / /;
                            foreach my $word (keys %linewords) {
                            $words{$word}++;
                            }
                            }

                            foreach my $word (sort keys %words) {
                            print "$word:$words{$word}n";
                            }


                            The basic idea is to loop over the input; for each line, split it into words, then save those words into a hash (associative array) in order to remove any duplicates, then loop over that array of words and add one to an overall counter for that word. At the end, report on the words and their counts.






                            share|improve this answer



















                            • 1





                              A slight problem with this is in my opinion that it does not respect what the usual definition of a word is, since it splits on a single space character. If two spaces were found somewhere, an empty string inbetween would be considered a word as well if I'm not mistaken. Let alone if words were separated by other punctuation characters. Of course, it was not specified in the question whether "word" is understood as the programmer's concept of a "word", or as a word of a natural language.

                              – Larry
                              Jan 4 at 16:38














                            4












                            4








                            4







                            It's a pretty straight-forward perl script:



                            #!/usr/bin/perl -w
                            use strict;

                            my %words = ();
                            while (<>) {
                            chomp;
                            my %linewords = ();
                            map { $linewords{$_}=1 } split / /;
                            foreach my $word (keys %linewords) {
                            $words{$word}++;
                            }
                            }

                            foreach my $word (sort keys %words) {
                            print "$word:$words{$word}n";
                            }


                            The basic idea is to loop over the input; for each line, split it into words, then save those words into a hash (associative array) in order to remove any duplicates, then loop over that array of words and add one to an overall counter for that word. At the end, report on the words and their counts.






                            share|improve this answer













                            It's a pretty straight-forward perl script:



                            #!/usr/bin/perl -w
                            use strict;

                            my %words = ();
                            while (<>) {
                            chomp;
                            my %linewords = ();
                            map { $linewords{$_}=1 } split / /;
                            foreach my $word (keys %linewords) {
                            $words{$word}++;
                            }
                            }

                            foreach my $word (sort keys %words) {
                            print "$word:$words{$word}n";
                            }


                            The basic idea is to loop over the input; for each line, split it into words, then save those words into a hash (associative array) in order to remove any duplicates, then loop over that array of words and add one to an overall counter for that word. At the end, report on the words and their counts.







                            share|improve this answer












                            share|improve this answer



                            share|improve this answer










                            answered Jan 4 at 15:59









                            Jeff SchallerJeff Schaller

                            43.4k1160140




                            43.4k1160140








                            • 1





                              A slight problem with this is in my opinion that it does not respect what the usual definition of a word is, since it splits on a single space character. If two spaces were found somewhere, an empty string inbetween would be considered a word as well if I'm not mistaken. Let alone if words were separated by other punctuation characters. Of course, it was not specified in the question whether "word" is understood as the programmer's concept of a "word", or as a word of a natural language.

                              – Larry
                              Jan 4 at 16:38














                            • 1





                              A slight problem with this is in my opinion that it does not respect what the usual definition of a word is, since it splits on a single space character. If two spaces were found somewhere, an empty string inbetween would be considered a word as well if I'm not mistaken. Let alone if words were separated by other punctuation characters. Of course, it was not specified in the question whether "word" is understood as the programmer's concept of a "word", or as a word of a natural language.

                              – Larry
                              Jan 4 at 16:38








                            1




                            1





                            A slight problem with this is in my opinion that it does not respect what the usual definition of a word is, since it splits on a single space character. If two spaces were found somewhere, an empty string inbetween would be considered a word as well if I'm not mistaken. Let alone if words were separated by other punctuation characters. Of course, it was not specified in the question whether "word" is understood as the programmer's concept of a "word", or as a word of a natural language.

                            – Larry
                            Jan 4 at 16:38





                            A slight problem with this is in my opinion that it does not respect what the usual definition of a word is, since it splits on a single space character. If two spaces were found somewhere, an empty string inbetween would be considered a word as well if I'm not mistaken. Let alone if words were separated by other punctuation characters. Of course, it was not specified in the question whether "word" is understood as the programmer's concept of a "word", or as a word of a natural language.

                            – Larry
                            Jan 4 at 16:38











                            2














                            A solution that calls several programs from a shell:



                            fmt -1 words.txt | sort -u | xargs -Ipattern sh -c 'echo "pattern:$(grep -cw pattern words.txt)"'



                            A little explanation:



                            The fmt -1 words.txt prints out all the words, 1 per line, and the | sort -u sorts this output and extracts only the unique words from it.



                            In order to count the occurences of a word in a file, one can use grep (a tool meant to search files for patterns). By passing the -cw option, grep gives the number of word matches it finds. So you can find the total number of occurrences of pattern using grep -cw pattern words.txt.



                            The tool xargs allows us to do this for each and every single word output by sort. The -Ipattern means that it will execute the following command multiple times, replacing each occurrence of pattern with a word it reads from standard input, which is what it gets from sort.



                            The indirection with sh is needed because xargs only knows how to execute a single program, given it's name, passing everything else as arguments to it. xargs does not handle things like command substitution. The $(...) is command substitution in the above snippet, as it substitutes the output from grep into echo, allowing it to be formatted correctly. Since we need the command substitution, we must use the sh -c command which runs whatever it recieves as an argument in its own shell.






                            share|improve this answer


























                            • An optimisation to this approach: fmt -1 words.txt | sort | uniq -c | awk '{ print $2 ":" $1 }'

                              – matja
                              Jan 5 at 0:14













                            • @matja is sort | uniq -c more efficient than sort -u?

                              – vikarjramun
                              Jan 5 at 3:31











                            • vikarjramun@ no, but uniq -c gives you the counts of each word in one pass, so you don't have to use xargs to do multiple passes of the input file for each word.

                              – matja
                              Jan 5 at 10:11






                            • 1





                              @matja: I actually made the answer you provided before the current one. However, it does not do what OP asked for. I misread the question at first entirely as well, and was corrected by glenn jackman. What you are suggesting would count every occurrence of each word. What OP asked for is to count the number of lines each word occurs in at least once.

                              – Larry
                              Jan 5 at 10:17


















                            2














                            A solution that calls several programs from a shell:



                            fmt -1 words.txt | sort -u | xargs -Ipattern sh -c 'echo "pattern:$(grep -cw pattern words.txt)"'



                            A little explanation:



                            The fmt -1 words.txt prints out all the words, 1 per line, and the | sort -u sorts this output and extracts only the unique words from it.



                            In order to count the occurences of a word in a file, one can use grep (a tool meant to search files for patterns). By passing the -cw option, grep gives the number of word matches it finds. So you can find the total number of occurrences of pattern using grep -cw pattern words.txt.



                            The tool xargs allows us to do this for each and every single word output by sort. The -Ipattern means that it will execute the following command multiple times, replacing each occurrence of pattern with a word it reads from standard input, which is what it gets from sort.



                            The indirection with sh is needed because xargs only knows how to execute a single program, given it's name, passing everything else as arguments to it. xargs does not handle things like command substitution. The $(...) is command substitution in the above snippet, as it substitutes the output from grep into echo, allowing it to be formatted correctly. Since we need the command substitution, we must use the sh -c command which runs whatever it recieves as an argument in its own shell.






                            share|improve this answer


























                            • An optimisation to this approach: fmt -1 words.txt | sort | uniq -c | awk '{ print $2 ":" $1 }'

                              – matja
                              Jan 5 at 0:14













                            • @matja is sort | uniq -c more efficient than sort -u?

                              – vikarjramun
                              Jan 5 at 3:31











                            • vikarjramun@ no, but uniq -c gives you the counts of each word in one pass, so you don't have to use xargs to do multiple passes of the input file for each word.

                              – matja
                              Jan 5 at 10:11






                            • 1





                              @matja: I actually made the answer you provided before the current one. However, it does not do what OP asked for. I misread the question at first entirely as well, and was corrected by glenn jackman. What you are suggesting would count every occurrence of each word. What OP asked for is to count the number of lines each word occurs in at least once.

                              – Larry
                              Jan 5 at 10:17
















                            2












                            2








                            2







                            A solution that calls several programs from a shell:



                            fmt -1 words.txt | sort -u | xargs -Ipattern sh -c 'echo "pattern:$(grep -cw pattern words.txt)"'



                            A little explanation:



                            The fmt -1 words.txt prints out all the words, 1 per line, and the | sort -u sorts this output and extracts only the unique words from it.



                            In order to count the occurences of a word in a file, one can use grep (a tool meant to search files for patterns). By passing the -cw option, grep gives the number of word matches it finds. So you can find the total number of occurrences of pattern using grep -cw pattern words.txt.



                            The tool xargs allows us to do this for each and every single word output by sort. The -Ipattern means that it will execute the following command multiple times, replacing each occurrence of pattern with a word it reads from standard input, which is what it gets from sort.



                            The indirection with sh is needed because xargs only knows how to execute a single program, given it's name, passing everything else as arguments to it. xargs does not handle things like command substitution. The $(...) is command substitution in the above snippet, as it substitutes the output from grep into echo, allowing it to be formatted correctly. Since we need the command substitution, we must use the sh -c command which runs whatever it recieves as an argument in its own shell.






                            share|improve this answer















                            A solution that calls several programs from a shell:



                            fmt -1 words.txt | sort -u | xargs -Ipattern sh -c 'echo "pattern:$(grep -cw pattern words.txt)"'



                            A little explanation:



                            The fmt -1 words.txt prints out all the words, 1 per line, and the | sort -u sorts this output and extracts only the unique words from it.



                            In order to count the occurences of a word in a file, one can use grep (a tool meant to search files for patterns). By passing the -cw option, grep gives the number of word matches it finds. So you can find the total number of occurrences of pattern using grep -cw pattern words.txt.



                            The tool xargs allows us to do this for each and every single word output by sort. The -Ipattern means that it will execute the following command multiple times, replacing each occurrence of pattern with a word it reads from standard input, which is what it gets from sort.



                            The indirection with sh is needed because xargs only knows how to execute a single program, given it's name, passing everything else as arguments to it. xargs does not handle things like command substitution. The $(...) is command substitution in the above snippet, as it substitutes the output from grep into echo, allowing it to be formatted correctly. Since we need the command substitution, we must use the sh -c command which runs whatever it recieves as an argument in its own shell.







                            share|improve this answer














                            share|improve this answer



                            share|improve this answer








                            edited Jan 4 at 21:13









                            vikarjramun

                            1428




                            1428










                            answered Jan 4 at 17:33









                            LarryLarry

                            1265




                            1265













                            • An optimisation to this approach: fmt -1 words.txt | sort | uniq -c | awk '{ print $2 ":" $1 }'

                              – matja
                              Jan 5 at 0:14













                            • @matja is sort | uniq -c more efficient than sort -u?

                              – vikarjramun
                              Jan 5 at 3:31











                            • vikarjramun@ no, but uniq -c gives you the counts of each word in one pass, so you don't have to use xargs to do multiple passes of the input file for each word.

                              – matja
                              Jan 5 at 10:11






                            • 1





                              @matja: I actually made the answer you provided before the current one. However, it does not do what OP asked for. I misread the question at first entirely as well, and was corrected by glenn jackman. What you are suggesting would count every occurrence of each word. What OP asked for is to count the number of lines each word occurs in at least once.

                              – Larry
                              Jan 5 at 10:17





















                            • An optimisation to this approach: fmt -1 words.txt | sort | uniq -c | awk '{ print $2 ":" $1 }'

                              – matja
                              Jan 5 at 0:14













                            • @matja is sort | uniq -c more efficient than sort -u?

                              – vikarjramun
                              Jan 5 at 3:31











                            • vikarjramun@ no, but uniq -c gives you the counts of each word in one pass, so you don't have to use xargs to do multiple passes of the input file for each word.

                              – matja
                              Jan 5 at 10:11






                            • 1





                              @matja: I actually made the answer you provided before the current one. However, it does not do what OP asked for. I misread the question at first entirely as well, and was corrected by glenn jackman. What you are suggesting would count every occurrence of each word. What OP asked for is to count the number of lines each word occurs in at least once.

                              – Larry
                              Jan 5 at 10:17



















                            An optimisation to this approach: fmt -1 words.txt | sort | uniq -c | awk '{ print $2 ":" $1 }'

                            – matja
                            Jan 5 at 0:14







                            An optimisation to this approach: fmt -1 words.txt | sort | uniq -c | awk '{ print $2 ":" $1 }'

                            – matja
                            Jan 5 at 0:14















                            @matja is sort | uniq -c more efficient than sort -u?

                            – vikarjramun
                            Jan 5 at 3:31





                            @matja is sort | uniq -c more efficient than sort -u?

                            – vikarjramun
                            Jan 5 at 3:31













                            vikarjramun@ no, but uniq -c gives you the counts of each word in one pass, so you don't have to use xargs to do multiple passes of the input file for each word.

                            – matja
                            Jan 5 at 10:11





                            vikarjramun@ no, but uniq -c gives you the counts of each word in one pass, so you don't have to use xargs to do multiple passes of the input file for each word.

                            – matja
                            Jan 5 at 10:11




                            1




                            1





                            @matja: I actually made the answer you provided before the current one. However, it does not do what OP asked for. I misread the question at first entirely as well, and was corrected by glenn jackman. What you are suggesting would count every occurrence of each word. What OP asked for is to count the number of lines each word occurs in at least once.

                            – Larry
                            Jan 5 at 10:17







                            @matja: I actually made the answer you provided before the current one. However, it does not do what OP asked for. I misread the question at first entirely as well, and was corrected by glenn jackman. What you are suggesting would count every occurrence of each word. What OP asked for is to count the number of lines each word occurs in at least once.

                            – Larry
                            Jan 5 at 10:17













                            2














                            Another simple alternative would be to use Python (>3.6). This solution has the same problem as the one mentioned by @Larry in his comment.



                            from collections import Counter

                            with open("words.txt") as f:
                            c = Counter(word for line in [line.strip().split() for line in f] for word in set(line))
                            for word, occurrence in sorted(c.items()):
                            print(f'{word}:{occurrence}')
                            # for Python 2.7.x compatibility you can replace the above line with
                            # the following one:
                            # print('{}:{}'.format(word, occurrence))


                            A more explicit version version of the above:



                            from collections import Counter


                            FILENAME = "words.txt"


                            def find_unique_words():
                            with open(FILENAME) as f:
                            lines = [line.strip().split() for line in f]

                            unique_words = Counter(word for line in lines for word in set(line))
                            return sorted(unique_words.items())


                            def print_unique_words():
                            unique_words = find_unique_words()
                            for word, occurrence in unique_words:
                            print(f'{word}:{occurrence}')


                            def main():
                            print_unique_words()


                            if __name__ == '__main__':
                            main()


                            Output:



                            0:1
                            1:1
                            2:1
                            a:1
                            different:1
                            hello:1
                            is:3
                            man:2
                            one:1
                            possible:1
                            the:3
                            this:1
                            world:2


                            The above also assumes that words.txt is on the same directory as script.py. Note that this is not much different from other solutions provided here, but perhaps somebody will find it useful.






                            share|improve this answer






























                              2














                              Another simple alternative would be to use Python (>3.6). This solution has the same problem as the one mentioned by @Larry in his comment.



                              from collections import Counter

                              with open("words.txt") as f:
                              c = Counter(word for line in [line.strip().split() for line in f] for word in set(line))
                              for word, occurrence in sorted(c.items()):
                              print(f'{word}:{occurrence}')
                              # for Python 2.7.x compatibility you can replace the above line with
                              # the following one:
                              # print('{}:{}'.format(word, occurrence))


                              A more explicit version version of the above:



                              from collections import Counter


                              FILENAME = "words.txt"


                              def find_unique_words():
                              with open(FILENAME) as f:
                              lines = [line.strip().split() for line in f]

                              unique_words = Counter(word for line in lines for word in set(line))
                              return sorted(unique_words.items())


                              def print_unique_words():
                              unique_words = find_unique_words()
                              for word, occurrence in unique_words:
                              print(f'{word}:{occurrence}')


                              def main():
                              print_unique_words()


                              if __name__ == '__main__':
                              main()


                              Output:



                              0:1
                              1:1
                              2:1
                              a:1
                              different:1
                              hello:1
                              is:3
                              man:2
                              one:1
                              possible:1
                              the:3
                              this:1
                              world:2


                              The above also assumes that words.txt is on the same directory as script.py. Note that this is not much different from other solutions provided here, but perhaps somebody will find it useful.






edited Jan 5 at 12:37 by David Foerster

answered Jan 4 at 20:57 by яүυк

                                    0














                                    Trying to do it with awk:



                                    count.awk:



#!/usr/bin/awk -f
# count lines containing each word

{
    for (i = 1; i <= NF; i++) {
        word_in_a_line[$i]++
        if (word_in_a_line[$i] == 1) {
            word_line_count[$i]++
        }
    }

    delete word_in_a_line
}

END {
    for (word in word_line_count) {
        printf "%s:%d\n", word, word_line_count[word]
    }
}


Run it with:



                                    $ awk -f count.awk ./test.data | sort
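
The same logic also fits in a single command if you prefer not to keep a separate script file; this is just the script above inlined (a sketch assuming gawk, mawk, or another awk that supports deleting a whole array with delete):

$ awk '{ delete seen                                  # reset per-line bookkeeping
         for (i = 1; i <= NF; i++)
             if (!seen[$i]++) word_line_count[$i]++   # count each word once per line
       }
       END { for (w in word_line_count) printf "%s:%d\n", w, word_line_count[w] }' ./test.data | sort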





answered Jan 6 at 1:26 by Charles

                                            0














                                            A pure bash answer



                                            echo "0 hello world the man is world
                                            1 this is the world
                                            2 a different man is the possible one" | while IFS=$'n' read -r line; do echo $line | tr ' ' 'n' | sort -u; done | sort | uniq -c


                                            1 0
                                            1 1
                                            1 2
                                            1 a
                                            1 different
                                            1 hello
                                            3 is
                                            2 man
                                            1 one
                                            1 possible
                                            3 the
                                            1 this
                                            2 world


I loop over the unique words on each line and pass them all to uniq -c.
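
uniq -c prints its own "count word" layout; if you want the exact word:count format from the question, one option (my addition, just a formatting pass with awk) is to swap the columns at the end of the same pipeline:

echo "0 hello world the man is world
1 this is the world
2 a different man is the possible one" |
while IFS=$'\n' read -r line; do
    echo "$line" | tr ' ' '\n' | sort -u
done | sort | uniq -c | awk '{ printf "%s:%d\n", $2, $1 }'   # "1 hello" -> "hello:1"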



Edit: I did not see glenn's answer. I found it strange not to see a bash answer.






edited Jan 6 at 4:57

answered Jan 6 at 4:48 by user1462442

                                                    0














Simple, though it doesn't care that it reads the file many times:



sed 's/ /\n/g' file.txt | sort | uniq | while read -r word; do
    printf "%s:%d\n" "$word" "$(grep -Fw "$word" file.txt | wc -l)"
done


EDIT: Despite converting spaces to newlines, this counts the lines that contain each word, not the occurrences of the words themselves. It gives the result:



                                                    0:1
                                                    1:1
                                                    2:1
                                                    a:1
                                                    different:1
                                                    hello:1
                                                    is:3
                                                    man:2
                                                    one:1
                                                    possible:1
                                                    the:3
                                                    this:1
                                                    world:2


                                                    which is character-by-character identical to OP's example result.
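
A small simplification of the same loop (my tweak, not part of the original answer): grep -c already reports the number of matching lines, so the wc -l stage can be dropped, and sort -u replaces sort | uniq. Note that \n in the sed replacement relies on GNU sed:

sed 's/ /\n/g' file.txt | sort -u | while read -r word; do
    printf '%s:%d\n' "$word" "$(grep -cFw "$word" file.txt)"   # grep -c counts matching lines
done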






edited Jan 7 at 16:54

answered Jan 5 at 2:03 by JoL

• Read the question again. It literally says translating blanks to newline chars wouldn't be the exact solution.

  – Sparhawk
  Jan 5 at 9:59

• @Sparhawk Read the answer again. This does give the result he gave as an example, including 2 instead of 3 for world. He meant that something like sed 's/ /\n/g' | sort | uniq -c would not work because it would give 3 for world, but that is not what this answer does. It correctly counts the lines where the words occur, not the occurrences themselves, just as the OP wanted.

  – JoL
  Jan 6 at 7:03

• Ah right, apologies! I would recommend putting in an explanation of your code, which is both helpful to the questioner and clarifies what it does. Also, as a minor point, you probably want read -r here.

  – Sparhawk
  Jan 6 at 9:38
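
To make the point in the comments concrete (a throwaway check against the question's sample file, again assuming GNU sed for the \n replacement): the naive pipeline counts occurrences rather than lines, so world comes out as 3:

$ sed 's/ /\n/g' file.txt | sort | uniq -c | grep -w world
      3 world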















