Find all occurrences of string after another string in all files


My end goal is to have a script that will count the instances of each username in all files.

A username is a string, in quotes, that follows the string 'login'. For example, in one file, I might have:

"this":"is', "a":"strange", "type":"of":"object", "but":"please",
 "go":"withit", "login":"username1"

"this":"is', "login":"username2", "type":"of":"object", "but":"please",
 "go":"withit"

And in another file, I might have:

"this":"is', "a":"strange", "type":"of":"object", "but":"please",
 "go":"withit", "login":"username3"

"login":"username1", "please":"gowithit"

In which case, I'd like to have a txt file that contains a dict object with the count of the number of times each username appears in the files:

"username1": 2, "username2":1, "username3":1

I've read a few things to get me started, but I can't seem to put this together. I've sort of pseudocoded it, but I can't progress from this point.

I think I need to do this in two stages.

1) Get a list of all the usernames

2) Count the number of times each username appears in all files.

For task 1):

 grep 'login:' * | sed 's/^.*: //'
#Except I think this gets everything from the line after 'login', which isn't what I want.

For task 2):

for all_usernames_in_file:
 stringval = username_read_from_saved_file
 cat * | grep -c $stringval > output.txt

Can anyone take it from here?
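The two-stage pseudocode above could be made runnable roughly as follows. This is only a sketch: the sample files, the `workdir` variable and the `two_stage_count` function name are made up for illustration, and it assumes usernames contain no whitespace and always appear as login":"name".

```shell
# Hypothetical sample data modelled on the snippets in the question.
workdir=$(mktemp -d)
printf '%s\n' '"go":"withit", "login":"username1"' '"login":"username2", "but":"please"' > "$workdir/a.txt"
printf '%s\n' '"go":"withit", "login":"username3"' '"login":"username1", "please":"gowithit"' > "$workdir/b.txt"

# two_stage_count DIR: stage 1 lists the distinct usernames,
# stage 2 counts the occurrences of each one across all files in DIR.
two_stage_count() {
    names=$(grep -oh 'login":"[^"]*"' "$1"/* | cut -d'"' -f3 | sort -u)
    for name in $names; do
        # -F treats the name literally; -o gives one output line per occurrence,
        # so wc -l counts occurrences rather than matching lines.
        count=$(grep -ohF "login\":\"$name\"" "$1"/* | wc -l | tr -d ' ')
        printf '%s %s\n' "$name" "$count"
    done
}

two_stage_count "$workdir"
```

Two differences from the pseudocode are deliberate: `grep -c` counts matching lines, not occurrences, so the sketch counts `grep -o` output instead, and a `>` redirection inside the loop would overwrite the output file on every iteration, so any redirection belongs on the whole function call.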

EDIT:

Do you mean I should do this:

grep -o 'login":"[^"]*"' /path/to/dir/* | cut -d'"' -f3 | sort | uniq -c | sed '1i {s/\s*\([0-9]*\)\s*\(.*\)/"\2": \1,/;$a}' > output.txt

EDIT 2: Still not working. I'm trying to diagnose by understanding what each command does.

Let's say I'm just looking at this part to start:

grep -o 'login":"[^"]*"' /path/to/dir/* | cut -d'"' -f3 | sort | uniq -c > myfile.txt

Right now, myfile.txt is blank.

Here's what I think this command is doing:

grep -o prints only the matched (non-empty) parts of a matching line.

'login":"[^"]*"' is the string we want grep to match. In the middle, the [^"] matches any character after login":" not equal to ", and the * says we want any length of match - that is, the length of the username doesn't matter, we want everything between the quotes.

| is a pipe. It means "and then"

cut -d '"' -f3 means slice up the returned line (all stuff after login":"), using the delimiter ", and take field 3 (that is, just the username).

| is a pipe. It means "and then"

sort the usernames

| is a pipe. It means "and then"

Get the unique usernames and count the number of times each appears.

If I take that much, and put a > myfile.txt at the end, then I should end up with a txt file that contains usernames and a count of the number of times each appears. It won't be well-formatted, but it will exist.
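The stages described above can be tried on a couple of small plain-text files to see what each one contributes. This is a sketch under assumptions: the sample files and the `count_logins` name are invented here, and `-h` is added to grep so that file names are not prefixed to the matches.

```shell
# Hypothetical sample data modelled on the snippets in the question.
workdir=$(mktemp -d)
printf '%s\n' '"go":"withit", "login":"username1"' '"this":"is", "login":"username2", "but":"please"' > "$workdir/a.txt"
printf '%s\n' '"go":"withit", "login":"username3"' '"login":"username1", "please":"gowithit"' > "$workdir/b.txt"

# count_logins DIR: print one "count username" line per distinct username.
count_logins() {
    # -o prints only the matched text, -h suppresses the "file:" prefix
    grep -oh 'login":"[^"]*"' "$1"/* |
        cut -d'"' -f3 |   # fields split on ": login / : / username
        sort |            # uniq -c only merges adjacent duplicates
        uniq -c           # prefix each distinct username with its count
}

count_logins "$workdir"
```

Running the function with fewer stages (for example, stopping after the grep) is a quick way to find the stage at which the output goes empty.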

Why am I not getting such a file?

NOTE: does it matter that I'm searching through .json.gz formatted files? I've gotten the script to work when searching through txt, but not through the other format.
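The compression does matter: grep reads the compressed bytes, in which the text login":" does not normally appear, so nothing matches and the output file stays empty. One way around it is to decompress to a pipe first. The sketch below assumes gzip is installed; the sample files and the `count_logins_gz` name are hypothetical.

```shell
# Hypothetical compressed sample data.
workdir=$(mktemp -d)
printf '%s\n' '"go":"withit", "login":"username1"' | gzip > "$workdir/a.json.gz"
printf '%s\n' '"login":"username2", "x":"y"' '"login":"username1"' | gzip > "$workdir/b.json.gz"

# count_logins_gz DIR: like the plain-text pipeline, but reads *.json.gz.
count_logins_gz() {
    # gzip -dc writes the decompressed contents of all files to stdout
    gzip -dc "$1"/*.json.gz |
        grep -o 'login":"[^"]*"' |
        cut -d'"' -f3 |
        sort |
        uniq -c
}

count_logins_gz "$workdir"
```

Where available, `zgrep` wraps the same idea in a single command.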

edited Mar 27 at 19:13

asked Mar 27 at 15:45

StatsSorceress


Are the documents in fact JSON? Is the single quote in {"this":"is', a typo?

– Kusalananda♦
Mar 27 at 19:31

If this is supposed to be JSON documents, could you please add properly formatted examples of these documents? At the moment, the data is not JSON and any attempt to parse them using a proper JSON parser (jq) fails due to stray objects without keys.

– Kusalananda♦
Mar 27 at 20:07


To get all usernames, i.e. all strings associated with a login key, from a well-formed JSON document, without knowing the document structure:

jq -r '.. …

How about using a perl hash keyed on a regex match, which you can convert using the JSON module:

$ perl -MJSON -lne '$h{$1}++ for /(?<="login":")(.*?)(?=")/g }{ print encode_json \%h' file1 file2
{"username3":1,"username2":1,"username1":2}

edited Mar 29 at 7:12

answered Mar 27 at 21:02

steeldriver

Took me a few minutes to get my head around this, but it's sweet. Why do you need the 'l' switch?

– bu5hman
Mar 29 at 14:45

@bu5hman TBH I'm hazy on the perl -l switch (autochomp?) - I tend to just throw it in if the newline handling screws up (forgive me, perl gods). There's a discussion of what it does here: The top 10 tricks of Perl one-liners

– steeldriver
Mar 29 at 14:52

Fair enough. Can you also educate a perl script kiddie like me as to why the print only executes once under the '-n' switch? Is this a perl-specific construct?

– bu5hman
Mar 29 at 15:04

@bu5hman not sure where it's documented, but the }{ is a cheat way of making everything to the right into an END block; everything to the left is looped over per the -n switch

– steeldriver
Mar 29 at 15:08

Makes perfect sense since perldoc states that an END block is executed outside the implicit loop. I count myself educated. Thanks.

– bu5hman
Mar 29 at 15:15
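The same "loop over every line, then summarise once" shape can be written in awk, where the END block is spelled out rather than implied. This is a sketch, not the answerer's method: the sample files and the `login_counts_json` name are hypothetical, and it assumes usernames need no JSON escaping.

```shell
# Hypothetical sample data modelled on the snippets in the question.
workdir=$(mktemp -d)
printf '%s\n' '"go":"withit", "login":"username1"' '"login":"username2", "but":"please"' > "$workdir/a.txt"
printf '%s\n' '"go":"withit", "login":"username3"' '"login":"username1", "please":"gowithit"' > "$workdir/b.txt"

# login_counts_json FILE...: print {"name":count,...} for all login values.
login_counts_json() {
    grep -oh 'login":"[^"]*"' "$@" | cut -d'"' -f3 | sort |
    awk '
        # Main block: runs once per input line (one username per line).
        { if (!($0 in count)) order[++n] = $0; count[$0]++ }
        # END block: runs once, after the last line has been read.
        END {
            printf "{"
            for (i = 1; i <= n; i++)
                printf "%s\"%s\":%d", (i > 1 ? "," : ""), order[i], count[order[i]]
            print "}"
        }'
}

login_counts_json "$workdir"/*
```

Keeping a separate `order` array makes the key order follow the sorted input; iterating with `for (k in count)` would give an unspecified order, much like the perl hash.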


@rush's use of sed didn't work in my shell, so I went this way:

grep -Poh '(?<=login":")[^"]*' json* | sort | uniq -c | awk -v OFS=': ' 'BEGIN{print "{"}{print $2, $1}END{print"}"}' | sed -E 's/([0-9])$/\1,/g;s/:/":/g;s/^([^{}])/"\1/g'

The extra sed expressions can be dropped if your shell lets you escape the " and print them in the awk statement:

grep -Poh '(?<=login":")[^"]*' json* | sort | uniq -c | awk -v OFS=': ' 'BEGIN{print "{"}{print \"$2\", $1}END{print"}"}' | sed -E 's/([0-9])$/\1,/g'

In my shell, awk choked on the " in the second script. I'm not sure why, but I'm sure that someone out there will tell me.
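A likely explanation, assuming the quotes were escaped with backslashes outside of a string literal: awk only understands \" inside a double-quoted string, and a $2 placed inside such a string is literal text rather than a field reference. Building the quoted key with printf sidesteps both problems. The `quote_counts` name below is hypothetical, and the input is assumed to be `uniq -c` style lines.

```shell
# quote_counts: read "<count> <name>" lines on stdin, print a dict-like block.
quote_counts() {
    awk 'BEGIN { print "{" }
         # \" inside the format string yields a literal double quote;
         # the comma separator is emitted before every entry except the first.
         { printf "%s\"%s\": %d", (NR > 1 ? ",\n" : ""), $2, $1 }
         END { print "\n}" }'
}

printf '%s\n' '      2 username1' '      1 username2' | quote_counts
```

Because the separator is written before each entry rather than after it, the last entry has no trailing comma, so no sed clean-up pass is needed.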

I also tried jq, but it choked on the JSON files. There appears to be a syntax error:

"this":"is' # is how it is written, so I edited these to
"this":"is"

Also, jq didn't like the construct

{"a":"strange"} # so I also edited these to
b: {"a":"strange"}

If the original files are supposed to be as per the edits made, then jq works:

jq '.login' json* | sort | uniq -c | awk -v OFS=': ' 'BEGIN{print "{"}{print $2, $1}END{print"}"}' | sed -E 's/([0-9])$/\1,/g'

answered Mar 27 at 20:01

bu5hman
