Find all occurrences of string after another string in all files
My end goal is to have a script that will count the instances of each username in all files.
A username is a string, in quotes, that follows the string 'login'. For example, in one file, I might have:
"this":"is', "a":"strange", "type":"of":"object", "but":"please",
"go":"withit", "login":"username1"
"this":"is', "login":"username2", "type":"of":"object", "but":"please",
"go":"withit"
And in another file, I might have:
"this":"is', "a":"strange", "type":"of":"object", "but":"please",
"go":"withit", "login":"username3"
"login":"username1", "please":"gowithit"
In which case, I'd like to have a txt file that contains a dict object with the count of the number of times each username appears in the files:
"username1": 2, "username2":1, "username3":1
I've read a few things to get me started, but I can't seem to put this together. I've sort of pseudocoded it, but I can't progress from this point.
I think I need to do this in two stages.
1) Get a list of all the usernames
2) Count the number of times each username appears in all files.
For task 1):
grep 'login:' * | sed 's/^.*: //'
#Except I think this gets everything from the line after 'login', which isn't what I want.
For task 2):
for all_usernames_in_file:
stringval = username_read_from_saved_file
cat * | grep -c $stringval > output.txt
Can anyone take it from here?
EDIT:
Do you mean I should do this:
grep -o 'login":"[^"]*"' /path/to/dir/* | cut -d'"' -f3 | sort | uniq -c | sed '1i {
s/\s*\([0-9]*\)\s*\(.*\)/"\2": \1,/;$a }' > output.txt
EDIT 2: Still not working. I'm trying to diagnose by understanding what each command does.
Let's say I'm just looking at this part to start:
grep -o 'login":"[^"]*"' /path/to/dir/* | cut -d'"' -f3 | sort | uniq -c > myfile.txt
Right now, myfile.txt is blank.
Here's what I think this command is doing:
grep -o prints only the matched (non-empty) parts of a matching line.
'login":"[^"]*"' is the string we want grep to match. In the middle, the [^"] matches any character after login":" that is not equal to ", and the * says we want a match of any length; that is, the length of the username doesn't matter, we want everything between the quotes.
| is a pipe. It means "and then".
cut -d'"' -f3 means slice up the returned line (all the stuff after login":") using the delimiter ", and take field 3 (that is, just the username).
sort sorts the usernames.
uniq -c gets the unique usernames and counts the number of times each appears.
If I take that much and put a > myfile.txt at the end, then I should end up with a txt file that contains usernames and a count of the number of times each appears. It won't be well-formatted, but it will exist.
Why am I not getting such a file?
NOTE: does it matter that I'm searching through .json.gz formatted files? I've gotten the script to work when searching through txt, but not through the other format.
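On the .json.gz note: grep reads files byte for byte, and the compressed bytes of a gzip file will not normally contain the literal text login":", so the pipeline has nothing to count. A minimal sketch of one way around that, assuming gzip is installed; the function name count_logins and its directory argument are hypothetical and not part of the original post:

```shell
#!/bin/sh
# Hypothetical helper: count "login" values across all *.json.gz files
# in the directory given as $1.
count_logins() {
    # gzip -dc decompresses to standard output and leaves the files untouched.
    gzip -dc "$1"/*.json.gz |
        grep -o '"login":"[^"]*"' |   # keep only the "login":"name" pairs
        cut -d'"' -f4 |               # field 4 is the name itself
        sort |
        uniq -c                       # prefix each distinct name with its count
}
```

Called as count_logins /path/to/dir, this prints one line per username with its count in front. Because the text arrives on standard input, grep adds no file name prefix, and with the pattern starting at the opening quote the name is field 4 rather than field 3. zgrep, which ships with gzip on many systems, is another option.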
grep
edited Mar 27 at 19:13
asked Mar 27 at 15:45
StatsSorceress
Are the documents in fact JSON? Is the single quote in {"this":"is', a typo? – Kusalananda♦ Mar 27 at 19:31
If this is supposed to be JSON documents, could you please add properly formatted examples of these documents? At the moment, the data is not JSON and any attempt to parse them using a proper JSON parser (jq) fails due to stray objects without keys. – Kusalananda♦ Mar 27 at 20:07
To get all usernames, i.e. all strings associated with a login key, from a well-formed JSON document, without knowing the document structure:
jq -r '..
How about using a perl hash keyed on a regex match, which you can convert using the JSON module:
$ perl -MJSON -lne '$h{$1}++ for /(?<="login":")(.*?)(?=")/g }{ print encode_json \%h' file1 file2
{"username3":1,"username2":1,"username1":2}
Took me a few minutes to get my head around this, but it's sweet. Why do you need the 'l' switch? – bu5hman Mar 29 at 14:45
@bu5hman TBH I'm hazy on the perl -l switch (autochomp?) - I tend to just throw it in if the newline handling screws up (forgive me, perl gods). There's a discussion of what it does here: The top 10 tricks of Perl one-liners – steeldriver Mar 29 at 14:52
Fair enough. Can you also educate a perl script kiddie like me as to why the print only executes once under the '-n' switch? Is this a perl specific construct? – bu5hman Mar 29 at 15:04
@bu5hman not sure where it's documented, but the }{ is a cheat way of making everything to the right into an END block; everything to the left is looped over per the -n switch – steeldriver Mar 29 at 15:08
Makes perfect sense since perldoc states that an END block is executed outside the implicit loop. I count myself educated. Thanks. – bu5hman Mar 29 at 15:15
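The counting-in-a-hash approach, and the point about an END block running once after the implicit loop, can also be sketched in awk, where the END block is written out explicitly. This is an illustrative equivalent rather than the answer's own code; the function name login_counts_json is hypothetical, and it assumes usernames contain no double quotes or backslashes:

```shell
#!/bin/sh
# Hypothetical helper: print {"name":count,...} for every "login":"name"
# pair found in the files given as arguments.
login_counts_json() {
    awk '
    {
        line = $0
        # match() sets RSTART and RLENGTH; looping means several logins
        # on one line are all counted.
        while (match(line, /"login":"[^"]*"/)) {
            # Skip the 9 characters of "login":" and drop the closing quote.
            name = substr(line, RSTART + 9, RLENGTH - 10)
            count[name]++
            line = substr(line, RSTART + RLENGTH)
        }
    }
    END {
        # Runs once, after all input lines have been read.
        printf "{"
        sep = ""
        for (name in count) {
            printf "%s\"%s\":%d", sep, name, count[name]
            sep = ","
        }
        print "}"
    }' "$@"
}
```

Usage would be login_counts_json file1 file2. As with the perl hash, the order of the keys in the output is not guaranteed.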
@rush's use of sed didn't work in my shell, so I went this way:
grep -Poh '(?<=login":")[^"]*' json* | sort | uniq -c | awk -v OFS=': ' 'BEGIN{print "{"}{print $2, $1}END{print"}"}' | sed -E 's/([0-9])$/\1,/g;s/:/":/g;s/^([^{}])/"\1/g'
The multiple sed expressions can be reduced if your shell lets you escape the " and print them in the awk statement:
grep -Poh '(?<=login":")[^"]*' json* | sort | uniq -c | awk -v OFS=': ' 'BEGIN{print "{"}{print "$2", $1}END{print"}"}' | sed -E 's/([0-9])$/\1,/g'
In my shell, awk choked on the " in the second script. Not sure why, but I am sure that someone out there will tell me.
I also tried jq, but it choked on the json files. There appears to be a syntax error:
{"this":"is' # is how it is written, so I edited these to
{"this":"is"
Also, jq didn't like the construct
{"a":"strange"} # so I also edited these to
b: {"a":"strange"}
If the original files are supposed to be as per the edits made, then jq works:
jq '.login' json* | sort | uniq -c | awk -v OFS=': ' 'BEGIN{print "{"}{print $2, $1}END{print"}"}' | sed -E 's/([0-9])$/\1,/g'
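On the quoting problem mentioned above: inside a single-quoted awk program, a literal double quote can be written as \" within an awk string, so awk can emit the quotes and commas itself and the sed post-processing is not needed. A hedged sketch, not the author's command; the function name counts_to_dict is hypothetical, and it reads uniq -c output on standard input:

```shell
#!/bin/sh
# Hypothetical helper: turn uniq -c output ("  2 username1") into
# {"username1": 2, "username2": 1}
counts_to_dict() {
    awk '
    BEGIN { printf "{" }
    {
        # NR > 1 puts the separator before every entry except the first,
        # which avoids a trailing comma.
        printf "%s\"%s\": %d", (NR > 1 ? ", " : ""), $2, $1
    }
    END { print "}" }'
}

# Example (grep -oh prints only the matches, without file name prefixes):
# grep -oh '"login":"[^"]*"' json* | cut -d'"' -f4 | sort | uniq -c | counts_to_dict
```

Printing the separator before each entry rather than after it is what keeps the last entry free of a dangling comma.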
answered Mar 27 at 20:01
bu5hman