BUILD LOGIC - what could be better than using AWK?

This is what the log looks like:

Company=XYZ
Req_id=1234
Time_taken=10 sec
Status=Success

Company=ABC
Req_id=3456
Time_taken=200 sec
Status=Failure

Company=DFG
Req_id=3001
Time_taken=15 sec
Status=Success

We have to get the top 3 request IDs by time taken, longest first.

I tried the solution below, which gets the request ID and the time taken, but I am not happy with it:

awk -vRS= -F'[=\n]' '/Time_taken/{print $4,$6}' test.txt | sort -nr

How could I have done this better? I would like to code some logic myself instead of relying on built-in functions. Also, could someone help me understand -F'[=\n]' better? I have just been copying it from my previous scripts.

Output should be:

Below Request Id took more then expected
Request id 3456, Time Taken 200 sec
Request id 3001, Time Taken 15 sec 
Request id 1234, Time Taken 10 sec

asked Mar 23 at 11:10 by Machine, edited yesterday by Anthony Geoghegan

I have edited my question with what I require. Please help to get this off hold.

– Machine
yesterday

You've got my reopen vote, hopefully others will follow (5 are required). Please always write "I" with a capital letter.

– peterh
yesterday

I cast the last required open vote. I’ve also edited to remove the syntax highlighting of the input and output. I’ve also upvoted it because it’s now a useful, clear question that shows what you’ve already tried or researched. For future questions, I’d recommend reading How to Ask and unix.meta.stackexchange.com/q/5015/22812

– Anthony Geoghegan
yesterday


4 Answers

Using only awk:

# Insert value/id into the sorted top-N arrays, pushing smaller entries down.
# tmp is listed as a parameter only to make it a local variable.
function list_insert (value, id, tmp) {
    for (i = 1; i <= list_length; ++i)
        if (value > value_list[i]) {
            tmp = value_list[i]
            value_list[i] = value
            value = tmp

            tmp = id_list[i]
            id_list[i] = id
            id = tmp
        }
}

BEGIN {
    FS = "[= ]"
    list_length = 3
}

$1 == "Req_id"     { id = $2 }
$1 == "Time_taken" { list_insert($2, id) }

END {
    printf("Below Request Id took more then expected\n")
    for (i = 1; i <= list_length; ++i)
        printf("Request id %d, time taken %d sec\n", id_list[i], value_list[i])
}

This program maintains two arrays, value_list and id_list, both of length list_length. The value_list array is sorted and contains the time values, while the id_list array contains the request IDs corresponding to the values in the first list.

The list_insert function inserts a new value and ID into the two arrays in such a way that the order of the value_list array is maintained (it finds the correct location for insertion and then shuffles the remaining items towards the end).
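For readers who find the loop easier to follow outside awk, here is a rough Python sketch of the same insertion idea. It is an illustration only, not part of the awk script: the zero/None starting slots stand in for awk's uninitialised array entries, and the fourth sample entry (ID 9999) is made up to show that a smaller value is dropped.

```python
# Hypothetical Python rendering of the list_insert logic (illustration only).
LIST_LENGTH = 3

# Slots start at 0 / None, playing the role of awk's uninitialised entries.
value_list = [0] * LIST_LENGTH
id_list = [None] * LIST_LENGTH


def list_insert(value, req_id):
    """Insert (value, req_id), keeping value_list in descending order."""
    for i in range(LIST_LENGTH):
        if value > value_list[i]:
            # Put the new entry here and carry the displaced one further down.
            value_list[i], value = value, value_list[i]
            id_list[i], req_id = req_id, id_list[i]


# Three entries from the question plus one invented entry (9999, 5 sec).
for req_id, seconds in [(1234, 10), (3456, 200), (3001, 15), (9999, 5)]:
    list_insert(seconds, req_id)

print(list(zip(id_list, value_list)))
```

Whatever is still being carried when the loop ends simply falls off the end, which is how the list stays at three entries.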

The rest of the program reads the data as newline-delimited records of fields delimited by = or spaces. When a request ID is found, it is saved in id, and when a "time taken" entry is found, that ID and the time-taken value are inserted into the arrays.
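To make the field splitting concrete, here is a small Python sketch (an illustration only, not part of the awk script). It shows what a separator such as [= ] does to a single line, and what the question's -F'[=\n]' does when awk reads a whole blank-line-separated paragraph as one record (RS set to the empty string). It assumes that Python's re.split behaves like awk's regular-expression field separator for these inputs.

```python
import re

# One log line, split on "=" or space, as FS = "[= ]" would split it.
line = "Time_taken=200 sec"
line_fields = re.split(r"[= ]", line)
print(line_fields)

# One whole paragraph treated as a single record, split on "=" or newline,
# as -F'[=\n]' would split it in paragraph mode.
record = "Company=ABC\nReq_id=3456\nTime_taken=200 sec\nStatus=Failure"
fields = re.split(r"[=\n]", record)

# awk numbers fields from 1, Python from 0: $4 is fields[3], $6 is fields[5].
print(fields[3], fields[5])
```

In the paragraph case the fields alternate between names and values, which is why $4 holds the request ID and $6 holds the time taken. Note that the question's sort -nr then sorts on the first column, the request ID, which only happens to agree with the time order in the sample data.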

At the end, the two arrays are used to create the output.

Testing it:

$ awk -f script.awk file
Below Request Id took more then expected
Request id 3456, time taken 200 sec
Request id 3001, time taken 15 sec
Request id 1234, time taken 10 sec

answered yesterday by Kusalananda

You just blew my mind! Can you explain why you set list_length = 3, and why the list_insert function is required? Isn't the solution @Archemar gave below better?

– Machine
19 hours ago

@Machine I would not say my solution is better or worse than any other's. I'm assuming that there may be more than three records in the in-data, and I use list_length = 3 to get the top three results. The list_insert function adds data to the arrays that the program maintains, while always keeping the top three. This kind of combines the sort and head that Archemar is doing as post-processing steps.

– Kusalananda♦
19 hours ago

We can do this with Perl's paragraph mode, which reads the file in paragraph-sized chunks (records separated by blank lines). For each record we add an entry to a hash keyed on the request ID, with the time taken as the value. At the end of the file, we sort the IDs by time taken in descending numeric order and print the desired message.

$ perl -ln -00 -e '
   %h = (%h, /^Req_id=(\d+)\n.*^Time_taken=(\d+)/ms)}{
   print "The below IDs took more than expected:" if scalar keys %h;
   print join " ", "Req ID:", "$_," , "Time taken", $h{$_}, "sec"
      for (sort { $h{$b} <=> $h{$a} } keys %h)[0..2];
' input.file

Output:

The below IDs took more than expected:
Req ID: 3456, Time taken 200 sec
Req ID: 3001, Time taken 15 sec
Req ID: 1234, Time taken 10 sec
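The same paragraph-at-a-time approach can be sketched in Python, for anyone more comfortable reading that. This is only an approximation of the Perl one-liner under stated assumptions: the sample log is hard-coded as LOG so the sketch is self-contained, and splitting on runs of blank lines is used as a stand-in for -00.

```python
import re

# Sample log from the question, hard-coded so the sketch is self-contained.
LOG = """Company=XYZ
Req_id=1234
Time_taken=10 sec
Status=Success

Company=ABC
Req_id=3456
Time_taken=200 sec
Status=Failure

Company=DFG
Req_id=3001
Time_taken=15 sec
Status=Success
"""

# Same pattern as the Perl code: re.M lets ^ match at line starts,
# re.S lets . match newlines.
PATTERN = re.compile(r"^Req_id=(\d+)\n.*^Time_taken=(\d+)", re.M | re.S)

times = {}
# Splitting on runs of blank lines stands in for paragraph mode.
for paragraph in re.split(r"\n{2,}", LOG.strip()):
    match = PATTERN.search(paragraph)
    if match:
        times[match.group(1)] = int(match.group(2))

# Sort the IDs by time taken, largest first, and keep at most three.
top = sorted(times, key=times.get, reverse=True)[:3]
for req_id in top:
    print(f"Req ID: {req_id}, Time taken {times[req_id]} sec")
```

Because the slice [:3] never yields more items than exist, this variant also copes with logs that contain fewer than three records.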

answered yesterday by Rakesh Sharma, edited 13 hours ago

You need to remember the request number.

I would use (this could be written on one line)

awk -F= '$1 == "Req_id" { r=$2 ; }
    $1 == "Time_taken" { printf "Request id %s %s %s\n",r,$1,$2 ; } ' file |
sort -r -n -k5 |
head -3

which gives

Request id 3456 Time_taken 200 sec
Request id 3001 Time_taken 15 sec
Request id 1234 Time_taken 10 sec

where

-F= uses = as the field separator

$1 == "Req_id" { r=$2 ; } remembers the most recent request id

$1 == "Time_taken" selects the "time taken" lines

{ printf "Request id %s %s %s\n",r,$1,$2 ; } prints the request id and the seconds

| sort pipes the result to sort

-r reverse order

-n numeric sort (e.g. 200 greater than 15)

-k5 on the 5th field

| head -3 gets the first 3 lines
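As a rough illustration of what the sort and head stages do, here is a Python sketch that applies the same ordering to the lines the awk stage emits for the sample log. It is a simplification under stated assumptions: sort -k5 actually uses everything from the 5th field to the end of the line as the key, whereas the hypothetical helper fifth_field below looks at the 5th whitespace-separated field only.

```python
# Lines as the awk stage would emit them for the sample log, in file order.
lines = [
    "Request id 1234 Time_taken 10 sec",
    "Request id 3456 Time_taken 200 sec",
    "Request id 3001 Time_taken 15 sec",
]


def fifth_field(line):
    """Numeric value of the 5th whitespace-separated field (like -n -k5)."""
    return int(line.split()[4])


# reverse=True plays the role of -r; the slice plays the role of head -3.
top_lines = sorted(lines, key=fifth_field, reverse=True)[:3]
for line in top_lines:
    print(line)
```

Comparing the field as an integer is what makes 200 sort ahead of 15; compared as text, "15" would sort after "200" only by accident of the digits involved.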

answered yesterday by Archemar, edited yesterday

I would prefer Python over awk for anything except one-liners. Using a Python script will allow you to more easily handle poorly formatted input and to perform logging and error-handling.

Here is a basic Python script that prints the request IDs and times for your example input; its structure suggests how you might extend it further if needed:

#!/usr/bin/env python3
# -*- encoding: utf-8 -*-
"""parse_log.py"""

import sys

logfilepath = sys.argv[1]

# Define a function to parse a single block/entry in the log file
def parse_block(block):
    parsed_block = dict()
    lines = block.split("\n")
    for line in lines:
        # Each slice skips the "Key=" prefix of the line
        if line.startswith("Company="):
            parsed_block["Company"] = line[8:]
        elif line.startswith("Req_id="):
            parsed_block["Required_ID"] = line[7:]
        elif line.startswith("Time_taken="):
            parsed_block["Time_Taken"] = line[11:]
        elif line.startswith("Status="):
            parsed_block["Status"] = line[7:]
        else:
            pass
    return parsed_block

# Initialize a list to store the processed entries
parsed_blocks = list()

# Populate the list (entries are separated by blank lines)
with open(logfilepath, "r") as logfile:
    blocks = logfile.read().split("\n\n")
    for block in blocks:
        parsed_block = parse_block(block)
        parsed_blocks.append(parsed_block)

# Print the results
print("Below Request Id took more then expected")
for parsed_block in parsed_blocks:
    print("Request id {}, Time Taken: {}".format(parsed_block["Required_ID"], parsed_block["Time_Taken"]))

You could run it like this:

python parse_log.py data.log

On your example input, it produces the following output (entries appear in input order; the script does not sort them by time taken):

Below Request Id took more then expected
Request id 1234, Time Taken: 10 sec
Request id 3456, Time Taken: 200 sec
Request id 3001, Time Taken: 15 sec
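The output above is in input order. If the three slowest requests are wanted, largest first, one possible extension is sketched below. It is an assumption about how the script could be built out, not part of the original answer: the parsed_blocks list is hard-coded with dictionaries shaped like those that parse_block returns, and the helper name seconds is hypothetical.

```python
import heapq

# Entries shaped like the dicts parse_block() returns for the sample log.
parsed_blocks = [
    {"Required_ID": "1234", "Time_Taken": "10 sec"},
    {"Required_ID": "3456", "Time_Taken": "200 sec"},
    {"Required_ID": "3001", "Time_Taken": "15 sec"},
]


def seconds(entry):
    """Leading integer of a value such as '200 sec'."""
    return int(entry["Time_Taken"].split()[0])


# nlargest returns at most three entries, ordered from largest to smallest.
top_three = heapq.nlargest(3, parsed_blocks, key=seconds)

print("Below Request Id took more then expected")
for entry in top_three:
    print("Request id {}, Time Taken: {}".format(entry["Required_ID"], entry["Time_Taken"]))
```

heapq.nlargest avoids sorting the whole list, which only matters for large logs; sorted(..., reverse=True)[:3] would give the same result here.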

answered yesterday by igal
