
Extracting lines to new files



Say I have a large CSV file with a header and several columns. For the purpose of this question I will consider a small file with just two columns. We can call it use_rep.



user_id,rep
885,500K+
22565,200K+
7453,200K+
86440,100K+
116858,100K+
22222,100K+
38906,100K+
10762,<100K
70524,<100K


I'd like to send each row to a file named after the value in the second column. For example, I'd like there to be a file whose name is 200K+ and whose content is



user_id,rep
22565,200K+
7453,200K+


The contents of use_rep should not be assumed to be ordered in any way. The pattern to be used would ideally accept regular expressions.



I'd prefer a solution that doesn't use sed or perl.










  • I think AWK can do this easily, but I don't really know how.

    – regex
    yesterday

















text-processing awk pattern-matching






asked yesterday by regex




1 Answer
Ignoring the header (which you can tack on later):



awk -F, 'NR > 1 {print > $2}' use_rep


which will print each line to a file named by the second column:



~ head *[0-9]*
==> 100K+ <==
86440,100K+
116858,100K+
22222,100K+
38906,100K+

==> 200K+ <==
22565,200K+
7453,200K+

==> 500K+ <==
885,500K+

==> <100K <==
10762,<100K
70524,<100K
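
The question also asks for regular-expression matching, which the command above does not cover. A minimal sketch, assuming the regex should be tested against the second field only: the variable name pattern and the example regex ^[12]00K are illustrative choices, not part of the original answer, and the scratch directory is only there so the sample files don't clutter the current one.

```shell
# Work in a scratch directory and recreate the sample file from the question.
cd "$(mktemp -d)"
cat > use_rep <<'EOF'
user_id,rep
885,500K+
22565,200K+
7453,200K+
86440,100K+
116858,100K+
22222,100K+
38906,100K+
10762,<100K
70524,<100K
EOF

# "pattern" is a hypothetical variable name; -v passes the regex into awk.
# Only data rows (NR > 1) whose second field matches it are written out,
# each to a file named after that field.
awk -F, -v pattern='^[12]00K' 'NR > 1 && $2 ~ pattern { print > $2 }' use_rep
```

With the sample data this should create only the files 100K+ and 200K+. Note that awk processes backslash escapes in -v assignments, so a regex containing backslashes would need them doubled.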


To include the header in each file, maybe something like:



awk -F, 'NR == 1 {header = $0; next} # save header, skip this line
!a[$2]++ {print header > $2} # print header if second field has not been seen before
{print > $2} ' use_rep


Result:



~ head *[0-9]*
==> 100K+ <==
user_id,rep
86440,100K+
116858,100K+
22222,100K+
38906,100K+

==> 200K+ <==
user_id,rep
22565,200K+
7453,200K+

==> 500K+ <==
user_id,rep
885,500K+

==> <100K <==
user_id,rep
10762,<100K
70524,<100K
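
For a large file with many distinct values in the second column, some awk implementations can run out of simultaneously open output files. A sketch of one possible workaround, assuming a slower run is acceptable: append with >> and close() each file after every write. The array name seen is an illustrative choice, not from the original answer.

```shell
# Work in a scratch directory and recreate the sample file from the question.
cd "$(mktemp -d)"
cat > use_rep <<'EOF'
user_id,rep
885,500K+
22565,200K+
7453,200K+
86440,100K+
116858,100K+
22222,100K+
38906,100K+
10762,<100K
70524,<100K
EOF

awk -F, '
NR == 1     { header = $0; next }             # save header, skip this line
!seen[$2]++ { print header > $2; close($2) }  # new value: create the file with the header
            { print >> $2; close($2) }        # append the row, then release the file
' use_rep
```

Because the first write to each file uses >, an existing file with the same name is overwritten; every later write appends to it.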





  • I'm having some issues with this due to commas inside text identifiers (" "). Is there an easy fix?

    – regex
    yesterday











  • Not with awk. You should use a tool with support for quoted csv, like csvkit or Python or Perl.

    – muru
    yesterday











answered yesterday by muru











