Merge multiple DataFrames Pandas
This might be considered a duplicate of a thorough explanation of various approaches (Pandas Merging 101), but I can't find a solution to my problem there because of the larger number of DataFrames involved.
I have multiple DataFrames (more than 10), each differing in one column, VARX. This is a quick, oversimplified example:
import pandas as pd

df1 = pd.DataFrame({'depth': [0.500000, 0.600000, 1.300000],
                    'VAR1': [38.196202, 38.198002, 38.200001],
                    'profile': ['profile_1', 'profile_1', 'profile_1']})
df2 = pd.DataFrame({'depth': [0.600000, 1.100000, 1.200000],
                    'VAR2': [0.20440, 0.20442, 0.20446],
                    'profile': ['profile_1', 'profile_1', 'profile_1']})
df3 = pd.DataFrame({'depth': [1.200000, 1.300000, 1.400000],
                    'VAR3': [15.1880, 15.1820, 15.1820],
                    'profile': ['profile_1', 'profile_1', 'profile_1']})
Each df has the same or different depths for the same profiles, so I need to create a new DataFrame that merges all the separate ones, using depth and profile as the key columns and keeping every depth value that appears for each profile. The VARX value should therefore be NaN wherever there is no depth measurement of that variable for that profile.
The result should thus be a new, compressed DataFrame with all the VARX columns added alongside the depth and profile ones, something like this:
name_profile depth VAR1 VAR2 VAR3
profile_1 0.500000 38.196202 NaN NaN
profile_1 0.600000 38.198002 0.20440 NaN
profile_1 1.100000 NaN 0.20442 NaN
profile_1 1.200000 NaN 0.20446 15.1880
profile_1 1.300000 38.200001 NaN 15.1820
profile_1 1.400000 NaN NaN 15.1820
Note that the actual number of profiles is much, much bigger.
Any ideas?
python pandas dataframe
asked yesterday by PEBKAC, edited yesterday
5 Answers
Consider setting an index on each DataFrame, then run the horizontal merge with pd.concat:
dfs = [df.set_index(['profile', 'depth']) for df in [df1, df2, df3]]
print(pd.concat(dfs, axis=1).reset_index())
# profile depth VAR1 VAR2 VAR3
# 0 profile_1 0.5 38.196202 NaN NaN
# 1 profile_1 0.6 38.198002 0.20440 NaN
# 2 profile_1 1.1 NaN 0.20442 NaN
# 3 profile_1 1.2 NaN 0.20446 15.188
# 4 profile_1 1.3 38.200001 NaN 15.182
# 5 profile_1 1.4 NaN NaN 15.182
answered yesterday by Parfait
that's awesome, thank you! How would you do it within a loop, for example: for m in range(len(myfiles)): (where I read separate files for each df) df = pd.read_csv(myfiles[m]) – PEBKAC, yesterday
Ah, my mistake, do not put brackets around m, which casts it as a list: dfs = [pd.read_csv(m, index_col=[0,1]) for m in myfiles] – Parfait, yesterday
You have multiple rows with the same profile AND depth. Originally you had that same issue in your post, and I noticed you edited the first df's depth from 0.6 to 0.5. Try de-duplicating or aggregating before setting the index and concatenating. – Parfait, yesterday
I believe that is a different question, and you already accepted a solution here (which, come to think of it, may result in duplicate joins). Make an earnest effort and come back to SO with specific issues. – Parfait, yesterday
You should close this one out, as the answers here do resolve your immediate question and even use the posted data. The data size, and data content with duplicates, is a different question. – Parfait, yesterday
show 8 more comments
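Putting the suggestions from this thread together, here is a minimal end-to-end sketch using the question's df1–df3. The groupby step is the aggregation guard suggested in the comments: it averages any duplicate (profile, depth) pairs before indexing, so the subsequent concat cannot produce duplicate joins.

```python
import pandas as pd

df1 = pd.DataFrame({'depth': [0.5, 0.6, 1.3],
                    'VAR1': [38.196202, 38.198002, 38.200001],
                    'profile': ['profile_1'] * 3})
df2 = pd.DataFrame({'depth': [0.6, 1.1, 1.2],
                    'VAR2': [0.20440, 0.20442, 0.20446],
                    'profile': ['profile_1'] * 3})
df3 = pd.DataFrame({'depth': [1.2, 1.3, 1.4],
                    'VAR3': [15.1880, 15.1820, 15.1820],
                    'profile': ['profile_1'] * 3})

# Aggregate any duplicate (profile, depth) pairs before indexing,
# then concatenate horizontally on the shared MultiIndex.
dfs = [df.groupby(['profile', 'depth']).mean() for df in (df1, df2, df3)]
merged = pd.concat(dfs, axis=1).reset_index()
print(merged)
```

With many profiles this scales the same way: the MultiIndex alignment does the work, and unmatched depths simply come out as NaN in the other VARX columns.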
Or, using merge:
from functools import partial, reduce
dfs = [df1,df2,df3]
merge = partial(pd.merge, on=['depth','profile'], how='outer')
reduce(merge, dfs)
depth VAR1 profile VAR2 VAR3
0 0.5 38.196202 profile_1 NaN NaN
1 0.6 38.198002 profile_1 0.20440 NaN
2 1.3 38.200001 profile_1 NaN 15.182
3 1.1 NaN profile_1 0.20442 NaN
4 1.2 NaN profile_1 0.20446 15.188
5 1.4 NaN profile_1 NaN 15.182
Update
For merging the dataframes in a loop as suggested in the comments, you could do something like:
df_final = pd.DataFrame(columns=df1.columns)
for df in dfs:
    df_final = df_final.merge(df, on=['depth','profile'], how='outer')
answered yesterday by yatu, edited yesterday
that's awesome, thank you! How would you do it within a loop, for example: for m in range(len(myfiles)): (where I read separate files for each df) df = pd.read_csv(myfiles[m]) – PEBKAC, yesterday
Well, the main purpose of reduce here is to avoid a loop. If you prefer that approach, I assume for memory constraints, you need a single merge on each iteration. Simply update the resulting dataframe on each loop. – yatu, yesterday
thank you, that's super helpful, but would you care to show what such an iteration would look like, perhaps just here as a comment? I'm not really sure how to continue. – PEBKAC, yesterday
Check the update @PEBKAC – yatu, yesterday
Well, if you have to end up merging them all, you likely won't be able to obtain the final dataframe anyway. I'd suggest you work with chunks of data. Check stackoverflow.com/questions/47386405/… – yatu, yesterday
show 4 more comments
I would use append.
>>> df1.append(df2).append(df3).sort_values('depth')
VAR1 VAR2 VAR3 depth profile
0 38.196202 NaN NaN 0.5 profile_1
1 38.198002 NaN NaN 0.6 profile_1
0 NaN 0.20440 NaN 0.6 profile_1
1 NaN 0.20442 NaN 1.1 profile_1
2 NaN 0.20446 NaN 1.2 profile_1
0 NaN NaN 15.188 1.2 profile_1
2 38.200001 NaN NaN 1.3 profile_1
1 NaN NaN 15.182 1.3 profile_1
2 NaN NaN 15.182 1.4 profile_1
Obviously if you have a lot of dataframes, just make a list and loop through them.
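One caveat for readers on newer pandas: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on current versions the same vertical stacking is spelled with pd.concat. A sketch with two toy frames standing in for the question's data:

```python
import pandas as pd

# Toy stand-ins for the question's frames.
df1 = pd.DataFrame({'depth': [0.5], 'VAR1': [38.196202], 'profile': ['profile_1']})
df2 = pd.DataFrame({'depth': [0.6], 'VAR2': [0.20440], 'profile': ['profile_1']})

# pd.concat stacks the frames vertically, aligning on the union of columns;
# missing measurements come out as NaN, just as with append.
stacked = pd.concat([df1, df2], ignore_index=True).sort_values(['profile', 'depth'])
print(stacked)
```

Passing a whole list to pd.concat also replaces the chained .append(df2).append(df3) calls with a single operation.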
thank you! @BlivetWidget, how do you sort it both by depth AND profile? Each profile has a set of depths, and each dataframe has a bunch of profiles. – PEBKAC, yesterday
@PEBKAC you can sort by however many parameters you want, in whatever order you want: .sort_values(['depth', 'profile']) or .sort_values(['profile', 'depth']). You can check the help on df1.sort_values to learn how to change the sort order, sort in place, and various other optional parameters. – BlivetWidget, yesterday
thank you, most helpful! – PEBKAC, yesterday
Why not concatenate all the DataFrames, melt, then reshape them using your ids? There might be a more efficient way to do this, but it works.
df=pd.melt(pd.concat([df1,df2,df3]),id_vars=['profile','depth'])
df_pivot=df.pivot_table(index=['profile','depth'],columns='variable',values='value')
Where df_pivot
will be
variable VAR1 VAR2 VAR3
profile depth
profile_1 0.5 38.196202 NaN NaN
0.6 38.198002 0.20440 NaN
1.1 NaN 0.20442 NaN
1.2 NaN 0.20446 15.188
1.3 38.200001 NaN 15.182
1.4 NaN NaN 15.182
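One behavior worth knowing about this approach (verify against your pandas version): pivot_table aggregates duplicate (profile, depth) entries, and its aggfunc defaults to the mean, so duplicated measurements are silently averaged. A small sketch with a deliberately duplicated depth shows the default and an alternative:

```python
import pandas as pd

# Long-format data with a duplicated (profile_1, 0.6) measurement of VAR2.
df = pd.DataFrame({'profile': ['profile_1'] * 3,
                   'depth': [0.6, 0.6, 1.1],
                   'variable': ['VAR2'] * 3,
                   'value': [0.20440, 0.20460, 0.20442]})

# Default: duplicates are averaged.
averaged = df.pivot_table(index=['profile', 'depth'],
                          columns='variable', values='value')
# Alternative: keep the first measurement instead.
first = df.pivot_table(index=['profile', 'depth'],
                       columns='variable', values='value', aggfunc='first')
```

This is also why pivot_table does not raise on duplicate keys where pd.concat on a duplicated index would; whether that silent averaging is acceptable depends on your data.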
You can also use:
dfs = [df1, df2, df3]
df = pd.merge(dfs[0], dfs[1], on=['depth', 'profile'], how='outer')
for d in dfs[2:]:
    df = pd.merge(df, d, on=['depth', 'profile'], how='outer')
depth VAR1 profile VAR2 VAR3
0 0.5 38.196202 profile_1 NaN NaN
1 0.6 38.198002 profile_1 0.20440 NaN
2 1.3 38.200001 profile_1 NaN 15.182
3 1.1 NaN profile_1 0.20442 NaN
4 1.2 NaN profile_1 0.20446 15.188
5 1.4 NaN profile_1 NaN 15.182
5 Answers
5
active
oldest
votes
5 Answers
5
active
oldest
votes
active
oldest
votes
active
oldest
votes
Consider setting index on each data frame and then run the horizontal merge with pd.concat
:
dfs = [df.set_index(['profile', 'depth']) for df in [df1, df2, df3]]
print(pd.concat(dfs, axis=1).reset_index())
# profile depth VAR1 VAR2 VAR3
# 0 profile_1 0.5 38.198002 NaN NaN
# 1 profile_1 0.6 38.198002 0.20440 NaN
# 2 profile_1 1.1 NaN 0.20442 NaN
# 3 profile_1 1.2 NaN 0.20446 15.188
# 4 profile_1 1.3 38.200001 NaN 15.182
# 5 profile_1 1.4 NaN NaN 15.182
that's awesome, thank you! How would you do it within a loop, for example:for m in range(len(myfiles))
: (where I read separate files for each df)df = pd.read_csv(myfiles[m])
– PEBKAC
yesterday
1
Ah, my mistake, do not bracket m which casts as list:dfs = [pd.read_csv(m, index_col=[0,1]) for m in myfiles]
– Parfait
yesterday
1
You have multiple rows with same profile AND depth. Originally you had that same issue in your post and I noticed you edited the first df's depth from 0.6 to 0.5. Try de-duping or aggregating before setting index and concatenating.
– Parfait
yesterday
1
I believe that is a different question and you already accepted a solution here (which come to think may result in a duplicate joins). Make an earnest effort and come back to SO with specific issues.
– Parfait
yesterday
1
You should close this one out as answers here does resolve your immediate question that even uses posted data. The data size and even data content with dups is a different question.
– Parfait
yesterday
|
show 8 more comments
Consider setting index on each data frame and then run the horizontal merge with pd.concat
:
dfs = [df.set_index(['profile', 'depth']) for df in [df1, df2, df3]]
print(pd.concat(dfs, axis=1).reset_index())
# profile depth VAR1 VAR2 VAR3
# 0 profile_1 0.5 38.198002 NaN NaN
# 1 profile_1 0.6 38.198002 0.20440 NaN
# 2 profile_1 1.1 NaN 0.20442 NaN
# 3 profile_1 1.2 NaN 0.20446 15.188
# 4 profile_1 1.3 38.200001 NaN 15.182
# 5 profile_1 1.4 NaN NaN 15.182
that's awesome, thank you! How would you do it within a loop, for example:for m in range(len(myfiles))
: (where I read separate files for each df)df = pd.read_csv(myfiles[m])
– PEBKAC
yesterday
1
Ah, my mistake, do not bracket m which casts as list:dfs = [pd.read_csv(m, index_col=[0,1]) for m in myfiles]
– Parfait
yesterday
1
You have multiple rows with same profile AND depth. Originally you had that same issue in your post and I noticed you edited the first df's depth from 0.6 to 0.5. Try de-duping or aggregating before setting index and concatenating.
– Parfait
yesterday
1
I believe that is a different question and you already accepted a solution here (which come to think may result in a duplicate joins). Make an earnest effort and come back to SO with specific issues.
– Parfait
yesterday
1
You should close this one out as answers here does resolve your immediate question that even uses posted data. The data size and even data content with dups is a different question.
– Parfait
yesterday
|
show 8 more comments
Consider setting index on each data frame and then run the horizontal merge with pd.concat
:
dfs = [df.set_index(['profile', 'depth']) for df in [df1, df2, df3]]
print(pd.concat(dfs, axis=1).reset_index())
# profile depth VAR1 VAR2 VAR3
# 0 profile_1 0.5 38.198002 NaN NaN
# 1 profile_1 0.6 38.198002 0.20440 NaN
# 2 profile_1 1.1 NaN 0.20442 NaN
# 3 profile_1 1.2 NaN 0.20446 15.188
# 4 profile_1 1.3 38.200001 NaN 15.182
# 5 profile_1 1.4 NaN NaN 15.182
Consider setting index on each data frame and then run the horizontal merge with pd.concat
:
dfs = [df.set_index(['profile', 'depth']) for df in [df1, df2, df3]]
print(pd.concat(dfs, axis=1).reset_index())
# profile depth VAR1 VAR2 VAR3
# 0 profile_1 0.5 38.198002 NaN NaN
# 1 profile_1 0.6 38.198002 0.20440 NaN
# 2 profile_1 1.1 NaN 0.20442 NaN
# 3 profile_1 1.2 NaN 0.20446 15.188
# 4 profile_1 1.3 38.200001 NaN 15.182
# 5 profile_1 1.4 NaN NaN 15.182
answered yesterday
ParfaitParfait
54.3k104872
54.3k104872
that's awesome, thank you! How would you do it within a loop, for example:for m in range(len(myfiles))
: (where I read separate files for each df)df = pd.read_csv(myfiles[m])
– PEBKAC
yesterday
1
Ah, my mistake, do not bracket m which casts as list:dfs = [pd.read_csv(m, index_col=[0,1]) for m in myfiles]
– Parfait
yesterday
1
You have multiple rows with same profile AND depth. Originally you had that same issue in your post and I noticed you edited the first df's depth from 0.6 to 0.5. Try de-duping or aggregating before setting index and concatenating.
– Parfait
yesterday
1
I believe that is a different question and you already accepted a solution here (which come to think may result in a duplicate joins). Make an earnest effort and come back to SO with specific issues.
– Parfait
yesterday
1
You should close this one out as answers here does resolve your immediate question that even uses posted data. The data size and even data content with dups is a different question.
– Parfait
yesterday
|
show 8 more comments
that's awesome, thank you! How would you do it within a loop, for example:for m in range(len(myfiles))
: (where I read separate files for each df)df = pd.read_csv(myfiles[m])
– PEBKAC
yesterday
1
Ah, my mistake, do not bracket m which casts as list:dfs = [pd.read_csv(m, index_col=[0,1]) for m in myfiles]
– Parfait
yesterday
1
You have multiple rows with same profile AND depth. Originally you had that same issue in your post and I noticed you edited the first df's depth from 0.6 to 0.5. Try de-duping or aggregating before setting index and concatenating.
– Parfait
yesterday
1
I believe that is a different question and you already accepted a solution here (which come to think may result in a duplicate joins). Make an earnest effort and come back to SO with specific issues.
– Parfait
yesterday
1
You should close this one out as answers here does resolve your immediate question that even uses posted data. The data size and even data content with dups is a different question.
– Parfait
yesterday
that's awesome, thank you! How would you do it within a loop, for example:
for m in range(len(myfiles))
: (where I read separate files for each df) df = pd.read_csv(myfiles[m])
– PEBKAC
yesterday
that's awesome, thank you! How would you do it within a loop, for example:
for m in range(len(myfiles))
: (where I read separate files for each df) df = pd.read_csv(myfiles[m])
– PEBKAC
yesterday
1
1
Ah, my mistake, do not bracket m which casts as list:
dfs = [pd.read_csv(m, index_col=[0,1]) for m in myfiles]
– Parfait
yesterday
Ah, my mistake, do not bracket m which casts as list:
dfs = [pd.read_csv(m, index_col=[0,1]) for m in myfiles]
– Parfait
yesterday
1
1
You have multiple rows with same profile AND depth. Originally you had that same issue in your post and I noticed you edited the first df's depth from 0.6 to 0.5. Try de-duping or aggregating before setting index and concatenating.
– Parfait
yesterday
You have multiple rows with same profile AND depth. Originally you had that same issue in your post and I noticed you edited the first df's depth from 0.6 to 0.5. Try de-duping or aggregating before setting index and concatenating.
– Parfait
yesterday
1
1
I believe that is a different question and you already accepted a solution here (which come to think may result in a duplicate joins). Make an earnest effort and come back to SO with specific issues.
– Parfait
yesterday
I believe that is a different question and you already accepted a solution here (which come to think may result in a duplicate joins). Make an earnest effort and come back to SO with specific issues.
– Parfait
yesterday
1
1
You should close this one out as answers here does resolve your immediate question that even uses posted data. The data size and even data content with dups is a different question.
– Parfait
yesterday
You should close this one out as answers here does resolve your immediate question that even uses posted data. The data size and even data content with dups is a different question.
– Parfait
yesterday
|
show 8 more comments
Or using merge
:
from functools import partial, reduce
dfs = [df1,df2,df3]
merge = partial(pd.merge, on=['depth','profile'], how='outer')
reduce(merge, dfs)
depth VAR1 profile VAR2 VAR3
0 0.6 38.198002 profile_1 0.20440 NaN
1 0.6 38.198002 profile_1 0.20440 NaN
2 1.3 38.200001 profile_1 NaN 15.182
3 1.1 NaN profile_1 0.20442 NaN
4 1.2 NaN profile_1 0.20446 15.188
5 1.4 NaN profile_1 NaN 15.182
Update
For merging the dataframes in a loop as suggested in the comments, you could do something like:
df_final = pd.DataFrame(columns=df1.columns)
for df in dfs:
df_final = df_final.merge(df, on=['depth','profile'], how='outer')
that's awesome, thank you! How would you do it within a loop, for example:for m in range(len(myfiles))
: (where I read separate files for each df)df = pd.read_csv(myfiles[m])
– PEBKAC
yesterday
1
Well the main purpose of reduce here is to avoid a loop. If you prefer that approach I assume for memory constraints, you need a single merge on each iteration. Simply update the resulting dataframe on each loop
– yatu
yesterday
thank you, that's super helpful, but would you perhaps care to show how such an iteration would look like, perhaps just here as a comment? I'm not really sure how to continue
– PEBKAC
yesterday
1
Check the update @PEBKAC
– yatu
yesterday
1
Well if you have to end up merging them all, you likely won't be able to obtain the final dataframe anyway. I'd suggest you to work with chunks of data. Check stackoverflow.com/questions/47386405/…
– yatu
yesterday
|
show 4 more comments
Or using merge
:
from functools import partial, reduce
dfs = [df1,df2,df3]
merge = partial(pd.merge, on=['depth','profile'], how='outer')
reduce(merge, dfs)
depth VAR1 profile VAR2 VAR3
0 0.6 38.198002 profile_1 0.20440 NaN
1 0.6 38.198002 profile_1 0.20440 NaN
2 1.3 38.200001 profile_1 NaN 15.182
3 1.1 NaN profile_1 0.20442 NaN
4 1.2 NaN profile_1 0.20446 15.188
5 1.4 NaN profile_1 NaN 15.182
Update
For merging the dataframes in a loop as suggested in the comments, you could do something like:
df_final = pd.DataFrame(columns=df1.columns)
for df in dfs:
df_final = df_final.merge(df, on=['depth','profile'], how='outer')
that's awesome, thank you! How would you do it within a loop, for example:for m in range(len(myfiles))
: (where I read separate files for each df)df = pd.read_csv(myfiles[m])
– PEBKAC
yesterday
1
Well the main purpose of reduce here is to avoid a loop. If you prefer that approach I assume for memory constraints, you need a single merge on each iteration. Simply update the resulting dataframe on each loop
– yatu
yesterday
thank you, that's super helpful, but would you perhaps care to show how such an iteration would look like, perhaps just here as a comment? I'm not really sure how to continue
– PEBKAC
yesterday
1
Check the update @PEBKAC
– yatu
yesterday
1
Well if you have to end up merging them all, you likely won't be able to obtain the final dataframe anyway. I'd suggest you to work with chunks of data. Check stackoverflow.com/questions/47386405/…
– yatu
yesterday
|
show 4 more comments
Or using merge
:
from functools import partial, reduce
dfs = [df1,df2,df3]
merge = partial(pd.merge, on=['depth','profile'], how='outer')
reduce(merge, dfs)
depth VAR1 profile VAR2 VAR3
0 0.6 38.198002 profile_1 0.20440 NaN
1 0.6 38.198002 profile_1 0.20440 NaN
2 1.3 38.200001 profile_1 NaN 15.182
3 1.1 NaN profile_1 0.20442 NaN
4 1.2 NaN profile_1 0.20446 15.188
5 1.4 NaN profile_1 NaN 15.182
Update
For merging the dataframes in a loop as suggested in the comments, you could do something like:
df_final = pd.DataFrame(columns=df1.columns)
for df in dfs:
df_final = df_final.merge(df, on=['depth','profile'], how='outer')
Or using merge
:
from functools import partial, reduce
dfs = [df1,df2,df3]
merge = partial(pd.merge, on=['depth','profile'], how='outer')
reduce(merge, dfs)
depth VAR1 profile VAR2 VAR3
0 0.6 38.198002 profile_1 0.20440 NaN
1 0.6 38.198002 profile_1 0.20440 NaN
2 1.3 38.200001 profile_1 NaN 15.182
3 1.1 NaN profile_1 0.20442 NaN
4 1.2 NaN profile_1 0.20446 15.188
5 1.4 NaN profile_1 NaN 15.182
Update
For merging the dataframes in a loop as suggested in the comments, you could do something like:
df_final = pd.DataFrame(columns=df1.columns)
for df in dfs:
df_final = df_final.merge(df, on=['depth','profile'], how='outer')
edited yesterday
answered yesterday
yatuyatu
15.8k41642
15.8k41642
that's awesome, thank you! How would you do it within a loop, for example:for m in range(len(myfiles))
: (where I read separate files for each df)df = pd.read_csv(myfiles[m])
– PEBKAC
yesterday
1
Well the main purpose of reduce here is to avoid a loop. If you prefer that approach I assume for memory constraints, you need a single merge on each iteration. Simply update the resulting dataframe on each loop
– yatu
yesterday
thank you, that's super helpful, but would you perhaps care to show how such an iteration would look like, perhaps just here as a comment? I'm not really sure how to continue
– PEBKAC
yesterday
1
Check the update @PEBKAC
– yatu
yesterday
1
Well if you have to end up merging them all, you likely won't be able to obtain the final dataframe anyway. I'd suggest you to work with chunks of data. Check stackoverflow.com/questions/47386405/…
– yatu
yesterday
|
show 4 more comments
that's awesome, thank you! How would you do it within a loop, for example:for m in range(len(myfiles))
: (where I read separate files for each df)df = pd.read_csv(myfiles[m])
– PEBKAC
yesterday
1
Well the main purpose of reduce here is to avoid a loop. If you prefer that approach I assume for memory constraints, you need a single merge on each iteration. Simply update the resulting dataframe on each loop
– yatu
yesterday
thank you, that's super helpful, but would you perhaps care to show how such an iteration would look like, perhaps just here as a comment? I'm not really sure how to continue
– PEBKAC
yesterday
1
Check the update @PEBKAC
– yatu
yesterday
1
Well if you have to end up merging them all, you likely won't be able to obtain the final dataframe anyway. I'd suggest you to work with chunks of data. Check stackoverflow.com/questions/47386405/…
– yatu
yesterday
I would use append.
>>> df1.append(df2).append(df3).sort_values('depth')
VAR1 VAR2 VAR3 depth profile
0 38.196202 NaN NaN 0.5 profile_1
1 38.198002 NaN NaN 0.6 profile_1
0 NaN 0.20440 NaN 0.6 profile_1
1 NaN 0.20442 NaN 1.1 profile_1
2 NaN 0.20446 NaN 1.2 profile_1
0 NaN NaN 15.188 1.2 profile_1
2 38.200001 NaN NaN 1.3 profile_1
1 NaN NaN 15.182 1.3 profile_1
2 NaN NaN 15.182 1.4 profile_1
Obviously if you have a lot of dataframes, just make a list and loop through them.
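The "make a list and loop" variant can be sketched with pd.concat, which stacks any number of frames at once; note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so concat is the durable spelling. The df1/df2/df3 below are small stand-ins shaped like the question's frames:

```python
import pandas as pd

# One variable per frame, keyed by profile and depth.
df1 = pd.DataFrame({'profile': ['profile_1'] * 2, 'depth': [0.5, 0.6], 'VAR1': [38.196202, 38.198002]})
df2 = pd.DataFrame({'profile': ['profile_1'] * 2, 'depth': [0.6, 1.1], 'VAR2': [0.20440, 0.20442]})
df3 = pd.DataFrame({'profile': ['profile_1'] * 2, 'depth': [1.2, 1.3], 'VAR3': [15.188, 15.182]})

# Stack all frames (union of columns, NaN where a variable is absent),
# then sort by profile and depth as in the answer above.
stacked = pd.concat([df1, df2, df3], sort=False).sort_values(['profile', 'depth'])
```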
Thank you! @BlivetWidget, how do you sort it both by depth AND profile? Each profile has a set of depths, and each dataframe has a bunch of profiles.
– PEBKAC, yesterday
@PEBKAC you can sort by however many columns you want, in whatever order you want: .sort_values(['depth', 'profile']) or .sort_values(['profile', 'depth']). Check the help on df1.sort_values to learn how to change the sort order, sort in place, and set various other optional parameters.
– BlivetWidget, yesterday
Thank you, most helpful!
– PEBKAC, yesterday
– BlivetWidget (answered yesterday, edited yesterday)
Why not concatenate all the DataFrames, melt, then re-form them using your IDs? There might be a more efficient way to do this, but this works.
df = pd.melt(pd.concat([df1, df2, df3]), id_vars=['profile', 'depth'])
df_pivot = df.pivot_table(index=['profile', 'depth'], columns='variable', values='value')
where df_pivot will be:
variable VAR1 VAR2 VAR3
profile depth
profile_1 0.5 38.196202 NaN NaN
0.6 38.198002 0.20440 NaN
1.1 NaN 0.20442 NaN
1.2 NaN 0.20446 15.188
1.3 38.200001 NaN 15.182
1.4 NaN NaN 15.182
– SEpapoulis (answered yesterday)
You can also use:
dfs = [df1, df2, df3]
df = dfs[0]
for d in dfs[1:]:
    df = pd.merge(df, d, on=['depth', 'profile'], how='outer')
depth VAR1 profile VAR2 VAR3
0 0.5 38.196202 profile_1 NaN NaN
1 0.6 38.198002 profile_1 0.20440 NaN
2 1.3 38.200001 profile_1 NaN 15.182
3 1.1 NaN profile_1 0.20442 NaN
4 1.2 NaN profile_1 0.20446 15.188
5 1.4 NaN profile_1 NaN 15.182
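For reference, the same cascade of outer merges can be written without an explicit loop using functools.reduce, the approach mentioned in the comments above. A minimal sketch, with stand-in frames shaped like the question's:

```python
import pandas as pd
from functools import reduce

# One variable per frame, keyed by profile and depth.
df1 = pd.DataFrame({'profile': ['profile_1'] * 2, 'depth': [0.5, 0.6], 'VAR1': [38.196202, 38.198002]})
df2 = pd.DataFrame({'profile': ['profile_1'] * 2, 'depth': [0.6, 1.1], 'VAR2': [0.20440, 0.20442]})
df3 = pd.DataFrame({'profile': ['profile_1'] * 2, 'depth': [1.2, 1.3], 'VAR3': [15.188, 15.182]})

# Fold the list with an outer merge on the shared keys;
# behaviorally equivalent to the loop above.
merged = reduce(
    lambda left, right: pd.merge(left, right, on=['depth', 'profile'], how='outer'),
    [df1, df2, df3],
)
```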
– heena bawa (answered yesterday)
Tags: dataframe, pandas, python