Flag only first row where condition is met in a DataFrame


I have the following DataFrame df, which can be created as follows:

from datetime import datetime, timedelta
import numpy as np
import pandas as pd

date_today = datetime.now().date()
days = pd.date_range(date_today, date_today + timedelta(19), freq='D')
x = np.arange(0, 2*np.pi, 0.1*np.pi)  # start, stop, step
y = np.sin(x)
df = pd.DataFrame({'dates': days, 'vals': y, 'is_hit': abs(y) > 0.9})
df = df.set_index('dates')

It looks like this:

            is_hit           vals
dates
2019-03-27   False   0.000000e+00
2019-03-28   False   3.090170e-01
2019-03-29   False   5.877853e-01
2019-03-30   False   8.090170e-01
2019-03-31    True   9.510565e-01
2019-04-01    True   1.000000e+00
2019-04-02    True   9.510565e-01
2019-04-03   False   8.090170e-01
2019-04-04   False   5.877853e-01
2019-04-05   False   3.090170e-01
2019-04-06   False   1.224647e-16
2019-04-07   False  -3.090170e-01
2019-04-08   False  -5.877853e-01
2019-04-09   False  -8.090170e-01
2019-04-10    True  -9.510565e-01
2019-04-11    True  -1.000000e+00
2019-04-12    True  -9.510565e-01
2019-04-13   False  -8.090170e-01
2019-04-14   False  -5.877853e-01
2019-04-15   False  -3.090170e-01

I want to flag only the first row of each consecutive run in which is_hit is True, so that the expected new column hit_first would be:

            is_hit           vals  hit_first
dates
2019-03-27   False   0.000000e+00      False
2019-03-28   False   3.090170e-01      False
2019-03-29   False   5.877853e-01      False
2019-03-30   False   8.090170e-01      False
2019-03-31    True   9.510565e-01       True
2019-04-01    True   1.000000e+00      False
2019-04-02    True   9.510565e-01      False
2019-04-03   False   8.090170e-01      False
2019-04-04   False   5.877853e-01      False
2019-04-05   False   3.090170e-01      False
2019-04-06   False   1.224647e-16      False
2019-04-07   False  -3.090170e-01      False
2019-04-08   False  -5.877853e-01      False
2019-04-09   False  -8.090170e-01      False
2019-04-10    True  -9.510565e-01       True
2019-04-11    True  -1.000000e+00      False
2019-04-12    True  -9.510565e-01      False
2019-04-13   False  -8.090170e-01      False
2019-04-14   False  -5.877853e-01      False
2019-04-15   False  -3.090170e-01      False

asked Mar 27 at 12:23

JejeBelfort


python pandas dataframe

4 Answers

My suggestion:

df['hit_first'] = df['is_hit'] & (~df['is_hit']).shift(1)
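
The idea behind the one-liner above is that a row starts a run when it is a hit and the previous row was not. The following is a minimal sketch of that logic on a small made-up series (the name is_hit and its values are assumptions for illustration, not the data from the question). It is a variant rather than the exact expression above: it passes fill_value=False to shift so that the first row has a defined "previous" value and the result stays boolean.

```python
import pandas as pd

# Hypothetical stand-in for df['is_hit'].
is_hit = pd.Series([False, True, True, False, True])

# Value of the previous row; fill_value=False treats "before the first row"
# as not a hit and avoids introducing NaN.
prev = is_hit.shift(1, fill_value=False)

# A row is the first of a run when it is a hit and the previous row was not.
hit_first = is_hit & ~prev
print(hit_first.tolist())  # [False, True, False, False, True]
```

With this choice of fill value, a series that begins with True would have its first row flagged. Whether that is desired depends on the use case.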

answered Mar 27 at 12:28

ecortazar


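Compare the column with its shifted copy using Series.ne to find the rows where the value changes, then chain with & (bitwise AND) to keep only the rows that are True: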

df['hit_first'] = df['is_hit'].ne(df['is_hit'].shift()) & df['is_hit']
print(df)
                     vals  is_hit  hit_first
dates
2019-03-27   0.000000e+00   False      False
2019-03-28   3.090170e-01   False      False
2019-03-29   5.877853e-01   False      False
2019-03-30   8.090170e-01   False      False
2019-03-31   9.510565e-01    True       True
2019-04-01   1.000000e+00    True      False
2019-04-02   9.510565e-01    True      False
2019-04-03   8.090170e-01   False      False
2019-04-04   5.877853e-01   False      False
2019-04-05   3.090170e-01   False      False
2019-04-06   1.224647e-16   False      False
2019-04-07  -3.090170e-01   False      False
2019-04-08  -5.877853e-01   False      False
2019-04-09  -8.090170e-01   False      False
2019-04-10  -9.510565e-01    True       True
2019-04-11  -1.000000e+00    True      False
2019-04-12  -9.510565e-01    True      False
2019-04-13  -8.090170e-01   False      False
2019-04-14  -5.877853e-01   False      False
2019-04-15  -3.090170e-01   False      False
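
One detail of the ne/shift approach that the example data does not exercise is what happens when the very first row is already True. The sketch below uses a small made-up series (an assumption for illustration) to show it: shift() leaves a missing value in the first position, ne() reports that position as "different", and so a leading True is flagged as a first hit.

```python
import pandas as pd

# Hypothetical series that starts with a hit.
is_hit = pd.Series([True, True, False, True])

# True wherever the value differs from the previous row; the first row is
# compared against a missing value and therefore counts as changed.
changed = is_hit.ne(is_hit.shift())

# Keep only the changes that land on a True row.
hit_first = changed & is_hit
print(hit_first.tolist())  # [True, False, False, True]
```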

edited Mar 27 at 12:33

answered Mar 27 at 12:28

jezrael


I think you can also do it this way:

df['is_hit'].astype(int).diff() == 1

Output:

dates
2019-03-27 False
2019-03-28 False
2019-03-29 False
2019-03-30 False
2019-03-31 True
2019-04-01 False
2019-04-02 False
2019-04-03 False
2019-04-04 False
2019-04-05 False
2019-04-06 False
2019-04-07 False
2019-04-08 False
2019-04-09 False
2019-04-10 True
2019-04-11 False
2019-04-12 False
2019-04-13 False
2019-04-14 False
2019-04-15 False
Name: is_hit, dtype: bool

Timings:

%timeit df['is_hit'] & (~df['is_hit']).shift(1)
1.13 ms ± 5.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df['is_hit'].ne(df['is_hit'].shift()) & df['is_hit']
908 µs ± 9.53 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df['is_hit'].astype(int).diff() == 1
689 µs ± 8.24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
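
The timings above were taken on the 20-row frame from the question, so they mostly reflect fixed overhead. A rough sketch for repeating the comparison on a larger input with the standard timeit module might look like the following. The series size, the random data, the seed and the dictionary labels are all assumptions chosen for illustration, and no results are claimed here since they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger input: one million random booleans, about 10% True.
rng = np.random.default_rng(0)
s = pd.Series(rng.random(1_000_000) > 0.9, name="is_hit")
s.iloc[0] = False  # sidestep the first-row edge case so all variants agree

# The three expressions timed above, applied to s instead of df['is_hit'].
candidates = {
    "and_not_shift": lambda: s & (~s).shift(1),
    "ne_shift": lambda: s.ne(s.shift()) & s,
    "diff": lambda: s.astype(int).diff() == 1,
}

# Keep one result per variant so they can be compared for equality.
results = {name: fn() for name, fn in candidates.items()}

for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=5) / 5
    print(f"{name}: {seconds * 1e3:.1f} ms per call")
```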

edited Mar 27 at 12:57

answered Mar 27 at 12:50

Scott Boston


Nice, maybe performance in large data should be interesting.

– jezrael
Mar 27 at 13:11

This can also be done by taking the difference between the series and a copy of itself shifted by one period:

df['hit_first'] = df['is_hit']-df['is_hit'].shift()==1
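
To spell out why a difference works: if True is treated as 1 and False as 0, then subtracting the previous row from the current one gives +1 exactly where the value goes from False to True, -1 where it goes from True to False, and 0 where nothing changes. The sketch below walks through that arithmetic on a small made-up series. The names and values are assumptions for illustration, and the explicit astype(int) is added here to make the integer arithmetic visible; it is not part of the line above.

```python
import pandas as pd

# Hypothetical stand-in for df['is_hit'].
is_hit = pd.Series([False, True, True, False, True])

# Treat True as 1 and False as 0.
as_int = is_hit.astype(int)

# Current row minus previous row; the first row has no predecessor, so it is NaN.
step = as_int - as_int.shift()

# +1 marks a False -> True transition, i.e. the first row of a run.
hit_first = step == 1
# -1 marks a True -> False transition, i.e. the first row after a run ends.
run_ended = step == -1

print(hit_first.tolist())  # [False, True, False, False, True]
print(run_ended.tolist())  # [False, False, False, True, False]
```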

edited Mar 27 at 22:45

answered Mar 27 at 12:54

Loochie


The use of np.where here is quite pointless.

– miradulo
Mar 27 at 18:50

Yes I understood. Thanks :)

– Loochie
Mar 27 at 20:42

While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value.

– DebanjanB
Mar 27 at 21:22
