Use columns 1 and 2 to populate column 3
I’m a Python newbie and have the following pandas dataframe - I’m trying to write code that populates the ‘signal’ column as it is below:
Days long_entry_flag long_exit_flag signal
1 FALSE TRUE
2 FALSE FALSE
3 TRUE FALSE 1
4 TRUE FALSE 1
5 FALSE FALSE 1
6 TRUE FALSE 1
7 TRUE FALSE 1
8 FALSE TRUE
9 FALSE TRUE
10 TRUE FALSE 1
11 TRUE FALSE 1
12 TRUE FALSE 1
13 FALSE FALSE 1
14 FALSE TRUE
15 FALSE FALSE
16 FALSE TRUE
17 TRUE FALSE 1
18 TRUE FALSE 1
19 FALSE FALSE 1
20 FALSE FALSE 1
21 FALSE TRUE
22 FALSE FALSE
23 FALSE FALSE
My pseudo-code
version would take the following steps
- Look down the [‘long_entry_flag’] column until entry condition is True (day 3 initially)
- Then we enter ‘1’ into [‘signal’] column every day until exit condition is True [‘long_exit_flag’]==True on day 8
- Then we look back to [‘long_entry_flag’] column to wait for the next entry condition (occurs on day 10)
- And again we enter ‘1’ into [‘signal’] column every day until exit condition is True (day 14)
- etc
Welcome ideas about ways to populate the ‘signal’ column rapidly if possible (using vectorisation?) - this is a subset of a large dataframe with tens of thousands of rows, and it is one of many dataframes being analysed in sequence.
Many thanks in advance!
python pandas
add a comment |
I’m a Python newbie and have the following pandas dataframe - I’m trying to write code that populates the ‘signal’ column as it is below:
Days long_entry_flag long_exit_flag signal
1 FALSE TRUE
2 FALSE FALSE
3 TRUE FALSE 1
4 TRUE FALSE 1
5 FALSE FALSE 1
6 TRUE FALSE 1
7 TRUE FALSE 1
8 FALSE TRUE
9 FALSE TRUE
10 TRUE FALSE 1
11 TRUE FALSE 1
12 TRUE FALSE 1
13 FALSE FALSE 1
14 FALSE TRUE
15 FALSE FALSE
16 FALSE TRUE
17 TRUE FALSE 1
18 TRUE FALSE 1
19 FALSE FALSE 1
20 FALSE FALSE 1
21 FALSE TRUE
22 FALSE FALSE
23 FALSE FALSE
My pseudo-code
version would take the following steps
- Look down the [‘long_entry_flag’] column until entry condition is True (day 3 initially)
- Then we enter ‘1’ into [‘signal’] column every day until exit condition is True [‘long_exit_flag’]==True on day 8
- Then we look back to [‘long_entry_flag’] column to wait for the next entry condition (occurs on day 10)
- And again we enter ‘1’ into [‘signal’] column every day until exit condition is True (day 14)
- etc
Welcome ideas about ways to populate the ‘signal’ column rapidly if possible (using vectorisation?) - this is a subset of a large dataframe with tens of thousands of rows, and it is one of many dataframes being analysed in sequence.
Many thanks in advance!
python pandas
add a comment |
I’m a Python newbie and have the following pandas dataframe - I’m trying to write code that populates the ‘signal’ column as it is below:
Days long_entry_flag long_exit_flag signal
1 FALSE TRUE
2 FALSE FALSE
3 TRUE FALSE 1
4 TRUE FALSE 1
5 FALSE FALSE 1
6 TRUE FALSE 1
7 TRUE FALSE 1
8 FALSE TRUE
9 FALSE TRUE
10 TRUE FALSE 1
11 TRUE FALSE 1
12 TRUE FALSE 1
13 FALSE FALSE 1
14 FALSE TRUE
15 FALSE FALSE
16 FALSE TRUE
17 TRUE FALSE 1
18 TRUE FALSE 1
19 FALSE FALSE 1
20 FALSE FALSE 1
21 FALSE TRUE
22 FALSE FALSE
23 FALSE FALSE
My pseudo-code
version would take the following steps
- Look down the [‘long_entry_flag’] column until entry condition is True (day 3 initially)
- Then we enter ‘1’ into [‘signal’] column every day until exit condition is True [‘long_exit_flag’]==True on day 8
- Then we look back to [‘long_entry_flag’] column to wait for the next entry condition (occurs on day 10)
- And again we enter ‘1’ into [‘signal’] column every day until exit condition is True (day 14)
- etc
Welcome ideas about ways to populate the ‘signal’ column rapidly if possible (using vectorisation?) - this is a subset of a large dataframe with tens of thousands of rows, and it is one of many dataframes being analysed in sequence.
Many thanks in advance!
python pandas
I’m a Python newbie and have the following pandas dataframe - I’m trying to write code that populates the ‘signal’ column as it is below:
Days long_entry_flag long_exit_flag signal
1 FALSE TRUE
2 FALSE FALSE
3 TRUE FALSE 1
4 TRUE FALSE 1
5 FALSE FALSE 1
6 TRUE FALSE 1
7 TRUE FALSE 1
8 FALSE TRUE
9 FALSE TRUE
10 TRUE FALSE 1
11 TRUE FALSE 1
12 TRUE FALSE 1
13 FALSE FALSE 1
14 FALSE TRUE
15 FALSE FALSE
16 FALSE TRUE
17 TRUE FALSE 1
18 TRUE FALSE 1
19 FALSE FALSE 1
20 FALSE FALSE 1
21 FALSE TRUE
22 FALSE FALSE
23 FALSE FALSE
My pseudo-code
version would take the following steps
- Look down the [‘long_entry_flag’] column until entry condition is True (day 3 initially)
- Then we enter ‘1’ into [‘signal’] column every day until exit condition is True [‘long_exit_flag’]==True on day 8
- Then we look back to [‘long_entry_flag’] column to wait for the next entry condition (occurs on day 10)
- And again we enter ‘1’ into [‘signal’] column every day until exit condition is True (day 14)
- etc
Welcome ideas about ways to populate the ‘signal’ column rapidly if possible (using vectorisation?) - this is a subset of a large dataframe with tens of thousands of rows, and it is one of many dataframes being analysed in sequence.
Many thanks in advance!
python pandas
python pandas
edited Dec 12 '18 at 2:58
asked Dec 11 '18 at 10:07
Baz
817
817
add a comment |
add a comment |
4 Answers
4
active
oldest
votes
You can do
# Assuming we're starting from the "outside"
inside = False
for ix, row in df.iterrows():
inside = (not row['long_exit_flag']
if inside
else row['long_entry_flag']
and not row['long_exit_flag']) # [True, True] case
df.at[ix, 'signal'] = 1 if inside else np.nan
which is going to give you exactly the output you posted.
Being inspired by @jezrael's answer, I created a slightly more performant version of the above while still trying to keep it as neat as I could:
# Same assumption of starting from the "outside"
df.at[0, 'signal'] = df.at[0, 'long_entry_flag']
for ix in df.index[1:]:
df.at[ix, 'signal'] = (not df.at[ix, 'long_exit_flag']
if df.at[ix - 1, 'signal']
else df.at[ix, 'long_entry_flag']
and not df.at[ix, 'long_exit_flag']) # [True, True] case
# Adjust to match the requested output exactly
df['signal'] = df['signal'].replace([True, False], [1, np.nan])
I appreciate this @jezrael but another disadvantage of my solution is that it requires a state. If you have an idea on how to make it stateless, preserve the state usingapply
(currying?) or do it in a vectorized way I'll be the first to upvote.
– ayorgo
Dec 11 '18 at 12:12
@ayorgo - added solution.
– jezrael
Dec 11 '18 at 12:50
@ayorgo, apologies for the late edit but your elegant solution works perfectly unless there are two FALSE days in a row at the same time when there is nothing in signal column - see additional lines on days 22-23 - in this situation your code produces '1' in the signal column on day 23 and it shouldn't - is there a simple fix that I'm missing? Or would this now be a new question?
– Baz
Dec 12 '18 at 5:17
@Baz, hmm, strange. Works on my machine. I mean both of the solutions above produceNaN
at day 23.
– ayorgo
Dec 12 '18 at 7:59
However, it works as I wouldn't expect it to when both signals areTrue
which I just fixed.
– ayorgo
Dec 12 '18 at 9:25
|
show 1 more comment
For improve performance use Numba solution:
arr = df[['long_exit_flag','long_entry_flag']].values
@jit
def f(A):
inside = False
out = np.ones(len(A), dtype=float)
for i in range(len(arr)):
inside = not A[i, 0] if inside else A[i, 1]
if not inside:
out[i] = np.nan
return out
df['signal'] = f(arr)
Performance:
#[21000 rows x 5 columns]
df = pd.concat([df] * 1000, ignore_index=True)
In [189]: %%timeit
...: inside = False
...: for ix, row in df.iterrows():
...: inside = not row['long_exit_flag'] if inside else row['long_entry_flag']
...: df.at[ix, 'signal'] = 1 if inside else np.nan
...:
1.58 s ± 9.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [190]: %%timeit
...: arr = df[['long_exit_flag','long_entry_flag']].values
...:
...: @jit
...: def f(A):
...: inside = False
...: out = np.ones(len(A), dtype=float)
...: for i in range(len(arr)):
...: inside = not A[i, 0] if inside else A[i, 1]
...: if not inside:
...: out[i] = np.nan
...: return out
...:
...: df['signal'] = f(arr)
...:
171 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [200]: %%timeit
...: df['d'] = np.where(~df['long_exit_flag'],df['long_entry_flag'] | df['long_exit_flag'],np.nan)
...:
...: df['new_select']= np.where(df['d']==0, np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan), df['d'])
...:
2.4 ms ± 561 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
You can also use numpy for shifting, also @Dark code is simplify:
In [222]: %%timeit
...: d = np.where(~df['long_exit_flag'].values, df['long_entry_flag'].values | df['long_exit_flag'].values, np.nan)
...: shifted = np.insert(d[:-1], 0, np.nan)
...: m = (shifted==0) | (shifted==1)
...: df['signal1']= np.select([d!=0, m], [d, 1], np.nan)
...:
590 µs ± 35.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
EDIT:
You can also check Does iterrows have performance issues? for general order of precedence for performance of various operations in pandas.
1
Ah, alright. I forgot one can simply iterate over index. Always looking for the neatest thing. Thanks.
– ayorgo
Dec 11 '18 at 12:57
@jezrael do check the timings against my approach : )
– Dark
Dec 11 '18 at 13:19
add a comment |
Here's an approach with complete boolean operations which is a vectorized approach and will be fast.
Step 1 :
If long_exit_flag is True return Np.nan else apply or
between long_entry_flag
and long_exit_flag
df['d'] = np.where(df['long_exit_flag'], np.nan, df['long_entry_flag'] | df['long_exit_flag'])
Step 2 : Now its the state where the both the columns are false
. We need to ignore it and replace the values with the previous state. Which can be done using where
and select
df['new_signal']= np.where(df['d']==0,
np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan),
df['d'])
Days long_entry_flag long_exit_flag signal d new_signal
0 1 False True NaN NaN NaN
1 2 False False NaN 0.0 NaN
2 3 True False 1.0 1.0 1.0
3 4 True False 1.0 1.0 1.0
4 5 False False 1.0 0.0 1.0
5 6 True False 1.0 1.0 1.0
6 7 True False 1.0 1.0 1.0
7 8 False True NaN NaN NaN
8 9 False True NaN NaN NaN
9 10 True False 1.0 1.0 1.0
10 11 True False 1.0 1.0 1.0
11 12 True False 1.0 1.0 1.0
12 13 False False 1.0 0.0 1.0
13 14 False True NaN NaN NaN
14 15 False False NaN 0.0 NaN
15 16 False True NaN NaN NaN
16 17 True False 1.0 1.0 1.0
17 18 True False 1.0 1.0 1.0
18 19 False False 1.0 0.0 1.0
19 20 False False 1.0 0.0 1.0
20 21 False True NaN NaN NaN
1
Nice solution, I try numpy fy it - check edited my answer with new timings.
– jezrael
Dec 11 '18 at 13:46
I already upvoted sir. There might be a case which this solution might still not cover. Still curious.
– Dark
Dec 11 '18 at 13:48
I know it and already upvote too. Good luck!
– jezrael
Dec 11 '18 at 13:48
add a comment |
#let the long_exit_flag equal to 0 when the exit is TRUE
df['long_exit_flag_r']=~df.long_exit_flag_r
df.temp=''
for i in range(1,len(df.index)):
df.temp[i]=(df.signal[i-1]+df.long_entry_flag[i])*df.long_exit_flag_r
if the temp is positive then the signal should be 1, if the temp is negative then the signal should be empty. (I kinda get stuck here)
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53721733%2fuse-columns-1-and-2-to-populate-column-3%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
You can do
# Assuming we're starting from the "outside"
inside = False
for ix, row in df.iterrows():
inside = (not row['long_exit_flag']
if inside
else row['long_entry_flag']
and not row['long_exit_flag']) # [True, True] case
df.at[ix, 'signal'] = 1 if inside else np.nan
which is going to give you exactly the output you posted.
Being inspired by @jezrael's answer, I created a slightly more performant version of the above while still trying to keep it as neat as I could:
# Same assumption of starting from the "outside"
df.at[0, 'signal'] = df.at[0, 'long_entry_flag']
for ix in df.index[1:]:
df.at[ix, 'signal'] = (not df.at[ix, 'long_exit_flag']
if df.at[ix - 1, 'signal']
else df.at[ix, 'long_entry_flag']
and not df.at[ix, 'long_exit_flag']) # [True, True] case
# Adjust to match the requested output exactly
df['signal'] = df['signal'].replace([True, False], [1, np.nan])
I appreciate this @jezrael but another disadvantage of my solution is that it requires a state. If you have an idea on how to make it stateless, preserve the state usingapply
(currying?) or do it in a vectorized way I'll be the first to upvote.
– ayorgo
Dec 11 '18 at 12:12
@ayorgo - added solution.
– jezrael
Dec 11 '18 at 12:50
@ayorgo, apologies for the late edit but your elegant solution works perfectly unless there are two FALSE days in a row at the same time when there is nothing in signal column - see additional lines on days 22-23 - in this situation your code produces '1' in the signal column on day 23 and it shouldn't - is there a simple fix that I'm missing? Or would this now be a new question?
– Baz
Dec 12 '18 at 5:17
@Baz, hmm, strange. Works on my machine. I mean both of the solutions above produceNaN
at day 23.
– ayorgo
Dec 12 '18 at 7:59
However, it works as I wouldn't expect it to when both signals areTrue
which I just fixed.
– ayorgo
Dec 12 '18 at 9:25
|
show 1 more comment
You can do
# Assuming we're starting from the "outside"
inside = False
for ix, row in df.iterrows():
inside = (not row['long_exit_flag']
if inside
else row['long_entry_flag']
and not row['long_exit_flag']) # [True, True] case
df.at[ix, 'signal'] = 1 if inside else np.nan
which is going to give you exactly the output you posted.
Being inspired by @jezrael's answer, I created a slightly more performant version of the above while still trying to keep it as neat as I could:
# Same assumption of starting from the "outside"
df.at[0, 'signal'] = df.at[0, 'long_entry_flag']
for ix in df.index[1:]:
df.at[ix, 'signal'] = (not df.at[ix, 'long_exit_flag']
if df.at[ix - 1, 'signal']
else df.at[ix, 'long_entry_flag']
and not df.at[ix, 'long_exit_flag']) # [True, True] case
# Adjust to match the requested output exactly
df['signal'] = df['signal'].replace([True, False], [1, np.nan])
I appreciate this @jezrael but another disadvantage of my solution is that it requires a state. If you have an idea on how to make it stateless, preserve the state usingapply
(currying?) or do it in a vectorized way I'll be the first to upvote.
– ayorgo
Dec 11 '18 at 12:12
@ayorgo - added solution.
– jezrael
Dec 11 '18 at 12:50
@ayorgo, apologies for the late edit but your elegant solution works perfectly unless there are two FALSE days in a row at the same time when there is nothing in signal column - see additional lines on days 22-23 - in this situation your code produces '1' in the signal column on day 23 and it shouldn't - is there a simple fix that I'm missing? Or would this now be a new question?
– Baz
Dec 12 '18 at 5:17
@Baz, hmm, strange. Works on my machine. I mean both of the solutions above produceNaN
at day 23.
– ayorgo
Dec 12 '18 at 7:59
However, it works as I wouldn't expect it to when both signals areTrue
which I just fixed.
– ayorgo
Dec 12 '18 at 9:25
|
show 1 more comment
You can do
# Assuming we're starting from the "outside"
inside = False
for ix, row in df.iterrows():
inside = (not row['long_exit_flag']
if inside
else row['long_entry_flag']
and not row['long_exit_flag']) # [True, True] case
df.at[ix, 'signal'] = 1 if inside else np.nan
which is going to give you exactly the output you posted.
Being inspired by @jezrael's answer, I created a slightly more performant version of the above while still trying to keep it as neat as I could:
# Same assumption of starting from the "outside"
df.at[0, 'signal'] = df.at[0, 'long_entry_flag']
for ix in df.index[1:]:
df.at[ix, 'signal'] = (not df.at[ix, 'long_exit_flag']
if df.at[ix - 1, 'signal']
else df.at[ix, 'long_entry_flag']
and not df.at[ix, 'long_exit_flag']) # [True, True] case
# Adjust to match the requested output exactly
df['signal'] = df['signal'].replace([True, False], [1, np.nan])
You can do
# Assuming we're starting from the "outside"
inside = False
for ix, row in df.iterrows():
inside = (not row['long_exit_flag']
if inside
else row['long_entry_flag']
and not row['long_exit_flag']) # [True, True] case
df.at[ix, 'signal'] = 1 if inside else np.nan
which is going to give you exactly the output you posted.
Being inspired by @jezrael's answer, I created a slightly more performant version of the above while still trying to keep it as neat as I could:
# Same assumption of starting from the "outside"
df.at[0, 'signal'] = df.at[0, 'long_entry_flag']
for ix in df.index[1:]:
df.at[ix, 'signal'] = (not df.at[ix, 'long_exit_flag']
if df.at[ix - 1, 'signal']
else df.at[ix, 'long_entry_flag']
and not df.at[ix, 'long_exit_flag']) # [True, True] case
# Adjust to match the requested output exactly
df['signal'] = df['signal'].replace([True, False], [1, np.nan])
edited Dec 12 '18 at 10:48
answered Dec 11 '18 at 11:13
ayorgo
1,110514
1,110514
I appreciate this @jezrael but another disadvantage of my solution is that it requires a state. If you have an idea on how to make it stateless, preserve the state usingapply
(currying?) or do it in a vectorized way I'll be the first to upvote.
– ayorgo
Dec 11 '18 at 12:12
@ayorgo - added solution.
– jezrael
Dec 11 '18 at 12:50
@ayorgo, apologies for the late edit but your elegant solution works perfectly unless there are two FALSE days in a row at the same time when there is nothing in signal column - see additional lines on days 22-23 - in this situation your code produces '1' in the signal column on day 23 and it shouldn't - is there a simple fix that I'm missing? Or would this now be a new question?
– Baz
Dec 12 '18 at 5:17
@Baz, hmm, strange. Works on my machine. I mean both of the solutions above produceNaN
at day 23.
– ayorgo
Dec 12 '18 at 7:59
However, it works as I wouldn't expect it to when both signals areTrue
which I just fixed.
– ayorgo
Dec 12 '18 at 9:25
|
show 1 more comment
I appreciate this @jezrael but another disadvantage of my solution is that it requires a state. If you have an idea on how to make it stateless, preserve the state usingapply
(currying?) or do it in a vectorized way I'll be the first to upvote.
– ayorgo
Dec 11 '18 at 12:12
@ayorgo - added solution.
– jezrael
Dec 11 '18 at 12:50
@ayorgo, apologies for the late edit but your elegant solution works perfectly unless there are two FALSE days in a row at the same time when there is nothing in signal column - see additional lines on days 22-23 - in this situation your code produces '1' in the signal column on day 23 and it shouldn't - is there a simple fix that I'm missing? Or would this now be a new question?
– Baz
Dec 12 '18 at 5:17
@Baz, hmm, strange. Works on my machine. I mean both of the solutions above produceNaN
at day 23.
– ayorgo
Dec 12 '18 at 7:59
However, it works as I wouldn't expect it to when both signals areTrue
which I just fixed.
– ayorgo
Dec 12 '18 at 9:25
I appreciate this @jezrael but another disadvantage of my solution is that it requires a state. If you have an idea on how to make it stateless, preserve the state using
apply
(currying?) or do it in a vectorized way I'll be the first to upvote.– ayorgo
Dec 11 '18 at 12:12
I appreciate this @jezrael but another disadvantage of my solution is that it requires a state. If you have an idea on how to make it stateless, preserve the state using
apply
(currying?) or do it in a vectorized way I'll be the first to upvote.– ayorgo
Dec 11 '18 at 12:12
@ayorgo - added solution.
– jezrael
Dec 11 '18 at 12:50
@ayorgo - added solution.
– jezrael
Dec 11 '18 at 12:50
@ayorgo, apologies for the late edit but your elegant solution works perfectly unless there are two FALSE days in a row at the same time when there is nothing in signal column - see additional lines on days 22-23 - in this situation your code produces '1' in the signal column on day 23 and it shouldn't - is there a simple fix that I'm missing? Or would this now be a new question?
– Baz
Dec 12 '18 at 5:17
@ayorgo, apologies for the late edit but your elegant solution works perfectly unless there are two FALSE days in a row at the same time when there is nothing in signal column - see additional lines on days 22-23 - in this situation your code produces '1' in the signal column on day 23 and it shouldn't - is there a simple fix that I'm missing? Or would this now be a new question?
– Baz
Dec 12 '18 at 5:17
@Baz, hmm, strange. Works on my machine. I mean both of the solutions above produce
NaN
at day 23.– ayorgo
Dec 12 '18 at 7:59
@Baz, hmm, strange. Works on my machine. I mean both of the solutions above produce
NaN
at day 23.– ayorgo
Dec 12 '18 at 7:59
However, it works as I wouldn't expect it to when both signals are
True
which I just fixed.– ayorgo
Dec 12 '18 at 9:25
However, it works as I wouldn't expect it to when both signals are
True
which I just fixed.– ayorgo
Dec 12 '18 at 9:25
|
show 1 more comment
For improve performance use Numba solution:
arr = df[['long_exit_flag','long_entry_flag']].values
@jit
def f(A):
inside = False
out = np.ones(len(A), dtype=float)
for i in range(len(arr)):
inside = not A[i, 0] if inside else A[i, 1]
if not inside:
out[i] = np.nan
return out
df['signal'] = f(arr)
Performance:
#[21000 rows x 5 columns]
df = pd.concat([df] * 1000, ignore_index=True)
In [189]: %%timeit
...: inside = False
...: for ix, row in df.iterrows():
...: inside = not row['long_exit_flag'] if inside else row['long_entry_flag']
...: df.at[ix, 'signal'] = 1 if inside else np.nan
...:
1.58 s ± 9.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [190]: %%timeit
...: arr = df[['long_exit_flag','long_entry_flag']].values
...:
...: @jit
...: def f(A):
...: inside = False
...: out = np.ones(len(A), dtype=float)
...: for i in range(len(arr)):
...: inside = not A[i, 0] if inside else A[i, 1]
...: if not inside:
...: out[i] = np.nan
...: return out
...:
...: df['signal'] = f(arr)
...:
171 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [200]: %%timeit
...: df['d'] = np.where(~df['long_exit_flag'],df['long_entry_flag'] | df['long_exit_flag'],np.nan)
...:
...: df['new_select']= np.where(df['d']==0, np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan), df['d'])
...:
2.4 ms ± 561 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
You can also use numpy for shifting, also @Dark code is simplify:
In [222]: %%timeit
...: d = np.where(~df['long_exit_flag'].values, df['long_entry_flag'].values | df['long_exit_flag'].values, np.nan)
...: shifted = np.insert(d[:-1], 0, np.nan)
...: m = (shifted==0) | (shifted==1)
...: df['signal1']= np.select([d!=0, m], [d, 1], np.nan)
...:
590 µs ± 35.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
EDIT:
You can also check Does iterrows have performance issues? for general order of precedence for performance of various operations in pandas.
1
Ah, alright. I forgot one can simply iterate over index. Always looking for the neatest thing. Thanks.
– ayorgo
Dec 11 '18 at 12:57
@jezrael do check the timings against my approach : )
– Dark
Dec 11 '18 at 13:19
add a comment |
For improve performance use Numba solution:
arr = df[['long_exit_flag','long_entry_flag']].values
@jit
def f(A):
inside = False
out = np.ones(len(A), dtype=float)
for i in range(len(arr)):
inside = not A[i, 0] if inside else A[i, 1]
if not inside:
out[i] = np.nan
return out
df['signal'] = f(arr)
Performance:
#[21000 rows x 5 columns]
df = pd.concat([df] * 1000, ignore_index=True)
In [189]: %%timeit
...: inside = False
...: for ix, row in df.iterrows():
...: inside = not row['long_exit_flag'] if inside else row['long_entry_flag']
...: df.at[ix, 'signal'] = 1 if inside else np.nan
...:
1.58 s ± 9.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [190]: %%timeit
...: arr = df[['long_exit_flag','long_entry_flag']].values
...:
...: @jit
...: def f(A):
...: inside = False
...: out = np.ones(len(A), dtype=float)
...: for i in range(len(arr)):
...: inside = not A[i, 0] if inside else A[i, 1]
...: if not inside:
...: out[i] = np.nan
...: return out
...:
...: df['signal'] = f(arr)
...:
171 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [200]: %%timeit
...: df['d'] = np.where(~df['long_exit_flag'],df['long_entry_flag'] | df['long_exit_flag'],np.nan)
...:
...: df['new_select']= np.where(df['d']==0, np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan), df['d'])
...:
2.4 ms ± 561 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
You can also use numpy for shifting, also @Dark code is simplify:
In [222]: %%timeit
...: d = np.where(~df['long_exit_flag'].values, df['long_entry_flag'].values | df['long_exit_flag'].values, np.nan)
...: shifted = np.insert(d[:-1], 0, np.nan)
...: m = (shifted==0) | (shifted==1)
...: df['signal1']= np.select([d!=0, m], [d, 1], np.nan)
...:
590 µs ± 35.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
EDIT:
You can also check Does iterrows have performance issues? for general order of precedence for performance of various operations in pandas.
1
Ah, alright. I forgot one can simply iterate over index. Always looking for the neatest thing. Thanks.
– ayorgo
Dec 11 '18 at 12:57
@jezrael do check the timings against my approach : )
– Dark
Dec 11 '18 at 13:19
add a comment |
For improve performance use Numba solution:
arr = df[['long_exit_flag','long_entry_flag']].values
@jit
def f(A):
inside = False
out = np.ones(len(A), dtype=float)
for i in range(len(arr)):
inside = not A[i, 0] if inside else A[i, 1]
if not inside:
out[i] = np.nan
return out
df['signal'] = f(arr)
Performance:
#[21000 rows x 5 columns]
df = pd.concat([df] * 1000, ignore_index=True)
In [189]: %%timeit
...: inside = False
...: for ix, row in df.iterrows():
...: inside = not row['long_exit_flag'] if inside else row['long_entry_flag']
...: df.at[ix, 'signal'] = 1 if inside else np.nan
...:
1.58 s ± 9.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [190]: %%timeit
...: arr = df[['long_exit_flag','long_entry_flag']].values
...:
...: @jit
...: def f(A):
...: inside = False
...: out = np.ones(len(A), dtype=float)
...: for i in range(len(arr)):
...: inside = not A[i, 0] if inside else A[i, 1]
...: if not inside:
...: out[i] = np.nan
...: return out
...:
...: df['signal'] = f(arr)
...:
171 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [200]: %%timeit
...: df['d'] = np.where(~df['long_exit_flag'],df['long_entry_flag'] | df['long_exit_flag'],np.nan)
...:
...: df['new_select']= np.where(df['d']==0, np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan), df['d'])
...:
2.4 ms ± 561 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
You can also use numpy for shifting, also @Dark code is simplify:
In [222]: %%timeit
...: d = np.where(~df['long_exit_flag'].values, df['long_entry_flag'].values | df['long_exit_flag'].values, np.nan)
...: shifted = np.insert(d[:-1], 0, np.nan)
...: m = (shifted==0) | (shifted==1)
...: df['signal1']= np.select([d!=0, m], [d, 1], np.nan)
...:
590 µs ± 35.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
EDIT:
You can also check Does iterrows have performance issues? for general order of precedence for performance of various operations in pandas.
For improve performance use Numba solution:
arr = df[['long_exit_flag','long_entry_flag']].values
@jit
def f(A):
inside = False
out = np.ones(len(A), dtype=float)
for i in range(len(arr)):
inside = not A[i, 0] if inside else A[i, 1]
if not inside:
out[i] = np.nan
return out
df['signal'] = f(arr)
Performance:
#[21000 rows x 5 columns]
df = pd.concat([df] * 1000, ignore_index=True)
In [189]: %%timeit
...: inside = False
...: for ix, row in df.iterrows():
...: inside = not row['long_exit_flag'] if inside else row['long_entry_flag']
...: df.at[ix, 'signal'] = 1 if inside else np.nan
...:
1.58 s ± 9.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [190]: %%timeit
...: arr = df[['long_exit_flag','long_entry_flag']].values
...:
...: @jit
...: def f(A):
...: inside = False
...: out = np.ones(len(A), dtype=float)
...: for i in range(len(arr)):
...: inside = not A[i, 0] if inside else A[i, 1]
...: if not inside:
...: out[i] = np.nan
...: return out
...:
...: df['signal'] = f(arr)
...:
171 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [200]: %%timeit
...: df['d'] = np.where(~df['long_exit_flag'],df['long_entry_flag'] | df['long_exit_flag'],np.nan)
...:
...: df['new_select']= np.where(df['d']==0, np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan), df['d'])
...:
2.4 ms ± 561 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
You can also use numpy for shifting, also @Dark code is simplify:
In [222]: %%timeit
...: d = np.where(~df['long_exit_flag'].values, df['long_entry_flag'].values | df['long_exit_flag'].values, np.nan)
...: shifted = np.insert(d[:-1], 0, np.nan)
...: m = (shifted==0) | (shifted==1)
...: df['signal1']= np.select([d!=0, m], [d, 1], np.nan)
...:
590 µs ± 35.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
EDIT:
You can also check Does iterrows have performance issues? for general order of precedence for performance of various operations in pandas.
edited Dec 11 '18 at 13:45
answered Dec 11 '18 at 12:48
jezrael
321k22263341
321k22263341
1
Ah, alright. I forgot one can simply iterate over index. Always looking for the neatest thing. Thanks.
– ayorgo
Dec 11 '18 at 12:57
@jezrael do check the timings against my approach : )
– Dark
Dec 11 '18 at 13:19
add a comment |
1
Ah, alright. I forgot one can simply iterate over index. Always looking for the neatest thing. Thanks.
– ayorgo
Dec 11 '18 at 12:57
@jezrael do check the timings against my approach : )
– Dark
Dec 11 '18 at 13:19
1
1
Ah, alright. I forgot one can simply iterate over index. Always looking for the neatest thing. Thanks.
– ayorgo
Dec 11 '18 at 12:57
Ah, alright. I forgot one can simply iterate over index. Always looking for the neatest thing. Thanks.
– ayorgo
Dec 11 '18 at 12:57
@jezrael do check the timings against my approach : )
– Dark
Dec 11 '18 at 13:19
@jezrael do check the timings against my approach : )
– Dark
Dec 11 '18 at 13:19
add a comment |
Here's an approach with complete boolean operations which is a vectorized approach and will be fast.
Step 1 :
If long_exit_flag is True return Np.nan else apply or
between long_entry_flag
and long_exit_flag
df['d'] = np.where(df['long_exit_flag'], np.nan, df['long_entry_flag'] | df['long_exit_flag'])
Step 2 : Now its the state where the both the columns are false
. We need to ignore it and replace the values with the previous state. Which can be done using where
and select
df['new_signal']= np.where(df['d']==0,
np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan),
df['d'])
Days long_entry_flag long_exit_flag signal d new_signal
0 1 False True NaN NaN NaN
1 2 False False NaN 0.0 NaN
2 3 True False 1.0 1.0 1.0
3 4 True False 1.0 1.0 1.0
4 5 False False 1.0 0.0 1.0
5 6 True False 1.0 1.0 1.0
6 7 True False 1.0 1.0 1.0
7 8 False True NaN NaN NaN
8 9 False True NaN NaN NaN
9 10 True False 1.0 1.0 1.0
10 11 True False 1.0 1.0 1.0
11 12 True False 1.0 1.0 1.0
12 13 False False 1.0 0.0 1.0
13 14 False True NaN NaN NaN
14 15 False False NaN 0.0 NaN
15 16 False True NaN NaN NaN
16 17 True False 1.0 1.0 1.0
17 18 True False 1.0 1.0 1.0
18 19 False False 1.0 0.0 1.0
19 20 False False 1.0 0.0 1.0
20 21 False True NaN NaN NaN
1
Nice solution, I try numpy fy it - check edited my answer with new timings.
– jezrael
Dec 11 '18 at 13:46
I already upvoted sir. There might be a case which this solution might still not cover. Still curious.
– Dark
Dec 11 '18 at 13:48
I know it and already upvote too. Good luck!
– jezrael
Dec 11 '18 at 13:48
add a comment |
Here's an approach with complete boolean operations which is a vectorized approach and will be fast.
Step 1 :
If long_exit_flag is True return Np.nan else apply or
between long_entry_flag
and long_exit_flag
df['d'] = np.where(df['long_exit_flag'], np.nan, df['long_entry_flag'] | df['long_exit_flag'])
Step 2 : Now its the state where the both the columns are false
. We need to ignore it and replace the values with the previous state. Which can be done using where
and select
df['new_signal']= np.where(df['d']==0,
np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan),
df['d'])
Days long_entry_flag long_exit_flag signal d new_signal
0 1 False True NaN NaN NaN
1 2 False False NaN 0.0 NaN
2 3 True False 1.0 1.0 1.0
3 4 True False 1.0 1.0 1.0
4 5 False False 1.0 0.0 1.0
5 6 True False 1.0 1.0 1.0
6 7 True False 1.0 1.0 1.0
7 8 False True NaN NaN NaN
8 9 False True NaN NaN NaN
9 10 True False 1.0 1.0 1.0
10 11 True False 1.0 1.0 1.0
11 12 True False 1.0 1.0 1.0
12 13 False False 1.0 0.0 1.0
13 14 False True NaN NaN NaN
14 15 False False NaN 0.0 NaN
15 16 False True NaN NaN NaN
16 17 True False 1.0 1.0 1.0
17 18 True False 1.0 1.0 1.0
18 19 False False 1.0 0.0 1.0
19 20 False False 1.0 0.0 1.0
20 21 False True NaN NaN NaN
1
Nice solution, I try numpy fy it - check edited my answer with new timings.
– jezrael
Dec 11 '18 at 13:46
I already upvoted sir. There might be a case which this solution might still not cover. Still curious.
– Dark
Dec 11 '18 at 13:48
I know it and already upvote too. Good luck!
– jezrael
Dec 11 '18 at 13:48
add a comment |
Here's an approach with complete boolean operations which is a vectorized approach and will be fast.
Step 1 :
If long_exit_flag is True return Np.nan else apply or
between long_entry_flag
and long_exit_flag
df['d'] = np.where(df['long_exit_flag'], np.nan, df['long_entry_flag'] | df['long_exit_flag'])
Step 2 : Now its the state where the both the columns are false
. We need to ignore it and replace the values with the previous state. Which can be done using where
and select
df['new_signal']= np.where(df['d']==0,
np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan),
df['d'])
Days long_entry_flag long_exit_flag signal d new_signal
0 1 False True NaN NaN NaN
1 2 False False NaN 0.0 NaN
2 3 True False 1.0 1.0 1.0
3 4 True False 1.0 1.0 1.0
4 5 False False 1.0 0.0 1.0
5 6 True False 1.0 1.0 1.0
6 7 True False 1.0 1.0 1.0
7 8 False True NaN NaN NaN
8 9 False True NaN NaN NaN
9 10 True False 1.0 1.0 1.0
10 11 True False 1.0 1.0 1.0
11 12 True False 1.0 1.0 1.0
12 13 False False 1.0 0.0 1.0
13 14 False True NaN NaN NaN
14 15 False False NaN 0.0 NaN
15 16 False True NaN NaN NaN
16 17 True False 1.0 1.0 1.0
17 18 True False 1.0 1.0 1.0
18 19 False False 1.0 0.0 1.0
19 20 False False 1.0 0.0 1.0
20 21 False True NaN NaN NaN
Here's an approach with complete boolean operations which is a vectorized approach and will be fast.
Step 1 :
If long_exit_flag is True return Np.nan else apply or
between long_entry_flag
and long_exit_flag
df['d'] = np.where(df['long_exit_flag'], np.nan, df['long_entry_flag'] | df['long_exit_flag'])
Step 2 : Now its the state where the both the columns are false
. We need to ignore it and replace the values with the previous state. Which can be done using where
and select
df['new_signal']= np.where(df['d']==0,
np.select([df['d'].shift()==0, df['d'].shift()==1],[1,1], np.nan),
df['d'])
Days long_entry_flag long_exit_flag signal d new_signal
0 1 False True NaN NaN NaN
1 2 False False NaN 0.0 NaN
2 3 True False 1.0 1.0 1.0
3 4 True False 1.0 1.0 1.0
4 5 False False 1.0 0.0 1.0
5 6 True False 1.0 1.0 1.0
6 7 True False 1.0 1.0 1.0
7 8 False True NaN NaN NaN
8 9 False True NaN NaN NaN
9 10 True False 1.0 1.0 1.0
10 11 True False 1.0 1.0 1.0
11 12 True False 1.0 1.0 1.0
12 13 False False 1.0 0.0 1.0
13 14 False True NaN NaN NaN
14 15 False False NaN 0.0 NaN
15 16 False True NaN NaN NaN
16 17 True False 1.0 1.0 1.0
17 18 True False 1.0 1.0 1.0
18 19 False False 1.0 0.0 1.0
19 20 False False 1.0 0.0 1.0
20 21 False True NaN NaN NaN
edited Dec 11 '18 at 13:32
answered Dec 11 '18 at 13:18
Dark
21.2k31946
21.2k31946
1
Nice solution, I try numpy fy it - check edited my answer with new timings.
– jezrael
Dec 11 '18 at 13:46
I already upvoted sir. There might be a case which this solution might still not cover. Still curious.
– Dark
Dec 11 '18 at 13:48
I know it and already upvote too. Good luck!
– jezrael
Dec 11 '18 at 13:48
add a comment |
1
Nice solution, I try numpy fy it - check edited my answer with new timings.
– jezrael
Dec 11 '18 at 13:46
I already upvoted sir. There might be a case which this solution might still not cover. Still curious.
– Dark
Dec 11 '18 at 13:48
I know it and already upvote too. Good luck!
– jezrael
Dec 11 '18 at 13:48
1
1
Nice solution, I try numpy fy it - check edited my answer with new timings.
– jezrael
Dec 11 '18 at 13:46
Nice solution, I try numpy fy it - check edited my answer with new timings.
– jezrael
Dec 11 '18 at 13:46
I already upvoted sir. There might be a case which this solution might still not cover. Still curious.
– Dark
Dec 11 '18 at 13:48
I already upvoted sir. There might be a case which this solution might still not cover. Still curious.
– Dark
Dec 11 '18 at 13:48
I know it and already upvote too. Good luck!
– jezrael
Dec 11 '18 at 13:48
I know it and already upvote too. Good luck!
– jezrael
Dec 11 '18 at 13:48
add a comment |
#let the long_exit_flag equal to 0 when the exit is TRUE
df['long_exit_flag_r']=~df.long_exit_flag_r
df.temp=''
for i in range(1,len(df.index)):
df.temp[i]=(df.signal[i-1]+df.long_entry_flag[i])*df.long_exit_flag_r
if the temp is positive then the signal should be 1, if the temp is negative then the signal should be empty. (I kinda get stuck here)
add a comment |
#let the long_exit_flag equal to 0 when the exit is TRUE
df['long_exit_flag_r']=~df.long_exit_flag_r
df.temp=''
for i in range(1,len(df.index)):
df.temp[i]=(df.signal[i-1]+df.long_entry_flag[i])*df.long_exit_flag_r
if the temp is positive then the signal should be 1, if the temp is negative then the signal should be empty. (I kinda get stuck here)
add a comment |
#let the long_exit_flag equal to 0 when the exit is TRUE
df['long_exit_flag_r']=~df.long_exit_flag_r
df.temp=''
for i in range(1,len(df.index)):
df.temp[i]=(df.signal[i-1]+df.long_entry_flag[i])*df.long_exit_flag_r
if the temp is positive then the signal should be 1, if the temp is negative then the signal should be empty. (I kinda get stuck here)
#let the long_exit_flag equal to 0 when the exit is TRUE
df['long_exit_flag_r']=~df.long_exit_flag_r
df.temp=''
for i in range(1,len(df.index)):
df.temp[i]=(df.signal[i-1]+df.long_entry_flag[i])*df.long_exit_flag_r
if the temp is positive then the signal should be 1, if the temp is negative then the signal should be empty. (I kinda get stuck here)
edited Dec 11 '18 at 10:42
answered Dec 11 '18 at 10:36
ZhouXing98
445
445
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53721733%2fuse-columns-1-and-2-to-populate-column-3%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown