Groupby class and count missing values in features
I have a problem and I cannot find any solution in the web or documentation, even if I think that it is very trivial.
What do I want to do?
I have a dataframe like this
CLASS FEATURE1 FEATURE2 FEATURE3
X A NaN NaN
X NaN A NaN
B A A A
I want to group by the label(CLASS) and display the number of NaN-Values that are counted in every feature so that it looks like this. The purpose of this is to get a general idea how missing values are distributed over the different classes.
CLASS FEATURE1 FEATURE2 FEATURE3
X 1 1 2
B 0 0 0
I know how to recieve the amount of nonnull-Values - df.groupby['CLASS'].count()
Is there something similar for the NaN-Values?
I tried to subtract the count() from the size() but it returned an unformatted output filled with the value NaN
python pandas dataframe group-by nan
add a comment |
I have a problem and I cannot find any solution in the web or documentation, even if I think that it is very trivial.
What do I want to do?
I have a dataframe like this
CLASS FEATURE1 FEATURE2 FEATURE3
X A NaN NaN
X NaN A NaN
B A A A
I want to group by the label(CLASS) and display the number of NaN-Values that are counted in every feature so that it looks like this. The purpose of this is to get a general idea how missing values are distributed over the different classes.
CLASS FEATURE1 FEATURE2 FEATURE3
X 1 1 2
B 0 0 0
I know how to recieve the amount of nonnull-Values - df.groupby['CLASS'].count()
Is there something similar for the NaN-Values?
I tried to subtract the count() from the size() but it returned an unformatted output filled with the value NaN
python pandas dataframe group-by nan
add a comment |
I have a problem and I cannot find any solution in the web or documentation, even if I think that it is very trivial.
What do I want to do?
I have a dataframe like this
CLASS FEATURE1 FEATURE2 FEATURE3
X A NaN NaN
X NaN A NaN
B A A A
I want to group by the label(CLASS) and display the number of NaN-Values that are counted in every feature so that it looks like this. The purpose of this is to get a general idea how missing values are distributed over the different classes.
CLASS FEATURE1 FEATURE2 FEATURE3
X 1 1 2
B 0 0 0
I know how to recieve the amount of nonnull-Values - df.groupby['CLASS'].count()
Is there something similar for the NaN-Values?
I tried to subtract the count() from the size() but it returned an unformatted output filled with the value NaN
python pandas dataframe group-by nan
I have a problem and I cannot find any solution in the web or documentation, even if I think that it is very trivial.
What do I want to do?
I have a dataframe like this
CLASS FEATURE1 FEATURE2 FEATURE3
X A NaN NaN
X NaN A NaN
B A A A
I want to group by the label(CLASS) and display the number of NaN-Values that are counted in every feature so that it looks like this. The purpose of this is to get a general idea how missing values are distributed over the different classes.
CLASS FEATURE1 FEATURE2 FEATURE3
X 1 1 2
B 0 0 0
I know how to recieve the amount of nonnull-Values - df.groupby['CLASS'].count()
Is there something similar for the NaN-Values?
I tried to subtract the count() from the size() but it returned an unformatted output filled with the value NaN
python pandas dataframe group-by nan
python pandas dataframe group-by nan
edited Dec 27 '18 at 16:11
coldspeed
131k23135221
131k23135221
asked Dec 27 '18 at 15:15
FelTry2FelTry2
1057
1057
add a comment |
add a comment |
3 Answers
3
active
oldest
votes
Compute a mask with isna
, then group and find the sum:
df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum().reset_index()
CLASS FEATURE1 FEATURE2 FEATURE3
0 X 1.0 1.0 2.0
1 B 0.0 0.0 0.0
Another option is to subtract the size
from the count
using rsub
along the 0th axis for index aligned subtraction:
df.groupby('CLASS').count().rsub(df.groupby('CLASS').size(), axis=0)
Or,
g = df.groupby('CLASS')
g.count().rsub(g.size(), axis=0)
FEATURE1 FEATURE2 FEATURE3
CLASS
B 0 0 0
X 1 1 2
There are quite a few good answers, so here are some timeits
for your perusal:
df_ = df
df = pd.concat([df_] * 10000)
%timeit df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum()
%timeit df.set_index('CLASS').isna().sum(level=0)
%%timeit
g = df.groupby('CLASS')
g.count().rsub(g.size(), axis=0)
11.8 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.47 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.54 ms ± 81.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Actual performance depends on your data and setup, so your mileage may vary.
2
Amazing - thank you very, very much!
– FelTry2
Dec 27 '18 at 15:37
add a comment |
You can use set_index
and sum
:
df.set_index('CLASS').isna().sum(level=0)
Output:
FEATURE1 FEATURE2 FEATURE3
CLASS
X 1.0 1.0 2.0
B 0.0 0.0 0.0
add a comment |
Using the diff between count
and size
g=df.groupby('CLASS')
-g.count().sub(g.size(),0)
FEATURE1 FEATURE2 FEATURE3
CLASS
B 0 0 0
X 1 1 2
And we can transform this question to the more generic question how to count how many NaN
in dataframe with for loop
pd.DataFrame({x: y.isna().sum()for x , y in g }).T.drop('CLASS',1)
Out[468]:
FEATURE1 FEATURE2 FEATURE3
B 0 0 0
X 1 1 2
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53947196%2fgroupby-class-and-count-missing-values-in-features%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
Compute a mask with isna
, then group and find the sum:
df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum().reset_index()
CLASS FEATURE1 FEATURE2 FEATURE3
0 X 1.0 1.0 2.0
1 B 0.0 0.0 0.0
Another option is to subtract the size
from the count
using rsub
along the 0th axis for index aligned subtraction:
df.groupby('CLASS').count().rsub(df.groupby('CLASS').size(), axis=0)
Or,
g = df.groupby('CLASS')
g.count().rsub(g.size(), axis=0)
FEATURE1 FEATURE2 FEATURE3
CLASS
B 0 0 0
X 1 1 2
There are quite a few good answers, so here are some timeits
for your perusal:
df_ = df
df = pd.concat([df_] * 10000)
%timeit df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum()
%timeit df.set_index('CLASS').isna().sum(level=0)
%%timeit
g = df.groupby('CLASS')
g.count().rsub(g.size(), axis=0)
11.8 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.47 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.54 ms ± 81.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Actual performance depends on your data and setup, so your mileage may vary.
2
Amazing - thank you very, very much!
– FelTry2
Dec 27 '18 at 15:37
add a comment |
Compute a mask with isna
, then group and find the sum:
df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum().reset_index()
CLASS FEATURE1 FEATURE2 FEATURE3
0 X 1.0 1.0 2.0
1 B 0.0 0.0 0.0
Another option is to subtract the size
from the count
using rsub
along the 0th axis for index aligned subtraction:
df.groupby('CLASS').count().rsub(df.groupby('CLASS').size(), axis=0)
Or,
g = df.groupby('CLASS')
g.count().rsub(g.size(), axis=0)
FEATURE1 FEATURE2 FEATURE3
CLASS
B 0 0 0
X 1 1 2
There are quite a few good answers, so here are some timeits
for your perusal:
df_ = df
df = pd.concat([df_] * 10000)
%timeit df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum()
%timeit df.set_index('CLASS').isna().sum(level=0)
%%timeit
g = df.groupby('CLASS')
g.count().rsub(g.size(), axis=0)
11.8 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.47 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.54 ms ± 81.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Actual performance depends on your data and setup, so your mileage may vary.
2
Amazing - thank you very, very much!
– FelTry2
Dec 27 '18 at 15:37
add a comment |
Compute a mask with isna
, then group and find the sum:
df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum().reset_index()
CLASS FEATURE1 FEATURE2 FEATURE3
0 X 1.0 1.0 2.0
1 B 0.0 0.0 0.0
Another option is to subtract the size
from the count
using rsub
along the 0th axis for index aligned subtraction:
df.groupby('CLASS').count().rsub(df.groupby('CLASS').size(), axis=0)
Or,
g = df.groupby('CLASS')
g.count().rsub(g.size(), axis=0)
FEATURE1 FEATURE2 FEATURE3
CLASS
B 0 0 0
X 1 1 2
There are quite a few good answers, so here are some timeits
for your perusal:
df_ = df
df = pd.concat([df_] * 10000)
%timeit df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum()
%timeit df.set_index('CLASS').isna().sum(level=0)
%%timeit
g = df.groupby('CLASS')
g.count().rsub(g.size(), axis=0)
11.8 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.47 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.54 ms ± 81.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Actual performance depends on your data and setup, so your mileage may vary.
Compute a mask with isna
, then group and find the sum:
df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum().reset_index()
CLASS FEATURE1 FEATURE2 FEATURE3
0 X 1.0 1.0 2.0
1 B 0.0 0.0 0.0
Another option is to subtract the size
from the count
using rsub
along the 0th axis for index aligned subtraction:
df.groupby('CLASS').count().rsub(df.groupby('CLASS').size(), axis=0)
Or,
g = df.groupby('CLASS')
g.count().rsub(g.size(), axis=0)
FEATURE1 FEATURE2 FEATURE3
CLASS
B 0 0 0
X 1 1 2
There are quite a few good answers, so here are some timeits
for your perusal:
df_ = df
df = pd.concat([df_] * 10000)
%timeit df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum()
%timeit df.set_index('CLASS').isna().sum(level=0)
%%timeit
g = df.groupby('CLASS')
g.count().rsub(g.size(), axis=0)
11.8 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.47 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.54 ms ± 81.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Actual performance depends on your data and setup, so your mileage may vary.
edited Dec 27 '18 at 15:29
answered Dec 27 '18 at 15:16
coldspeedcoldspeed
131k23135221
131k23135221
2
Amazing - thank you very, very much!
– FelTry2
Dec 27 '18 at 15:37
add a comment |
2
Amazing - thank you very, very much!
– FelTry2
Dec 27 '18 at 15:37
2
2
Amazing - thank you very, very much!
– FelTry2
Dec 27 '18 at 15:37
Amazing - thank you very, very much!
– FelTry2
Dec 27 '18 at 15:37
add a comment |
You can use set_index
and sum
:
df.set_index('CLASS').isna().sum(level=0)
Output:
FEATURE1 FEATURE2 FEATURE3
CLASS
X 1.0 1.0 2.0
B 0.0 0.0 0.0
add a comment |
You can use set_index
and sum
:
df.set_index('CLASS').isna().sum(level=0)
Output:
FEATURE1 FEATURE2 FEATURE3
CLASS
X 1.0 1.0 2.0
B 0.0 0.0 0.0
add a comment |
You can use set_index
and sum
:
df.set_index('CLASS').isna().sum(level=0)
Output:
FEATURE1 FEATURE2 FEATURE3
CLASS
X 1.0 1.0 2.0
B 0.0 0.0 0.0
You can use set_index
and sum
:
df.set_index('CLASS').isna().sum(level=0)
Output:
FEATURE1 FEATURE2 FEATURE3
CLASS
X 1.0 1.0 2.0
B 0.0 0.0 0.0
answered Dec 27 '18 at 15:18
Scott BostonScott Boston
54.8k73056
54.8k73056
add a comment |
add a comment |
Using the diff between count
and size
g=df.groupby('CLASS')
-g.count().sub(g.size(),0)
FEATURE1 FEATURE2 FEATURE3
CLASS
B 0 0 0
X 1 1 2
And we can transform this question to the more generic question how to count how many NaN
in dataframe with for loop
pd.DataFrame({x: y.isna().sum()for x , y in g }).T.drop('CLASS',1)
Out[468]:
FEATURE1 FEATURE2 FEATURE3
B 0 0 0
X 1 1 2
add a comment |
Using the diff between count
and size
g=df.groupby('CLASS')
-g.count().sub(g.size(),0)
FEATURE1 FEATURE2 FEATURE3
CLASS
B 0 0 0
X 1 1 2
And we can transform this question to the more generic question how to count how many NaN
in dataframe with for loop
pd.DataFrame({x: y.isna().sum()for x , y in g }).T.drop('CLASS',1)
Out[468]:
FEATURE1 FEATURE2 FEATURE3
B 0 0 0
X 1 1 2
add a comment |
Using the diff between count
and size
g=df.groupby('CLASS')
-g.count().sub(g.size(),0)
FEATURE1 FEATURE2 FEATURE3
CLASS
B 0 0 0
X 1 1 2
And we can transform this question to the more generic question how to count how many NaN
in dataframe with for loop
pd.DataFrame({x: y.isna().sum()for x , y in g }).T.drop('CLASS',1)
Out[468]:
FEATURE1 FEATURE2 FEATURE3
B 0 0 0
X 1 1 2
Using the diff between count
and size
g=df.groupby('CLASS')
-g.count().sub(g.size(),0)
FEATURE1 FEATURE2 FEATURE3
CLASS
B 0 0 0
X 1 1 2
And we can transform this question to the more generic question how to count how many NaN
in dataframe with for loop
pd.DataFrame({x: y.isna().sum()for x , y in g }).T.drop('CLASS',1)
Out[468]:
FEATURE1 FEATURE2 FEATURE3
B 0 0 0
X 1 1 2
edited Dec 27 '18 at 16:25
answered Dec 27 '18 at 15:19
Wen-BenWen-Ben
111k83266
111k83266
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53947196%2fgroupby-class-and-count-missing-values-in-features%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown