Groupby class and count missing values in features

I have a problem and I cannot find any solution in the web or documentation, even if I think that it is very trivial.

What do I want to do?

I have a dataframe like this

CLASS FEATURE1 FEATURE2 FEATURE3

  X      A       NaN      NaN

  X     NaN       A       NaN

  B      A        A        A

I want to group by the label(CLASS) and display the number of NaN-Values that are counted in every feature so that it looks like this. The purpose of this is to get a general idea how missing values are distributed over the different classes.

CLASS FEATURE1 FEATURE2 FEATURE3

  X      1        1        2

  B      0        0        0

I know how to recieve the amount of nonnull-Values - df.groupby['CLASS'].count()

Is there something similar for the NaN-Values?

I tried to subtract the count() from the size() but it returned an unformatted output filled with the value NaN

edited Dec 27 '18 at 16:11

coldspeed

131k23135221

asked Dec 27 '18 at 15:15

FelTry2

1057

add a comment |

I have a problem and I cannot find any solution in the web or documentation, even if I think that it is very trivial.

What do I want to do?

I have a dataframe like this

CLASS FEATURE1 FEATURE2 FEATURE3

  X      A       NaN      NaN

  X     NaN       A       NaN

  B      A        A        A

CLASS FEATURE1 FEATURE2 FEATURE3

  X      1        1        2

  B      0        0        0

I know how to recieve the amount of nonnull-Values - df.groupby['CLASS'].count()

Is there something similar for the NaN-Values?

I tried to subtract the count() from the size() but it returned an unformatted output filled with the value NaN

edited Dec 27 '18 at 16:11

coldspeed

131k23135221

asked Dec 27 '18 at 15:15

FelTry2

1057

add a comment |

I have a problem and I cannot find any solution in the web or documentation, even if I think that it is very trivial.

What do I want to do?

I have a dataframe like this

CLASS FEATURE1 FEATURE2 FEATURE3

  X      A       NaN      NaN

  X     NaN       A       NaN

  B      A        A        A

CLASS FEATURE1 FEATURE2 FEATURE3

  X      1        1        2

  B      0        0        0

I know how to recieve the amount of nonnull-Values - df.groupby['CLASS'].count()

Is there something similar for the NaN-Values?

I tried to subtract the count() from the size() but it returned an unformatted output filled with the value NaN

edited Dec 27 '18 at 16:11

coldspeed

131k23135221

asked Dec 27 '18 at 15:15

FelTry2

1057

I have a problem and I cannot find any solution in the web or documentation, even if I think that it is very trivial.

What do I want to do?

I have a dataframe like this

CLASS FEATURE1 FEATURE2 FEATURE3

  X      A       NaN      NaN

  X     NaN       A       NaN

  B      A        A        A

CLASS FEATURE1 FEATURE2 FEATURE3

  X      1        1        2

  B      0        0        0

I know how to recieve the amount of nonnull-Values - df.groupby['CLASS'].count()

Is there something similar for the NaN-Values?

I tried to subtract the count() from the size() but it returned an unformatted output filled with the value NaN

python pandas dataframe group-by nan

edited Dec 27 '18 at 16:11

coldspeed

131k23135221

asked Dec 27 '18 at 15:15

FelTry2

1057

edited Dec 27 '18 at 16:11

coldspeed

131k23135221

asked Dec 27 '18 at 15:15

FelTry2

1057

edited Dec 27 '18 at 16:11

coldspeed

131k23135221

edited Dec 27 '18 at 16:11

coldspeed

131k23135221

edited Dec 27 '18 at 16:11

coldspeed

131k23135221

asked Dec 27 '18 at 15:15

FelTry2

1057

asked Dec 27 '18 at 15:15

FelTry2

1057

asked Dec 27 '18 at 15:15

FelTry2

1057

add a comment |

3 Answers
3

active

oldest

votes

Compute a mask with isna, then group and find the sum:

df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum().reset_index()



  CLASS  FEATURE1  FEATURE2  FEATURE3

0     X       1.0       1.0       2.0

1     B       0.0       0.0       0.0

Another option is to subtract the size from the count using rsub along the 0^th axis for index aligned subtraction:

df.groupby('CLASS').count().rsub(df.groupby('CLASS').size(), axis=0)

Or,

g = df.groupby('CLASS')

g.count().rsub(g.size(), axis=0)

       FEATURE1  FEATURE2  FEATURE3

CLASS                              

B             0         0         0

X             1         1         2

There are quite a few good answers, so here are some timeits for your perusal:

df_ = df

df = pd.concat([df_] * 10000)



%timeit df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum()

%timeit df.set_index('CLASS').isna().sum(level=0)    

%%timeit

g = df.groupby('CLASS')

g.count().rsub(g.size(), axis=0)



11.8 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

9.47 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

6.54 ms ± 81.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Actual performance depends on your data and setup, so your mileage may vary.

edited Dec 27 '18 at 15:29

answered Dec 27 '18 at 15:16

coldspeed

131k23135221

2

Amazing - thank you very, very much!

– FelTry2
Dec 27 '18 at 15:37

add a comment |

You can use set_index and sum:

df.set_index('CLASS').isna().sum(level=0)

Output:

       FEATURE1  FEATURE2  FEATURE3

CLASS                              

X           1.0       1.0       2.0

B           0.0       0.0       0.0

answered Dec 27 '18 at 15:18

Scott Boston

54.8k73056

add a comment |

Using the diff between count and size

g=df.groupby('CLASS')



-g.count().sub(g.size(),0)



          FEATURE1  FEATURE2  FEATURE3

CLASS                              

B             0         0         0

X             1         1         2

And we can transform this question to the more generic question how to count how many NaN in dataframe with for loop

pd.DataFrame({x: y.isna().sum()for x , y in g }).T.drop('CLASS',1)

Out[468]: 

   FEATURE1  FEATURE2  FEATURE3

B         0         0         0

X         1         1         2

edited Dec 27 '18 at 16:25

answered Dec 27 '18 at 15:19

Wen-Ben

111k83266

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53947196%2fgroupby-class-and-count-missing-values-in-features%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

Compute a mask with isna, then group and find the sum:

df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum().reset_index()



  CLASS  FEATURE1  FEATURE2  FEATURE3

0     X       1.0       1.0       2.0

1     B       0.0       0.0       0.0

Another option is to subtract the size from the count using rsub along the 0^th axis for index aligned subtraction:

df.groupby('CLASS').count().rsub(df.groupby('CLASS').size(), axis=0)

Or,

g = df.groupby('CLASS')

g.count().rsub(g.size(), axis=0)

       FEATURE1  FEATURE2  FEATURE3

CLASS                              

B             0         0         0

X             1         1         2

There are quite a few good answers, so here are some timeits for your perusal:

df_ = df

df = pd.concat([df_] * 10000)



%timeit df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum()

%timeit df.set_index('CLASS').isna().sum(level=0)    

%%timeit

g = df.groupby('CLASS')

g.count().rsub(g.size(), axis=0)



11.8 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

9.47 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

6.54 ms ± 81.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Actual performance depends on your data and setup, so your mileage may vary.

edited Dec 27 '18 at 15:29

answered Dec 27 '18 at 15:16

coldspeed

131k23135221

2

Amazing - thank you very, very much!

– FelTry2
Dec 27 '18 at 15:37

add a comment |

Compute a mask with isna, then group and find the sum:

df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum().reset_index()



  CLASS  FEATURE1  FEATURE2  FEATURE3

0     X       1.0       1.0       2.0

1     B       0.0       0.0       0.0

Another option is to subtract the size from the count using rsub along the 0^th axis for index aligned subtraction:

df.groupby('CLASS').count().rsub(df.groupby('CLASS').size(), axis=0)

Or,

g = df.groupby('CLASS')

g.count().rsub(g.size(), axis=0)

       FEATURE1  FEATURE2  FEATURE3

CLASS                              

B             0         0         0

X             1         1         2

There are quite a few good answers, so here are some timeits for your perusal:

df_ = df

df = pd.concat([df_] * 10000)



%timeit df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum()

%timeit df.set_index('CLASS').isna().sum(level=0)    

%%timeit

g = df.groupby('CLASS')

g.count().rsub(g.size(), axis=0)



11.8 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

9.47 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

6.54 ms ± 81.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Actual performance depends on your data and setup, so your mileage may vary.

edited Dec 27 '18 at 15:29

answered Dec 27 '18 at 15:16

coldspeed

131k23135221

2

Amazing - thank you very, very much!

– FelTry2
Dec 27 '18 at 15:37

add a comment |

Compute a mask with isna, then group and find the sum:

df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum().reset_index()



  CLASS  FEATURE1  FEATURE2  FEATURE3

0     X       1.0       1.0       2.0

1     B       0.0       0.0       0.0

Another option is to subtract the size from the count using rsub along the 0^th axis for index aligned subtraction:

df.groupby('CLASS').count().rsub(df.groupby('CLASS').size(), axis=0)

Or,

g = df.groupby('CLASS')

g.count().rsub(g.size(), axis=0)

       FEATURE1  FEATURE2  FEATURE3

CLASS                              

B             0         0         0

X             1         1         2

There are quite a few good answers, so here are some timeits for your perusal:

df_ = df

df = pd.concat([df_] * 10000)



%timeit df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum()

%timeit df.set_index('CLASS').isna().sum(level=0)    

%%timeit

g = df.groupby('CLASS')

g.count().rsub(g.size(), axis=0)



11.8 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

9.47 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

6.54 ms ± 81.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Actual performance depends on your data and setup, so your mileage may vary.

edited Dec 27 '18 at 15:29

answered Dec 27 '18 at 15:16

coldspeed

131k23135221

Compute a mask with isna, then group and find the sum:

df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum().reset_index()



  CLASS  FEATURE1  FEATURE2  FEATURE3

0     X       1.0       1.0       2.0

1     B       0.0       0.0       0.0

Another option is to subtract the size from the count using rsub along the 0^th axis for index aligned subtraction:

df.groupby('CLASS').count().rsub(df.groupby('CLASS').size(), axis=0)

Or,

g = df.groupby('CLASS')

g.count().rsub(g.size(), axis=0)

       FEATURE1  FEATURE2  FEATURE3

CLASS                              

B             0         0         0

X             1         1         2

There are quite a few good answers, so here are some timeits for your perusal:

df_ = df

df = pd.concat([df_] * 10000)



%timeit df.drop('CLASS', 1).isna().groupby(df.CLASS, sort=False).sum()

%timeit df.set_index('CLASS').isna().sum(level=0)    

%%timeit

g = df.groupby('CLASS')

g.count().rsub(g.size(), axis=0)



11.8 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

9.47 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

6.54 ms ± 81.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Actual performance depends on your data and setup, so your mileage may vary.

edited Dec 27 '18 at 15:29

answered Dec 27 '18 at 15:16

coldspeed

131k23135221

edited Dec 27 '18 at 15:29

answered Dec 27 '18 at 15:16

coldspeed

131k23135221

answered Dec 27 '18 at 15:16

coldspeed

131k23135221

answered Dec 27 '18 at 15:16

coldspeed

131k23135221

2

Amazing - thank you very, very much!

– FelTry2
Dec 27 '18 at 15:37

add a comment |

2

Amazing - thank you very, very much!

– FelTry2
Dec 27 '18 at 15:37

Amazing - thank you very, very much!

– FelTry2
Dec 27 '18 at 15:37

add a comment |

You can use set_index and sum:

df.set_index('CLASS').isna().sum(level=0)

Output:

       FEATURE1  FEATURE2  FEATURE3

CLASS                              

X           1.0       1.0       2.0

B           0.0       0.0       0.0

answered Dec 27 '18 at 15:18

Scott Boston

54.8k73056

add a comment |

You can use set_index and sum:

df.set_index('CLASS').isna().sum(level=0)

Output:

       FEATURE1  FEATURE2  FEATURE3

CLASS                              

X           1.0       1.0       2.0

B           0.0       0.0       0.0

answered Dec 27 '18 at 15:18

Scott Boston

54.8k73056

add a comment |

You can use set_index and sum:

df.set_index('CLASS').isna().sum(level=0)

Output:

       FEATURE1  FEATURE2  FEATURE3

CLASS                              

X           1.0       1.0       2.0

B           0.0       0.0       0.0

answered Dec 27 '18 at 15:18

Scott Boston

54.8k73056

You can use set_index and sum:

df.set_index('CLASS').isna().sum(level=0)

Output:

       FEATURE1  FEATURE2  FEATURE3

CLASS                              

X           1.0       1.0       2.0

B           0.0       0.0       0.0

answered Dec 27 '18 at 15:18

Scott Boston

54.8k73056

answered Dec 27 '18 at 15:18

Scott Boston

54.8k73056

answered Dec 27 '18 at 15:18

Scott Boston

54.8k73056

answered Dec 27 '18 at 15:18

Scott Boston

54.8k73056

add a comment |

Using the diff between count and size

g=df.groupby('CLASS')



-g.count().sub(g.size(),0)



          FEATURE1  FEATURE2  FEATURE3

CLASS                              

B             0         0         0

X             1         1         2

And we can transform this question to the more generic question how to count how many NaN in dataframe with for loop

pd.DataFrame({x: y.isna().sum()for x , y in g }).T.drop('CLASS',1)

Out[468]: 

   FEATURE1  FEATURE2  FEATURE3

B         0         0         0

X         1         1         2

edited Dec 27 '18 at 16:25

answered Dec 27 '18 at 15:19

Wen-Ben

111k83266

add a comment |

Using the diff between count and size

g=df.groupby('CLASS')



-g.count().sub(g.size(),0)



          FEATURE1  FEATURE2  FEATURE3

CLASS                              

B             0         0         0

X             1         1         2

And we can transform this question to the more generic question how to count how many NaN in dataframe with for loop

pd.DataFrame({x: y.isna().sum()for x , y in g }).T.drop('CLASS',1)

Out[468]: 

   FEATURE1  FEATURE2  FEATURE3

B         0         0         0

X         1         1         2

edited Dec 27 '18 at 16:25

answered Dec 27 '18 at 15:19

Wen-Ben

111k83266

add a comment |

Using the diff between count and size

g=df.groupby('CLASS')



-g.count().sub(g.size(),0)



          FEATURE1  FEATURE2  FEATURE3

CLASS                              

B             0         0         0

X             1         1         2

And we can transform this question to the more generic question how to count how many NaN in dataframe with for loop

pd.DataFrame({x: y.isna().sum()for x , y in g }).T.drop('CLASS',1)

Out[468]: 

   FEATURE1  FEATURE2  FEATURE3

B         0         0         0

X         1         1         2

edited Dec 27 '18 at 16:25

answered Dec 27 '18 at 15:19

Wen-Ben

111k83266

Using the diff between count and size

g=df.groupby('CLASS')



-g.count().sub(g.size(),0)



          FEATURE1  FEATURE2  FEATURE3

CLASS                              

B             0         0         0

X             1         1         2

And we can transform this question to the more generic question how to count how many NaN in dataframe with for loop

pd.DataFrame({x: y.isna().sum()for x , y in g }).T.drop('CLASS',1)

Out[468]: 

   FEATURE1  FEATURE2  FEATURE3

B         0         0         0

X         1         1         2

edited Dec 27 '18 at 16:25

answered Dec 27 '18 at 15:19

Wen-Ben

111k83266

edited Dec 27 '18 at 16:25

answered Dec 27 '18 at 15:19

Wen-Ben

111k83266

answered Dec 27 '18 at 15:19

Wen-Ben

111k83266

answered Dec 27 '18 at 15:19

Wen-Ben

111k83266

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Xrfgtjtk