How do you estimate the mean (average) of a histogram? [closed]
I have some trouble finding tutorials of this topic. I understand that estimating the mean from a histogram is only an estimate, however, is there some sort of formula or process to acquire the mean?
statistics
asked Jan 13 at 23:43 by 9766Joe
edited Jan 14 at 0:23 by 9766Joe
closed as off-topic by JMoravitz, Xander Henderson, heropup, Clarinetist, Leucippus Jan 14 at 4:58
This question appears to be off-topic. The users who voted to close gave this specific reason:
- "This question is missing context or other details: Please provide additional context, which ideally explains why the question is relevant to you and our community. Some forms of context include: background and motivation, relevant definitions, source, possible strategies, your current progress, why the question is interesting or important, etc." – JMoravitz, Xander Henderson, heropup, Clarinetist, Leucippus
If this question can be reworded to fit the rules in the help center, please edit the question.
The same way you would estimate them if the values were displayed normally rather than as a histogram. For the mean and median, you just get a feel for it and try to guesstimate a number "somewhere in the middle." The mode is simply the most frequently occurring number. Be aware, though, that unless the values are labeled in the histogram, the mode can be quite tricky. Imagine a histogram of the values $4,5,6,100,200,200$. At a glance it might look like there are three fives rather than three different small numbers, which would make your guess at the mode incorrect.
– JMoravitz, Jan 13 at 23:51
cs.uni.edu/~campbell/stat/histrev2.html
– D.B., Jan 14 at 0:14
2 Answers
Taking an example from Wikipedia's histogram article, you might look at the histogram and guess a mean of around $25$ (between the fifth and sixth bins): most of the data lies to the left of that point, but it is counterbalanced by the more extreme values to the right.
If you wanted a more precise figure, you would look at the location and dimensions of each bin. In this example, Wikipedia actually gives the numbers, so let's use those:
Bin-left  Bin-width  Bin-height
       0          5         836
       5          5        2737
      10          5        3723
      15          5        3926
      20          5        3596
      25          5        1438
      30          5        3273
      35          5         642
      40          5         824
      45         15         613
      60         30         215
      90         60          57
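To find the mean (the average moment, or leverage, about $0$) you need the area of each bin (width times height) and the midpoint of each bin; multiplying these together gives the bin's leverage. The heights shown are cut down to whole numbers, so in some rows the area is slightly larger than width times the displayed height.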
Bin-left  Bin-width  Bin-height  Bin-area  Bin-midpoint  Bin-leverage
       0          5         836      4180           2.5       10450.0
       5          5        2737     13687           7.5      102652.5
      10          5        3723     18618          12.5      232725.0
      15          5        3926     19634          17.5      343595.0
      20          5        3596     17981          22.5      404572.5
      25          5        1438      7190          27.5      197725.0
      30          5        3273     16369          32.5      531992.5
      35          5         642      3212          37.5      120450.0
      40          5         824      4122          42.5      175185.0
      45         15         613      9200          52.5      483000.0
      60         30         215      6461          75.0      484575.0
      90         60          57      3435         120.0      412200.0
Adding up the areas gives $124089$ and adding up the leverages gives $3499122.5$; dividing the latter by the former gives a mean of about $28.2$.
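The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the midpoint method described above: the names `bins` and `grouped_mean` are made up for this sketch, and the areas are taken directly from the Bin-area column rather than recomputed from the displayed heights.

```python
# (left edge, width, area) for each bin, copied from the table above.
bins = [
    (0, 5, 4180), (5, 5, 13687), (10, 5, 18618), (15, 5, 19634),
    (20, 5, 17981), (25, 5, 7190), (30, 5, 16369), (35, 5, 3212),
    (40, 5, 4122), (45, 15, 9200), (60, 30, 6461), (90, 60, 3435),
]


def grouped_mean(bins):
    """Estimate the mean by placing each bin's whole area at its midpoint."""
    total_area = sum(area for _, _, area in bins)
    # Leverage of a bin = area * midpoint, with midpoint = left + width / 2.
    total_leverage = sum(area * (left + width / 2) for left, width, area in bins)
    return total_leverage / total_area


estimate = grouped_mean(bins)
print(round(estimate, 1))  # 28.2
```

Because only the ratio of leverage to area matters, the result is the same whether the areas are raw counts or counts in thousands.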
This might be a slight overestimate, (a) because people tend to answer survey questions with round numbers, so more values lie toward the left of these bins than toward the right, and (b) because among the extreme values on the right, smaller values may be more likely than larger ones, i.e. the bins may not actually be rectangles. Even ignoring those points, the calculated mean and the original guess of $25$ are not far apart.
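One way to get a feel for how much the within-bin distribution could matter is to move each bin's area away from the midpoint. The sketch below is an illustration rather than part of the original calculation: the helper name `mean_with_offset` is hypothetical, and it assumes every observation really lies inside its bin, in which case the left-edge and right-edge results bracket the true mean.

```python
# (left edge, width, area) for each bin, copied from the table above.
bins = [
    (0, 5, 4180), (5, 5, 13687), (10, 5, 18618), (15, 5, 19634),
    (20, 5, 17981), (25, 5, 7190), (30, 5, 16369), (35, 5, 3212),
    (40, 5, 4122), (45, 15, 9200), (60, 30, 6461), (90, 60, 3435),
]


def mean_with_offset(bins, fraction):
    """Place each bin's whole area at left + fraction * width and average.

    fraction=0.0 uses the left edges, 0.5 the midpoints, 1.0 the right edges.
    """
    total_area = sum(area for _, _, area in bins)
    leverage = sum(area * (left + fraction * width) for left, width, area in bins)
    return leverage / total_area


lowest = mean_with_offset(bins, 0.0)    # everything at the left edge of its bin
midpoint = mean_with_offset(bins, 0.5)  # the usual midpoint estimate
highest = mean_with_offset(bins, 1.0)   # everything at the right edge of its bin
print(f"{lowest:.1f} <= {midpoint:.1f} <= {highest:.1f}")
```

With these numbers the bracket runs from roughly $23.9$ to $32.5$, so a mean somewhat below the midpoint estimate of $28.2$ is entirely possible.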
answered Jan 14 at 1:22 by Henry
Suppose (for example) that your histogram shows the weights of people in kilograms. The histogram has 3 columns: one for people who weigh 50 to 60 kg, one for people who weigh 60 to 70 kg, and one for people who weigh 70 to 80 kg. Suppose the first column has height 2, the second has height 3, and the third has height 1. In summary we have:
Weight 50 to 60 -- column height = 2
Weight 60 to 70 -- column height = 3
Weight 70 to 80 -- column height = 1
If we don't have any further info, it's reasonable to assume that the two people in the 50-to-60 category both weigh 55 kg (the average of 50 and 60).
Continuing this approach, we assume that our 6 people have weights 55, 55, 65, 65, 65, 75.
I expect you know how to compute the mean of those 6 numbers.
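For completeness, here is a minimal Python sketch of that last step, using the three columns from this example. The variable names are chosen just for this illustration.

```python
# (lower bound, upper bound, column height) for each column of the histogram.
columns = [(50, 60, 2), (60, 70, 3), (70, 80, 1)]

# Assume everyone in a column weighs the midpoint of that column's range.
weights = []
for low, high, height in columns:
    weights.extend([(low + high) / 2] * height)

mean_weight = sum(weights) / len(weights)
print(weights)                # [55.0, 55.0, 65.0, 65.0, 65.0, 75.0]
print(round(mean_weight, 1))  # 63.3
```

This is the same as weighting each midpoint by its column height: $(2 \cdot 55 + 3 \cdot 65 + 1 \cdot 75)/6 = 380/6 \approx 63.3$ kg.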
answered Jan 14 at 0:42 by bubba
edited Jan 14 at 0:45
Thank you for the answer!
– 9766Joe, Jan 14 at 0:44