How to normalize data between 0 and 1?
I have seen the min-max normalization formula in several answers (e.g. [1], [2], [3]), where data is normalized into the interval $\left[0,1\right]$.
However, is there a method to normalize data into the interval $\left(0,1\right)$, i.e. excluding 0 and 1?
EDIT:
My data is a sample from a uniform distribution within the range $\left[a,b\right]$. I would like to normalize it into the interval $\left(0,1\right)$ while remaining uniformly distributed.
dataset normalization
asked Dec 4 at 15:30 by skoestlmeier, edited Dec 4 at 16:00
$$\frac{1}{1 + \exp(-x)} \in (0,1)$$ for any $x \in \mathbb{R}$. Do you have some other requirements that would exclude this?
– Sycorax
Dec 4 at 15:47
Thanks @Sycorax. To clarify, I just edited my question to point out that my data sample should be uniformly distributed.
– skoestlmeier
Dec 4 at 16:01
3 Answers
Accepted answer (3 votes), by Sycorax (answered Dec 4 at 16:07, edited Dec 5 at 15:08)
Using the property that the CDF is uniformly distributed on $[0,1]$, you can compute the empirical CDF for $x$. This is essentially the same as ranking the data and then rescaling by the number of elements $n$. To enforce the requirement that the scaled data exclude 0 and 1, you can deviate from the standard ECDF procedure and construct the scale so that the outputs are $\frac{1}{n+1}, \frac{2}{n+1}, \cdots, \frac{n}{n+1}$, which is likewise uniform.
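A minimal sketch of this rank-based rescaling in Python (the helper name and the use of scipy.stats.rankdata are my own choices for illustration, not part of the original answer):

```python
import numpy as np
from scipy.stats import rankdata

def rank_to_open_unit(x):
    """Map a sample into (0, 1): the i-th smallest value becomes i / (n + 1)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    ranks = rankdata(x, method="average")  # ranks 1..n; ties receive averaged ranks
    return ranks / (n + 1)                 # strictly inside (0, 1), still uniform for a uniform sample

# Example: a Uniform(2, 5) sample stays (approximately) uniform after the transform
rng = np.random.default_rng(0)
u = rank_to_open_unit(rng.uniform(2.0, 5.0, size=1_000))
print(u.min(), u.max())  # both strictly between 0 and 1
```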
There's a whole class of symmetric versions of your scaling procedure: $u_\alpha(i) = \frac{i-\alpha}{n+1-2\alpha}$ (with $0 \leq \alpha \leq 1$), of which the above has $\alpha=0$. (There are also asymmetric ones which have uses in some applications.)
– Glen_b♦
Dec 6 at 5:12
Does this have any particular name?
– Sycorax
Dec 6 at 13:42
Several, I think, but I can't recall any right now. It comes up in probability plotting. Blom (1958), "Statistical Estimates and Transformed Beta Variables", is the standard reference for this thing (and variations).
– Glen_b♦
Dec 7 at 8:49
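For reference, a sketch of the plotting-position family from the comments above; the $\alpha = 0.375$ case for Blom's positions is my own addition from memory, so treat that constant as illustrative:

```python
import numpy as np

def plotting_positions(x, alpha=0.0):
    """u_alpha(i) = (i - alpha) / (n + 1 - 2*alpha), applied to the ranks of x.

    alpha = 0     gives i / (n + 1), the scheme used in the answer above;
    alpha = 0.375 gives (i - 0.375) / (n + 0.25), often attributed to Blom.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    ranks = np.argsort(np.argsort(x)) + 1  # ranks 1..n (no tie handling, kept short)
    return (ranks - alpha) / (n + 1 - 2 * alpha)
```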
Answer (4 votes), by Kodiologist (answered Dec 4 at 16:11, edited Dec 4 at 19:35)
A uniform distribution on $(a, b)$ is the same as a uniform distribution on $[a, b]$, since for any $X$ distributed uniformly on $[a, b]$, $P(X = a) = P(X = b) = 0$. So, just use the formulae for translating to $[0, 1]$. On the other hand, if your sample has a value equal to $a$ or $b$, then you can safely conclude that you don't actually have a continuous uniform distribution.
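In code, this is just the usual rescaling, using the known bounds $a$ and $b$ rather than the sample minimum and maximum (a hypothetical sketch, not part of the answer itself):

```python
import numpy as np

def rescale_uniform(x, a, b):
    """Map a Uniform(a, b) sample to (0, 1); hitting exactly a or b has probability zero."""
    x = np.asarray(x, dtype=float)
    return (x - a) / (b - a)

rng = np.random.default_rng(1)
sample = rng.uniform(2.0, 5.0, size=1_000)
scaled = rescale_uniform(sample, a=2.0, b=5.0)
print(scaled.min(), scaled.max())  # close to, but (almost surely) not equal to, 0 and 1
```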
I don't agree with your latter statement. Following the same logic, you could exclude any data from ever being sampled from a uniform distribution.
– dedObed
Dec 4 at 19:47
@dedObed The argument works for any countable set of points, because any such set has Lebesgue measure zero, but not for uncountable sets.
– Kodiologist
Dec 4 at 20:27
I agree that a uniform distribution on (a, b) is the same as a uniform on [a, b]. The claim I challenge is "if your sample has a value equal to a or b [...] you don't actually have a continuous uniform distribution."
– dedObed
Dec 4 at 20:34
@dedObed I know. I'm saying that the argument works because $\{a, b\}$, the set of just the two values $a$ and $b$, is countable. It wouldn't if you used a non-null set, which is what would be required to "follow the same logic" to "exclude any data from ever being sampled from a uniform distribution".
– Kodiologist
Dec 4 at 20:36
@dedObed I guess the chief thing to keep in mind is that continuous distributions are the sort of ethereal mathematical entities you can't get in real life. Computers fake a continuous uniform distribution with a discrete distribution that covers a large number of floating-point values. It's close enough for many applied purposes, but, e.g., a random float will always be rational, whereas a random sample from a continuous uniform distribution will be almost surely irrational.
– Kodiologist
Dec 4 at 21:58
Answer (1 vote), by matteo (answered Dec 4 at 16:01)
The formula $x' = \frac{x - \min{x}}{\max{x} - \min{x}}$ will normalize the values into $[0,1]$.
I am not sure why you want to exclude $0$ and $1$; in any case, one way would be to choose new minimum and maximum values for the transformed variable, e.g. $[0+\epsilon, 1-\epsilon]$. You can then transform the variable using
$$x' = \epsilon + (1-2\epsilon) \cdot \left(\frac{x - \min{x}}{\max{x} - \min{x}} \right)$$
Another way could be, as suggested by Sycorax in his comment, to use a logistic transform
$$ x' = \frac{1}{1 + \exp(-x)} $$
This ensures that $x' \in (0,1)$ for all $x \in \mathbb{R}$.
However, depending on the original distribution of $x$, $x'$ might span only a limited range of the interval $(0,1)$, so you might want to standardize $x$ before applying the logistic transform.
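A short sketch of both transforms described above (NumPy-based; the function names and the default value of epsilon are mine):

```python
import numpy as np

def minmax_open(x, eps=1e-6):
    """Min-max scaling squeezed into [eps, 1 - eps], so 0 and 1 are never attained."""
    x = np.asarray(x, dtype=float)
    z = (x - x.min()) / (x.max() - x.min())
    return eps + (1 - 2 * eps) * z

def logistic_transform(x, standardize=True):
    """Logistic transform into (0, 1); standardizing x first spreads the output over more of (0, 1).

    Note: unlike the rank/ECDF approach, this does not keep a uniform sample uniform.
    """
    x = np.asarray(x, dtype=float)
    if standardize:
        x = (x - x.mean()) / x.std()
    return 1.0 / (1.0 + np.exp(-x))
```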