Why does an empty string in Python sometimes take up 49 bytes and sometimes 51?












46















I tested sys.getsizeof('') and sys.getsizeof(' ') in three environments, and in two of them sys.getsizeof('') gives me 51 bytes (one byte more than sys.getsizeof(' ')) instead of 49 bytes:



Screenshots:



Win8 + Spyder + CPython 3.6:



sys.getsizeof('') == 49 and sys.getsizeof(' ') == 50



Win8 + Spyder + IPython 3.6:



sys.getsizeof('') == 51 and sys.getsizeof(' ') == 50



Win10 (VPN remote) + PyCharm + CPython 3.7:



sys.getsizeof('') == 51 and sys.getsizeof(' ') == 50
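For anyone who wants to repeat the measurement, here is a minimal sketch that prints the interpreter details alongside the two sizes. The helper name `measure_sizes` is made up for this example, and the numbers will depend on the CPython version, the build (32- or 64-bit), and whatever the surrounding shell or IDE has already done with the shared empty-string object.

```python
import platform
import sys


def measure_sizes():
    """Return sys.getsizeof() for the empty string and a one-space string."""
    return {"empty": sys.getsizeof(""), "space": sys.getsizeof(" ")}


if __name__ == "__main__":
    # Record which interpreter is running, since results differ between setups.
    print(platform.python_implementation(), platform.python_version())
    print(sys.executable)
    print(measure_sizes())
```

Running the same file from the plain console and from the IDE's console makes the comparison between environments explicit.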



First edit



I ran a second test in python.exe instead of Spyder and PyCharm (those two still show 51), and there everything looks fine. Apparently I don't have the expertise to solve this problem, so I'll leave it to you guys :)



[Screenshot: Win10 + Python 3.7 console versus PyCharm using the same interpreter]

[Screenshot: Win8 + IPython 3.6 + Spyder using the same interpreter]






























  • 14





    My burning question is "why does it matter?". But anyway, Spyder will also be throwing that into a shared namespace

    – roganjosh
    Dec 22 '18 at 23:06






  • 6





    @roganjosh Actually I think it doesn't matter, because my job as a data analyst doesn't ask me to dig deep into the object model, but I'm scratching my head to understand the why behind this. I wish I had another OS, e.g. Linux, to test this. BTW, does this have something to do with the "shared namespace" you mentioned?

    – Nicholas Humphrey
    Dec 22 '18 at 23:09






  • 2





    My job is also data scientist/data analyst. This behaviour is inconsequential, but I don't want to invalidate your question (curiosity is fine). Spyder has a complex namespace, you must have observed how things are available in the console from your main scripts...

    – roganjosh
    Dec 22 '18 at 23:12






  • 5





    @AndreyTyukin No, I just want to see if someone else has encountered this weird thing before. More importantly, if an empty string really takes 1 more byte than a string with one char, it means that my understanding of the string object could be totally wrong. If you think this is normal, then sorry; I'm not a professional software developer and this is indeed weird to me. For now I consider the issue settled, as a second test in the python.exe console shows 49.

    – Nicholas Humphrey
    Dec 22 '18 at 23:43






  • 2





    The most likely candidate seems to be that strings cache a version encoded with UTF-8 when first required.

    – Davis Herring
    Dec 23 '18 at 0:20
















python














edited Dec 22 '18 at 23:38







Nicholas Humphrey

















asked Dec 22 '18 at 23:01









Nicholas Humphrey




















2 Answers


















30














This sounds like something is retrieving the wchar representation of the string object. As of CPython 3.7, an empty string is normally stored in the "compact ASCII" representation. On a 64-bit build, the base data and padding for a compact ASCII string come to 48 bytes, plus one byte of string data (just the null terminator), for a total of 49. The layout is defined in CPython's unicodeobject.h header.
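The 48-byte figure can be sanity-checked with the struct module. The sketch below is a model, not CPython's own code: it assumes the 3.7 header consists of a reference count, a type pointer, a length, a hash, a 4-byte block of state flags and a wstr pointer, in that order, and the names `HEADER_FORMAT` and `compact_ascii_size` are invented for the example.

```python
import struct

# Assumed field order of the compact ASCII header in CPython 3.7:
# refcount (n), type pointer (P), length (n), hash (n),
# state flags (I, 4 bytes), wstr pointer (P).
# Native alignment inserts padding between the flags and the pointer.
HEADER_FORMAT = "nPnnIP"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def compact_ascii_size(n_chars):
    """Header plus one byte per character plus the null terminator."""
    return HEADER_SIZE + n_chars + 1


if __name__ == "__main__":
    print(HEADER_SIZE)            # 48 on a typical 64-bit build
    print(compact_ascii_size(0))  # 49 there, matching the empty string
    print(compact_ascii_size(1))  # 50 there, matching ' '
```

On a 32-bit build the same format string gives a smaller header, which is one reason the absolute numbers are not portable.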



For now (this is scheduled for removal in 4.0), there is also an option to retrieve a wchar_t representation of a string. On a platform with 2-byte wchar_t, such as Windows, the wchar representation of an empty string is 2 bytes (just the null terminator again). The wchar representation is cached on the string object on first access, and str.__sizeof__ accounts for this extra data when it exists, so 49 + 2 gives the 51-byte total.
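The arithmetic behind the 51 can be written out as below. It is only a sketch: `BASE_HEADER` hard-codes the 48-byte figure quoted above for a 64-bit build, `size_with_wchar_cache` is a hypothetical helper, and ctypes is used just to report the platform's wchar_t width. CPython releases that no longer carry the cached wchar_t copy will not show the extra bytes.

```python
import ctypes

BASE_HEADER = 48  # assumed: compact ASCII header on a 64-bit CPython 3.7 build


def size_with_wchar_cache(n_chars, wchar_size):
    """Model str.__sizeof__ once the wchar_t copy has been cached."""
    ascii_data = n_chars + 1                 # characters + null terminator
    wchar_data = (n_chars + 1) * wchar_size  # cached copy + its terminator
    return BASE_HEADER + ascii_data + wchar_data


if __name__ == "__main__":
    wchar_size = ctypes.sizeof(ctypes.c_wchar)  # 2 on Windows, often 4 elsewhere
    print(wchar_size, size_with_wchar_cache(0, wchar_size))
```

With a 2-byte wchar_t the model gives 51 for the empty string, which is the value reported from Spyder/IPython and PyCharm in the question.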



























  • 2





    the getsizeof() does refer to __sizeof__ internally. This is the correct answer

    – Abhishek Dujari
    Dec 23 '18 at 4:26













  • Thanks! Although I don't get the full pic this seems to explain the source of 51-byte. Just curious, why does Spyder with CPython give me 49-byte but shows 51-byte for IPython on the same PC? I figured from your answer that it has something to do with the size of wchar, and in turn the IDEs get 2 bytes because it's OS-specified, but I think it should be the same for all interpreters? Anyway I may misunderstand your answer...

    – Nicholas Humphrey
    Dec 23 '18 at 4:56






  • 2





    @NicholasHumphrey: Something is retrieving the wchar representation in your IPython tests. (Also, your IPython tests are also using CPython; CPython is the interpreter implementation IPython runs on.)

    – user2357112
    Dec 23 '18 at 4:59






  • 9





    This is largely unrelated to the question, but seeing a reference to a "[Python] 4.0" is giving me anxiety...

    – Mike Caron
    Dec 23 '18 at 5:25






  • 1





    @MikeCaron and others: fear not. A reference to '4.0' means 'some future release after 2.7 support ends (January 2020)'. Some post-deprecation removals have been delayed to make migration easier for those who prefer smaller steps. Things deprecated in 3.3, about 6 years ago, might otherwise have gone away in 3.5. We no longer allow new references to a fictional '4.0'. I just suggested in bugs.python.org/issue35616 that we 'backport' this policy to older notices, precisely to avoid 'anxieties' that no one needs.

    – Terry Jan Reedy
    Dec 30 '18 at 4:59





















5














https://docs.python.org/3.5/library/sys.html#sys.getsizeof



sys is system-specific, so results can easily differ. This is often overlooked. All the system-specific stuff in Python has been dumped in the sys module for years. For example, sys.getwindowsversion() is not portable by definition, but it's there. It's like the bottomless pit of rejects in the perfect world of cross-platform coding. What you see is one of the interesting nuggets of Python.

From the getsizeof docs:




Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.
getsizeof() calls the object’s __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.
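What the quoted passage describes can be observed directly. This is a small sketch with a made-up helper name, `gc_overhead`; it relies only on sys.getsizeof, the object's __sizeof__ method and gc.is_tracked, and the size of the overhead is a CPython implementation detail.

```python
import gc
import sys


def gc_overhead(obj):
    """Bytes sys.getsizeof() adds on top of the object's own __sizeof__()."""
    return sys.getsizeof(obj) - obj.__sizeof__()


if __name__ == "__main__":
    for obj in ("", " ", [], [1, 2]):
        # Lists are managed by the GC, so getsizeof() adds a header for them;
        # strings are not managed by the GC.
        print(repr(obj), gc.is_tracked(obj), gc_overhead(obj))
```

If the overhead printed for the strings is 0, the garbage collector is not contributing to their reported size.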




When garbage collection is in use, the OS will add those extra bits. In the Q&A "When are objects garbage collected in python?", people have gone into excruciating detail about the GC and how it affects memory, refcounts and so on.



I hope that explains where this is coming from. If you don't use system-level attributes but more Pythonic ones, you will get consistent sizes.



























  • 3





    It's not GC data. String objects are never tracked by the GC; they don't have that data. Also, the same objects would have GC data on all the configurations the questioner tested.

    – user2357112
    Dec 23 '18 at 4:21






  • 1





    Then I stand corrected. It may not be GC. However, the difference in representation still applies and is system-specific. It could be OS + runtime.

    – Abhishek Dujari
    Dec 23 '18 at 4:24











edited Dec 23 '18 at 11:24

























answered Dec 23 '18 at 4:19









user2357112








  • 2





    the getsizeof() does refer to __sizeof__ internally. This is the correct answer

    – Abhishek Dujari
    Dec 23 '18 at 4:26













  • Thanks! Although I don't get the full pic this seems to explain the source of 51-byte. Just curious, why does Spyder with CPython give me 49-byte but shows 51-byte for IPython on the same PC? I figured from your answer that it has something to do with the size of wchar, and in turn the IDEs get 2 bytes because it's OS-specified, but I think it should be the same for all interpreters? Anyway I may misunderstand your answer...

    – Nicholas Humphrey
    Dec 23 '18 at 4:56






  • 2





    @NicholasHumphrey: Something is retrieving the wchar representation in your IPython tests. (Also, your IPython tests are also using CPython; CPython is the interpreter implementation IPython runs on.)

    – user2357112
    Dec 23 '18 at 4:59






  • 9





    This is largely unrelated to the question, but seeing a reference to a "[Python] 4.0" is giving me anxiety...

    – Mike Caron
    Dec 23 '18 at 5:25






  • 1





    @MikeCaron and others: fear not. References to '4.0' means 'some future release after 2.7 support ends (2020 Jan). Some post-deprecation removals have been delayed to make migration easier for those who prefer smaller steps. Things deprecated in 3.3, about 6 years ago, might otherwise have gone away in 3.5. We no longer allow new references to a fictional '4.0'. I just suggested in bugs.python.org/issue35616 that we 'backport' this policy to older notices, precisely to avoid 'anxieties' that no one needs.

    – Terry Jan Reedy
    Dec 30 '18 at 4:59
















  • 2





    the getsizeof() does refer to __sizeof__ internally. This is the correct answer

    – Abhishek Dujari
    Dec 23 '18 at 4:26













  • Thanks! Although I don't get the full pic this seems to explain the source of 51-byte. Just curious, why does Spyder with CPython give me 49-byte but shows 51-byte for IPython on the same PC? I figured from your answer that it has something to do with the size of wchar, and in turn the IDEs get 2 bytes because it's OS-specified, but I think it should be the same for all interpreters? Anyway I may misunderstand your answer...

    – Nicholas Humphrey
    Dec 23 '18 at 4:56






  • 2





    @NicholasHumphrey: Something is retrieving the wchar representation in your IPython tests. (Also, your IPython tests are also using CPython; CPython is the interpreter implementation IPython runs on.)

    – user2357112
    Dec 23 '18 at 4:59






  • 9





    This is largely unrelated to the question, but seeing a reference to a "[Python] 4.0" is giving me anxiety...

    – Mike Caron
    Dec 23 '18 at 5:25






  • 1





    @MikeCaron and others: fear not. References to '4.0' means 'some future release after 2.7 support ends (2020 Jan). Some post-deprecation removals have been delayed to make migration easier for those who prefer smaller steps. Things deprecated in 3.3, about 6 years ago, might otherwise have gone away in 3.5. We no longer allow new references to a fictional '4.0'. I just suggested in bugs.python.org/issue35616 that we 'backport' this policy to older notices, precisely to avoid 'anxieties' that no one needs.

    – Terry Jan Reedy
    Dec 30 '18 at 4:59










2




2





the getsizeof() does refer to __sizeof__ internally. This is the correct answer

– Abhishek Dujari
Dec 23 '18 at 4:26







the getsizeof() does refer to __sizeof__ internally. This is the correct answer

– Abhishek Dujari
Dec 23 '18 at 4:26















Thanks! Although I don't get the full pic this seems to explain the source of 51-byte. Just curious, why does Spyder with CPython give me 49-byte but shows 51-byte for IPython on the same PC? I figured from your answer that it has something to do with the size of wchar, and in turn the IDEs get 2 bytes because it's OS-specified, but I think it should be the same for all interpreters? Anyway I may misunderstand your answer...

– Nicholas Humphrey
Dec 23 '18 at 4:56





Thanks! Although I don't get the full pic this seems to explain the source of 51-byte. Just curious, why does Spyder with CPython give me 49-byte but shows 51-byte for IPython on the same PC? I figured from your answer that it has something to do with the size of wchar, and in turn the IDEs get 2 bytes because it's OS-specified, but I think it should be the same for all interpreters? Anyway I may misunderstand your answer...

– Nicholas Humphrey
Dec 23 '18 at 4:56




2




2





@NicholasHumphrey: Something is retrieving the wchar representation in your IPython tests. (Also, your IPython tests are also using CPython; CPython is the interpreter implementation IPython runs on.)

– user2357112
Dec 23 '18 at 4:59














This is largely unrelated to the question, but seeing a reference to a "[Python] 4.0" is giving me anxiety...

– Mike Caron
Dec 23 '18 at 5:25
























5














https://docs.python.org/3.5/library/sys.html#sys.getsizeof



sys is system-specific, so its results can easily differ between environments. This is often overlooked. System-specific functionality in Python has been collected in the sys module for years. For example, sys.getwindowsversion() is not portable by definition, but it's there. It's like the bottomless pit of rejects in the perfect world of cross-platform coding. What you see is one of the interesting nuggets of Python.



From the getsizeof docs:




Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.
getsizeof() calls the object’s __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.




When garbage collection is in use, the OS will add those extra bits. The Q&A "When are objects garbage collected in python?" goes into excruciating detail on the GC and how it affects memory use and reference counts.



I hope that explains where this is coming from. If you don't use system-level attributes but more Pythonic ones, you will get consistent sizes.



























  • 3





    It's not GC data. String objects are never tracked by the GC; they don't have that data. Also, the same objects would have GC data on all the configurations the questioner tested.

    – user2357112
    Dec 23 '18 at 4:21






  • 1





    Then I stand corrected; it may not be GC. However, the difference in representation still applies and is system-specific. It could be OS + runtime.

    – Abhishek Dujari
    Dec 23 '18 at 4:24
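The getsizeof() behavior quoted in the answer, and the objection in the comments that strings are not tracked by the GC, can both be checked with a short sketch. It assumes CPython 3 (sys.getsizeof() is implementation-specific), and the variable names are illustrative only. It does not reproduce the 49 vs 51 byte difference from the question; it only shows where collector overhead does and does not enter the reported size.

```python
import gc
import sys

empty = ""
one_space = " "

# getsizeof() starts from the object's own __sizeof__() result.
base = empty.__sizeof__()
reported = sys.getsizeof(empty)

# str objects are not tracked by the cyclic garbage collector,
# so getsizeof() has no collector overhead to add for them.
str_tracked = gc.is_tracked(empty)

# A list is a container tracked by the collector, so for it
# getsizeof() reports more than __sizeof__() alone.
items = []
list_overhead = sys.getsizeof(items) - items.__sizeof__()

print("str  __sizeof__:", base, "getsizeof:", reported, "tracked:", str_tracked)
print("list collector overhead in bytes:", list_overhead)

# Size difference between ' ' and ''. The question shows environments
# where this is not the usual single byte, so no value is assumed here.
print("' ' minus '':", sys.getsizeof(one_space) - reported)
```

If the last line prints something other than 1, the two string objects carry different internal data, which matches the comment about a cached wchar representation rather than anything the garbage collector adds.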
























edited Jan 4 at 10:47
TrebuchetMS

answered Dec 23 '18 at 4:13
Abhishek Dujari

























