How to remove duplicate lines

I am trying to create a simple program that removes duplicate lines from a file, but I am stuck. Unlike the suggested duplicate, my goal is to remove every extra copy of a repeated line while keeping one, so I still have that data. I would also like the program to read from and write to the same filename. When I tried making the two filenames the same, it just produced an empty file.

input_file = "input.txt"
output_file = "input.txt"

seen_lines = set()
outfile = open(output_file, "w")

for line in open(input_file, "r"):
    if line not in seen_lines:
        outfile.write(line)
        seen_lines.add(line)

outfile.close()

input.txt:

I really love christmas
Keep the change ya filthy animal
Pizza is my fav food
Keep the change ya filthy animal
Did someone say peanut butter?
Did someone say peanut butter?
Keep the change ya filthy animal

Expected output:

I really love christmas
Keep the change ya filthy animal
Pizza is my fav food
Did someone say peanut butter?

asked Dec 29 '18 at 23:15 by Mark, edited Dec 29 '18 at 23:46 by Mad Physicist

You open the file twice, since input_file and output_file are the same. The second time you open it for reading, which is where I think your problem is, so you won't be able to write.

– busybear
Dec 29 '18 at 23:23

@busybear Yes. Open your file in r+ mode to read and write to the file at the same time (both will work).

– Ethan K888
Dec 29 '18 at 23:24

Possible duplicate of How might I remove duplicate lines from a file?

– glennv
Dec 29 '18 at 23:36

Tags: python text-files

6 Answers

The line outfile = open(output_file, "w") truncates your file no matter what else you do. The reads that follow will find an empty file. My recommendation for doing this safely is to use a temporary file:

1. Open a temp file for writing
2. Process the input to the new output
3. Close both files
4. Move the temp file to the input file name

This is much more robust than opening the file twice for reading and writing. If anything goes wrong, you will have the original and whatever work you did so far stashed away. Your current approach can mess up your file if anything goes wrong in the process.
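The truncation is easy to observe in isolation. This small sketch uses a throwaway file from tempfile.mkstemp (an assumption for the demo; the question works on input.txt) and records its size before and after merely opening it with "w":

```python
import os
import tempfile

# Create a throwaway file with some content (hypothetical demo file)
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("line 1\nline 2\n")
size_before = os.path.getsize(path)

# Merely opening with "w" empties the file, before anything is written
with open(path, "w"):
    pass
size_after = os.path.getsize(path)

os.remove(path)
print(size_before, size_after)
```

The second size is 0 even though nothing was written, which is why reading the same file afterwards finds no lines.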

Here is a sample using tempfile.NamedTemporaryFile, and a with block to make sure everything is closed properly, even in case of error:

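from shutil import move
from tempfile import NamedTemporaryFile

input_file = "input.txt"
output_file = "input.txt"

seen_lines = set()

# delete=False keeps the temp file after the with block so it can be moved
with NamedTemporaryFile('w', delete=False) as output, open(input_file) as infile:
    for line in infile:
        # compare without the newline, since the last line may not have one
        sline = line.rstrip('\n')
        if sline not in seen_lines:
            output.write(line)
            seen_lines.add(sline)

# both files are closed at this point; replace the original with the result
move(output.name, output_file)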

The move at the end will work correctly even if the input and output names are the same, since output.name is guaranteed to be something different from both.
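A variant of the same idea, sketched under the assumption that you would rather have the final swap be a rename than a possible copy: creating the temporary file in the same directory as the target (the dir argument) keeps both on one filesystem, so os.replace can swap them in one step (the Python docs describe the rename as atomic on POSIX). The function name dedupe_in_place is hypothetical.

```python
import os
from tempfile import NamedTemporaryFile


def dedupe_in_place(path):
    """Remove repeated lines from path, keeping the first occurrence of each."""
    seen_lines = set()
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file is created next to the target, so os.replace is a rename
    # on the same filesystem rather than a copy between filesystems
    with NamedTemporaryFile('w', dir=directory, delete=False) as output, \
            open(path) as infile:
        for line in infile:
            sline = line.rstrip('\n')
            if sline not in seen_lines:
                output.write(line)
                seen_lines.add(sline)
    # Both files are closed here, which matters on Windows before the swap
    os.replace(output.name, path)
```

Usage would be dedupe_in_place("input.txt"). If an exception is raised before os.replace runs, input.txt is left untouched and only the temp file remains. One side effect worth knowing: the replacement file keeps the temp file's permissions, which are typically owner-only on POSIX.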

Note also that I'm stripping the newline from each line in the set, since the last line might not have one.
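To see why the stripping matters, here is a small check that uses an in-memory io.StringIO in place of a real file (an assumption for the demo). When the file does not end with a newline, the last line comes back without '\n', so it never compares equal to its earlier copies:

```python
import io

# Simulated file whose last line has no trailing newline
sample = io.StringIO("Keep the change\nPizza\nKeep the change")
lines = sample.readlines()

print(lines)                  # ['Keep the change\n', 'Pizza\n', 'Keep the change']
print(lines[0] == lines[2])   # False: only the newline differs

# Comparing with the newline stripped treats them as duplicates
print(lines[0].rstrip('\n') == lines[2].rstrip('\n'))  # True
```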

Alt Solution

If you don't care about the order of the lines, you can simplify the process somewhat by doing everything directly in memory:

input_file = "input.txt"
output_file = "input.txt"

with open(input_file) as infile:
    unique = set(line.rstrip('\n') for line in infile)

with open(output_file, 'w') as output:
    for line in unique:
        output.write(line)
        output.write('\n')

You can compare this against

with open(input_file) as infile:
    unique = set(line.rstrip('\n') for line in infile.readlines())

with open(output_file, 'w') as output:
    output.write('\n'.join(unique))

The second version does the same deduplication, but reads and writes everything in one go. Unlike the first version, it does not write a newline after the last line.
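If you do care about order but still want the in-memory approach, a plain dict can act as an ordered set, since dictionaries keep insertion order from Python 3.7 on (collections.OrderedDict does the same on older versions). A minimal sketch; the helper names unique_lines and dedupe_file are hypothetical:

```python
def unique_lines(text):
    """Return the distinct lines of text in first-seen order."""
    # dict.fromkeys keeps only the first occurrence of each key, in order;
    # splitlines() drops line endings, so a missing final newline is harmless
    return list(dict.fromkeys(text.splitlines()))


def dedupe_file(input_file, output_file):
    """Read everything first, then rewrite; the two names may be the same."""
    with open(input_file) as infile:
        unique = unique_lines(infile.read())
    with open(output_file, 'w') as output:
        output.write(''.join(line + '\n' for line in unique))
```

Calling dedupe_file("input.txt", "input.txt") on the question's input should leave the four lines from the expected output, in that order.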

answered Dec 29 '18 at 23:30 by Mad Physicist, edited Dec 30 '18 at 0:38

I get an error of outfile is not defined

– Mark
Dec 29 '18 at 23:52

Just a question: this way of removing duplicates is very slow if there are over 100,000 lines. Is there a better way? Also, I'm still getting the same error.

– Mark
Dec 30 '18 at 0:26

@Mark. With that size, your I/O is the bottleneck. I doubt you can do much to speed it up.

– Mad Physicist
Dec 30 '18 at 0:28

@Mark. Fixed the error. It was just a typo

– Mad Physicist
Dec 30 '18 at 0:30

@Mark. I've proposed an alternative

– Mad Physicist
Dec 30 '18 at 0:41


The problem is that you're trying to write to the same file that you're reading from. You have at least three options:

Option 1

Use different filenames (e.g. input.txt and output.txt). This is, at some level, easiest.

Option 2

Read all data in from your input file, close that file, then open the file for writing.

with open('input.txt', 'r') as f:
    lines = f.readlines()

seen_lines = set()
with open('input.txt', 'w') as f:
    for line in lines:
        key = line.rstrip('\n')  # the last line may have no newline
        if key not in seen_lines:
            seen_lines.add(key)
            f.write(line)

Option 3

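Open the file for both reading and writing using r+ mode. In this case, be careful to read all the data you're going to process before writing. If you do everything in a single loop, the loop iterator may lose track. Because the deduplicated output is shorter than the original, you also need to truncate the file after writing; otherwise the tail of the old contents remains.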
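A sketch of Option 3 under those constraints; the function name dedupe_r_plus is hypothetical. It reads every line before writing anything, rewinds with seek(0), and calls truncate() at the end so nothing from the longer original survives past the new content.

```python
def dedupe_r_plus(path):
    """Rewrite path in place using r+ mode, keeping first occurrences only."""
    with open(path, 'r+') as f:
        lines = f.readlines()  # read everything before writing anything
        seen_lines = set()
        f.seek(0)  # go back to the start of the file
        for line in lines:
            key = line.rstrip('\n')  # the last line may lack a newline
            if key not in seen_lines:
                seen_lines.add(key)
                f.write(line)
        f.truncate()  # cut off the leftover tail of the original contents
```

Unlike the temp-file approach, an error halfway through the writes can leave the file partially rewritten.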

answered Dec 29 '18 at 23:24 by Jonah Bishop, edited Dec 29 '18 at 23:28

Or use r+ for reading and writing.

– Ethan K888
Dec 29 '18 at 23:26


seen_lines = []

with open('input.txt', 'r') as infile:
    lines = infile.readlines()
    for line in lines:
        line_stripped = line.strip()
        if line_stripped not in seen_lines:
            seen_lines.append(line_stripped)

with open('input.txt', 'w') as outfile:
    # In text mode '\n' is already translated to the platform line ending;
    # writing os.linesep here would produce '\r\r\n' on Windows
    outfile.write('\n'.join(seen_lines))

Output:

I really love christmas
Keep the change ya filthy animal
Pizza is my fav food
Did someone say peanut butter?

answered Dec 29 '18 at 23:28 by Bitto Bennichan, edited Dec 30 '18 at 0:41

This fixes the problem and is a good solution for small input files, but note that it will be quite slow (quadratic time) for large files due to the linear search through seen_lines.

– Flight Odyssey
Dec 29 '18 at 23:32

When I use this code, I see Keep the change ya filthy animal twice in the output?

– Mark
Dec 29 '18 at 23:34

@Mark I tested the code and I don't see it. Can you copy the code as it is and try again? Maybe you made an unintentional mistake while typing it.

– Bitto Bennichan
Dec 29 '18 at 23:36

Wait, I think it's because the last line has the EOF at the end of the line, so it isn't seen as a duplicate. I tested it: if the last line is a duplicate line, it is always kept because of the EOF. Any way around this? I am on Windows, by the way.

– Mark
Dec 29 '18 at 23:36

@Mark stackoverflow.com/questions/18857352/… might help. I can't say for sure; I am on Ubuntu.

– Bitto Bennichan
Dec 29 '18 at 23:43
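Regarding the quadratic-time remark above, one common fix, sketched here as an assumption rather than the answerer's code, is to keep a set next to the list: the set gives average O(1) membership tests and the list keeps the original order. The helper name dedupe_lines is hypothetical.

```python
def dedupe_lines(lines):
    """Return stripped lines with duplicates removed, in first-seen order."""
    seen = set()   # fast membership checks
    ordered = []   # preserves the original order
    for line in lines:
        stripped = line.strip()
        if stripped not in seen:
            seen.add(stripped)
            ordered.append(stripped)
    return ordered
```

Because it accepts any iterable of lines, an open file object can be passed directly, for example dedupe_lines(infile) inside the existing with block.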


I believe this is the easiest way to do what you want:

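with open('FileName.txt', 'r+') as i:
    AllLines = i.readlines()
    i.seek(0)  # rewind before writing over the old contents
    # write to file: dict.fromkeys keeps first occurrences in order (Python 3.7+)
    for line in dict.fromkeys(AllLines):
        i.write(line)
    i.truncate()  # remove what is left of the original, longer contents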

answered Dec 29 '18 at 23:34 by Matt Hawkins, edited Dec 29 '18 at 23:45

At that point it would be much simpler to reopen for writing. If you're removing lines, there will be a tail left in the file.

– Mad Physicist
Dec 30 '18 at 0:14


Try the code below, which uses a list comprehension together with str.join, set and sorted:

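input_file = "input.txt"
output_file = "input.txt"

# read everything first: opening the same file with "w" truncates it
with open(input_file, "r") as infile:
    l = [i.rstrip() for i in infile.readlines()]

with open(output_file, "w") as outfile:
    # set() drops duplicates; sorting by first index restores the original order
    outfile.write('\n'.join(sorted(set(l), key=l.index)))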

answered Dec 30 '18 at 0:16 by U9-Forward


Just my two cents, in case you happen to be able to use Python 3. It uses:

- A reusable Path object, which has a handy write_text() method.
- An OrderedDict as the data structure, to satisfy the constraints of uniqueness and order at once.
- A generator expression instead of Path.read_text(), to save on memory.

# in-place removal of duplicate lines, while preserving order
from collections import OrderedDict
from pathlib import Path

filepath = Path("./duplicates.txt")

with filepath.open() as _file:
    no_duplicates = OrderedDict.fromkeys(line.rstrip('\n') for line in _file)

filepath.write_text("\n".join(no_duplicates))

answered Dec 30 '18 at 1:50 by timmwagener, edited Dec 30 '18 at 1:57

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53974070%2fhow-to-remove-duplicate-lines%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

6 Answers
6

active

oldest

votes

6 Answers
6

active

oldest

votes

Open a temp file for writing

Process the input to the new output

Close both files

Move the temp file to the input file name

Here is a sample using tempfile.NamedTemporaryFile, and a with block to make sure everything is closed properly, even in case of error:

from tempfile import NamedTemporaryFile

from shutil import move



input_file = "input.txt"

output_file = "input.txt"



seen_lines = set()



with NamedTemporaryFile('w', delete=False) as output, open(input_file) as input:

    for line in open(input_file, "r"):

        sline = line.rstrip('n')

        if sline not in seen_lines:

            output.write(line)

            seen_lines.add(sline)

move(output.name, output_file)

The move at the end will work correctly even if the input and output names are the same, since output.name is guaranteed to be something different from both.

Note also that I'm stripping the newline from each line in the set, since the last line might not have one.

Alt Solution

If your don't care about the order of the lines, you can simplify the process somewhat by doing everything directly in memory:

input_file = "input.txt"

output_file = "input.txt"



with open(input_file) as input:

    unique = set(line.rstrip('n') for line in input)

with open(output_file, 'w') as output:

    for line in unique:

        output.write(line)

        output.write('n')

You can compare this against

with open(input_file) as input:

    unique = set(line.rstrip('n') for line in input.readlines())

with open(output_file, 'w') as output:

    output.write('n'.join(unique))

The second version does exactly the same thing, but loads and writes all at once.

edited Dec 30 '18 at 0:38

answered Dec 29 '18 at 23:30

Mad Physicist

37.7k1674106

I get an error of outfile is not defined

– Mark
Dec 29 '18 at 23:52

just a question, this way of removing duplicates is very slow if there is over 100,000 lines. Is there a better way? Also still getting the same error.

– Mark
Dec 30 '18 at 0:26

@Mark. With that size, your I/O is the bottleneck. I doubt you can do much to speed it up.

– Mad Physicist
Dec 30 '18 at 0:28

@Mark. Fixed the error. It was just a typo

– Mad Physicist
Dec 30 '18 at 0:30

@Mark. I've proposed an alternative

– Mad Physicist
Dec 30 '18 at 0:41

|
show 2 more comments

Open a temp file for writing

Process the input to the new output

Close both files

Move the temp file to the input file name

Here is a sample using tempfile.NamedTemporaryFile, and a with block to make sure everything is closed properly, even in case of error:

from tempfile import NamedTemporaryFile

from shutil import move



input_file = "input.txt"

output_file = "input.txt"



seen_lines = set()



with NamedTemporaryFile('w', delete=False) as output, open(input_file) as input:

    for line in open(input_file, "r"):

        sline = line.rstrip('n')

        if sline not in seen_lines:

            output.write(line)

            seen_lines.add(sline)

move(output.name, output_file)

The move at the end will work correctly even if the input and output names are the same, since output.name is guaranteed to be something different from both.

Note also that I'm stripping the newline from each line in the set, since the last line might not have one.

Alt Solution

If your don't care about the order of the lines, you can simplify the process somewhat by doing everything directly in memory:

input_file = "input.txt"

output_file = "input.txt"



with open(input_file) as input:

    unique = set(line.rstrip('n') for line in input)

with open(output_file, 'w') as output:

    for line in unique:

        output.write(line)

        output.write('n')

You can compare this against

with open(input_file) as input:

    unique = set(line.rstrip('n') for line in input.readlines())

with open(output_file, 'w') as output:

    output.write('n'.join(unique))

The second version does exactly the same thing, but loads and writes all at once.

edited Dec 30 '18 at 0:38

answered Dec 29 '18 at 23:30

Mad Physicist

37.7k1674106

I get an error of outfile is not defined

– Mark
Dec 29 '18 at 23:52

just a question, this way of removing duplicates is very slow if there is over 100,000 lines. Is there a better way? Also still getting the same error.

– Mark
Dec 30 '18 at 0:26

@Mark. With that size, your I/O is the bottleneck. I doubt you can do much to speed it up.

– Mad Physicist
Dec 30 '18 at 0:28

@Mark. Fixed the error. It was just a typo

– Mad Physicist
Dec 30 '18 at 0:30

@Mark. I've proposed an alternative

– Mad Physicist
Dec 30 '18 at 0:41

|
show 2 more comments

Open a temp file for writing

Process the input to the new output

Close both files

Move the temp file to the input file name

Here is a sample using tempfile.NamedTemporaryFile, and a with block to make sure everything is closed properly, even in case of error:

from tempfile import NamedTemporaryFile

from shutil import move



input_file = "input.txt"

output_file = "input.txt"



seen_lines = set()



with NamedTemporaryFile('w', delete=False) as output, open(input_file) as input:

    for line in open(input_file, "r"):

        sline = line.rstrip('n')

        if sline not in seen_lines:

            output.write(line)

            seen_lines.add(sline)

move(output.name, output_file)

The move at the end will work correctly even if the input and output names are the same, since output.name is guaranteed to be something different from both.

Note also that I'm stripping the newline from each line in the set, since the last line might not have one.

Alt Solution

If your don't care about the order of the lines, you can simplify the process somewhat by doing everything directly in memory:

input_file = "input.txt"

output_file = "input.txt"



with open(input_file) as input:

    unique = set(line.rstrip('n') for line in input)

with open(output_file, 'w') as output:

    for line in unique:

        output.write(line)

        output.write('n')

You can compare this against

with open(input_file) as input:

    unique = set(line.rstrip('n') for line in input.readlines())

with open(output_file, 'w') as output:

    output.write('n'.join(unique))

The second version does exactly the same thing, but loads and writes all at once.

edited Dec 30 '18 at 0:38

answered Dec 29 '18 at 23:30

Mad Physicist

37.7k1674106

Open a temp file for writing

Process the input to the new output

Close both files

Move the temp file to the input file name

Here is a sample using tempfile.NamedTemporaryFile, and a with block to make sure everything is closed properly, even in case of error:

from tempfile import NamedTemporaryFile

from shutil import move



input_file = "input.txt"

output_file = "input.txt"



seen_lines = set()



with NamedTemporaryFile('w', delete=False) as output, open(input_file) as input:

    for line in open(input_file, "r"):

        sline = line.rstrip('n')

        if sline not in seen_lines:

            output.write(line)

            seen_lines.add(sline)

move(output.name, output_file)

The move at the end will work correctly even if the input and output names are the same, since output.name is guaranteed to be something different from both.

Note also that I'm stripping the newline from each line in the set, since the last line might not have one.

Alt Solution

If your don't care about the order of the lines, you can simplify the process somewhat by doing everything directly in memory:

input_file = "input.txt"

output_file = "input.txt"



with open(input_file) as input:

    unique = set(line.rstrip('n') for line in input)

with open(output_file, 'w') as output:

    for line in unique:

        output.write(line)

        output.write('n')

You can compare this against

with open(input_file) as input:

    unique = set(line.rstrip('n') for line in input.readlines())

with open(output_file, 'w') as output:

    output.write('n'.join(unique))

The second version does exactly the same thing, but loads and writes all at once.

edited Dec 30 '18 at 0:38

answered Dec 29 '18 at 23:30

Mad Physicist

37.7k1674106

edited Dec 30 '18 at 0:38

answered Dec 29 '18 at 23:30

Mad Physicist

37.7k1674106

answered Dec 29 '18 at 23:30

Mad Physicist

37.7k1674106

answered Dec 29 '18 at 23:30

Mad Physicist

37.7k1674106

I get an error of outfile is not defined

– Mark
Dec 29 '18 at 23:52

just a question, this way of removing duplicates is very slow if there is over 100,000 lines. Is there a better way? Also still getting the same error.

– Mark
Dec 30 '18 at 0:26

@Mark. With that size, your I/O is the bottleneck. I doubt you can do much to speed it up.

– Mad Physicist
Dec 30 '18 at 0:28

@Mark. Fixed the error. It was just a typo

– Mad Physicist
Dec 30 '18 at 0:30

@Mark. I've proposed an alternative

– Mad Physicist
Dec 30 '18 at 0:41

|
show 2 more comments

I get an error of outfile is not defined

– Mark
Dec 29 '18 at 23:52

just a question, this way of removing duplicates is very slow if there is over 100,000 lines. Is there a better way? Also still getting the same error.

– Mark
Dec 30 '18 at 0:26

@Mark. With that size, your I/O is the bottleneck. I doubt you can do much to speed it up.

– Mad Physicist
Dec 30 '18 at 0:28

@Mark. Fixed the error. It was just a typo

– Mad Physicist
Dec 30 '18 at 0:30

@Mark. I've proposed an alternative

– Mad Physicist
Dec 30 '18 at 0:41

I get an error of outfile is not defined

– Mark
Dec 29 '18 at 23:52

just a question, this way of removing duplicates is very slow if there is over 100,000 lines. Is there a better way? Also still getting the same error.

– Mark
Dec 30 '18 at 0:26

@Mark. With that size, your I/O is the bottleneck. I doubt you can do much to speed it up.

– Mad Physicist
Dec 30 '18 at 0:28

@Mark. Fixed the error. It was just a typo

– Mad Physicist
Dec 30 '18 at 0:30

@Mark. I've proposed an alternative

– Mad Physicist
Dec 30 '18 at 0:41

|
show 2 more comments

The problem is that you're trying to write to the same file that you're reading from. You have at least two options:

Option 1

Use different filenames (e.g. input.txt and output.txt). This is, at some level, easiest.

Option 2

Read all data in from your input file, close that file, then open the file for writing.

with open('input.txt', 'r') as f:

    lines = f.readlines()



seen_lines = set()

with open('input.txt', 'w') as f:

    for line in lines:

        if line not in seen_lines:

            seen_lines.add(line)

            f.write(line)

Option 3

edited Dec 29 '18 at 23:28

answered Dec 29 '18 at 23:24

Jonah Bishop

9,02933357

1

Or use r+ for reading and writing.

– Ethan K888
Dec 29 '18 at 23:26

add a comment |

The problem is that you're trying to write to the same file that you're reading from. You have at least two options:

Option 1

Use different filenames (e.g. input.txt and output.txt). This is, at some level, easiest.

Option 2

Read all data in from your input file, close that file, then open the file for writing.

with open('input.txt', 'r') as f:

    lines = f.readlines()



seen_lines = set()

with open('input.txt', 'w') as f:

    for line in lines:

        if line not in seen_lines:

            seen_lines.add(line)

            f.write(line)

Option 3

edited Dec 29 '18 at 23:28

answered Dec 29 '18 at 23:24

Jonah Bishop

9,02933357

1

Or use r+ for reading and writing.

– Ethan K888
Dec 29 '18 at 23:26

add a comment |

The problem is that you're trying to write to the same file that you're reading from. You have at least two options:

Option 1

Use different filenames (e.g. input.txt and output.txt). This is, at some level, easiest.

Option 2

Read all data in from your input file, close that file, then open the file for writing.

with open('input.txt', 'r') as f:

    lines = f.readlines()



seen_lines = set()

with open('input.txt', 'w') as f:

    for line in lines:

        if line not in seen_lines:

            seen_lines.add(line)

            f.write(line)

Option 3

edited Dec 29 '18 at 23:28

answered Dec 29 '18 at 23:24

Jonah Bishop

9,02933357

The problem is that you're trying to write to the same file that you're reading from. You have at least two options:

Option 1

Use different filenames (e.g. input.txt and output.txt). This is, at some level, easiest.

Option 2

Read all data in from your input file, close that file, then open the file for writing.

with open('input.txt', 'r') as f:

    lines = f.readlines()



seen_lines = set()

with open('input.txt', 'w') as f:

    for line in lines:

        if line not in seen_lines:

            seen_lines.add(line)

            f.write(line)

Option 3

edited Dec 29 '18 at 23:28

answered Dec 29 '18 at 23:24

Jonah Bishop

9,02933357

edited Dec 29 '18 at 23:28

answered Dec 29 '18 at 23:24

Jonah Bishop

9,02933357

answered Dec 29 '18 at 23:24

Jonah Bishop

9,02933357

answered Dec 29 '18 at 23:24

Jonah Bishop

9,02933357

1

Or use r+ for reading and writing.

– Ethan K888
Dec 29 '18 at 23:26

add a comment |

1

Or use r+ for reading and writing.

– Ethan K888
Dec 29 '18 at 23:26

Or use r+ for reading and writing.

– Ethan K888
Dec 29 '18 at 23:26

add a comment |

import os

seen_lines = 



with open('input.txt','r') as infile:

    lines=infile.readlines()

    for line in lines:

        line_stripped=line.strip()

        if line_stripped not in seen_lines:

            seen_lines.append(line_stripped)



with open('input.txt','w') as outfile:

    for line in seen_lines:

        outfile.write(line)

        if line != seen_lines[-1]:

            outfile.write(os.linesep)

Output:

I really love christmas

Keep the change ya filthy animal

Pizza is my fav food

Did someone say peanut butter?

edited Dec 30 '18 at 0:41

answered Dec 29 '18 at 23:28

Bitto Bennichan

3,0311221

This fixes the problem and is a good solution for small input files, but note that it will be quite slow (quadratic time) for large files due to the linear search through seen_lines.

– Flight Odyssey
Dec 29 '18 at 23:32

When I use this code, I see Keep the change ya filthy animal twice in the output?

– Mark
Dec 29 '18 at 23:34

@Mark I tested the code and i don't see it. Can you copy the code as it is and try again? may be you made some unintentional mistake while typing it.

– Bitto Bennichan
Dec 29 '18 at 23:36

Wait, I think its because the last line has the EOF at the end of the line so it sees it as not a duplicate. I tested it. If the last line is a duplicate line, it always keeps it because of the EOF. Any way around this? I am on windows by the way

– Mark
Dec 29 '18 at 23:36

@Mark stackoverflow.com/questions/18857352/… might help. I can't say for sure. i am on Ubuntu.

– Bitto Bennichan
Dec 29 '18 at 23:43

|
show 7 more comments

import os

seen_lines = 



with open('input.txt','r') as infile:

    lines=infile.readlines()

    for line in lines:

        line_stripped=line.strip()

        if line_stripped not in seen_lines:

            seen_lines.append(line_stripped)



with open('input.txt','w') as outfile:

    for line in seen_lines:

        outfile.write(line)

        if line != seen_lines[-1]:

            outfile.write(os.linesep)

Output:

I really love christmas

Keep the change ya filthy animal

Pizza is my fav food

Did someone say peanut butter?

edited Dec 30 '18 at 0:41

answered Dec 29 '18 at 23:28

Bitto Bennichan

3,0311221

This fixes the problem and is a good solution for small input files, but note that it will be quite slow (quadratic time) for large files due to the linear search through seen_lines.

– Flight Odyssey
Dec 29 '18 at 23:32

When I use this code, I see Keep the change ya filthy animal twice in the output?

– Mark
Dec 29 '18 at 23:34

@Mark I tested the code and i don't see it. Can you copy the code as it is and try again? may be you made some unintentional mistake while typing it.

– Bitto Bennichan
Dec 29 '18 at 23:36

Wait, I think its because the last line has the EOF at the end of the line so it sees it as not a duplicate. I tested it. If the last line is a duplicate line, it always keeps it because of the EOF. Any way around this? I am on windows by the way

– Mark
Dec 29 '18 at 23:36

@Mark stackoverflow.com/questions/18857352/… might help. I can't say for sure. i am on Ubuntu.

– Bitto Bennichan
Dec 29 '18 at 23:43


I believe this is the easiest way to do what you want:

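with open('FileName.txt', 'r+') as i:
    AllLines = i.readlines()
    i.seek(0)  # rewind so the writes start at the beginning of the file
    # write each line once, in order; rstrip so a last line without '\n' still matches
    for line in dict.fromkeys(ln.rstrip('\n') for ln in AllLines):
        i.write(line + '\n')
    i.truncate()  # cut off whatever is left of the old contents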

edited Dec 29 '18 at 23:45

answered Dec 29 '18 at 23:34

Matt Hawkins

At that point it would be much simpler to reopen for writing. If you're removing lines, there will be a tail left in the file.

– Mad Physicist
Dec 30 '18 at 0:14
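Here is a small demonstration of the tail Mad Physicist describes. In `r+` mode, writing fewer bytes than the file holds only overwrites its beginning. The old bytes after the write position stay until `truncate()` cuts the file at the current position. The helper name `overwrite_in_place` and the sample text are illustrative only.

```python
import os
import tempfile


def overwrite_in_place(path, new_text, truncate):
    """Read `path` in r+ mode, then overwrite it from the start with `new_text`."""
    with open(path, "r+") as f:
        f.read()           # consume the old contents
        f.seek(0)          # rewind: writes now start at the beginning of the file
        f.write(new_text)
        if truncate:
            f.truncate()   # drop everything after the current position
    with open(path) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    results = {}
    for truncate in (False, True):
        with open(path, "w") as f:
            f.write("a\nb\nb\nc\n")  # "b" appears twice
        results[truncate] = overwrite_in_place(path, "a\nb\nc\n", truncate)

print(repr(results[False]))  # 'a\nb\nc\nc\n'  <- leftover tail of the old contents
print(repr(results[True]))   # 'a\nb\nc\n'
```

Reopening the file with `"w"` after reading avoids this, because `"w"` truncates the file before anything is written.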


Try the code below, which uses a list comprehension, str.join, set and sorted:

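input_file = "input.txt"
output_file = "input.txt"

# read everything first: opening output_file with "w" truncates it immediately
with open(input_file, "r") as infile:
    l = [i.rstrip() for i in infile.readlines()]

# set() drops duplicates; sorting by l.index restores the original order
with open(output_file, "w") as outfile:
    outfile.write('\n'.join(sorted(set(l), key=l.index)))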

answered Dec 30 '18 at 0:16

U9-Forward

15.7k51540
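The ordering trick in this answer can be checked in memory. A `set` has no guaranteed order, and `sorted(..., key=l.index)` puts each unique line back at the position of its first occurrence. Because `l.index` scans the list, this becomes quadratic for large files. On Python 3.7+, where `dict` preserves insertion order, `dict.fromkeys` gives the same result in a single pass. Here is a sketch using the sample lines from the question:

```python
l = [
    "I really love christmas",
    "Keep the change ya filthy animal",
    "Pizza is my fav food",
    "Keep the change ya filthy animal",
    "Did someone say peanut butter?",
    "Did someone say peanut butter?",
    "Keep the change ya filthy animal",
]

# set() removes duplicates; sorting by first index restores the file order
by_index = sorted(set(l), key=l.index)

# dict keys are unique and (Python 3.7+) keep insertion order
by_dict = list(dict.fromkeys(l))

print(by_index == by_dict)  # True
print("\n".join(by_dict))
```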


Just my two cents, in case you happen to be able to use Python 3. It uses:

A reusable Path object which has a handy write_text() method.

An OrderedDict as data structure to satisfy the constraints of uniqueness and order at once.

A generator expression instead of Path.read_text() to save on memory.

# in-place removal of duplicate lines, while preserving order
from collections import OrderedDict
from pathlib import Path

filepath = Path("./duplicates.txt")

with filepath.open() as _file:
    # keys are unique and keep first-seen order; strip '\n' so the last line matches too
    no_duplicates = OrderedDict.fromkeys(line.rstrip('\n') for line in _file)

filepath.write_text("\n".join(no_duplicates))

edited Dec 30 '18 at 1:57

answered Dec 30 '18 at 1:50

timmwagener

7671814
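To try this approach without touching a real file, here is a sketch that wraps it in a function and runs it on a temporary copy of the question's sample input. The function name `remove_duplicate_lines` is illustrative, not part of the answer. The body keeps the answer's `Path`, `OrderedDict.fromkeys` and generator expression.

```python
import tempfile
from collections import OrderedDict
from pathlib import Path


def remove_duplicate_lines(filepath):
    """In-place removal of duplicate lines, preserving the order of first occurrences."""
    with filepath.open() as _file:
        # generator expression: lines are streamed; only unique ones are kept as keys
        no_duplicates = OrderedDict.fromkeys(line.rstrip("\n") for line in _file)
    # the file is closed here, before write_text() reopens it for writing
    filepath.write_text("\n".join(no_duplicates))


sample = (
    "I really love christmas\n"
    "Keep the change ya filthy animal\n"
    "Pizza is my fav food\n"
    "Keep the change ya filthy animal\n"
    "Did someone say peanut butter?\n"
    "Did someone say peanut butter?\n"
    "Keep the change ya filthy animal"  # last line has no trailing newline
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "input.txt"
    path.write_text(sample)
    remove_duplicate_lines(path)
    result = path.read_text()

print(result)
```

Like the answer, this writes no newline after the last line, because `"\n".join` only places separators between lines.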
