Ensure input files are read as UTF-8 #45

Open
refack wants to merge 2 commits into PeterFeicht:master from refack:fix-44

Conversation

refack commented Apr 25, 2025

Fixes #44

refack commented Apr 25, 2025

Author

Left: content of the released tarball.
Right: output after this patch.
image
image

@PeterFeicht

Owner

Sorry for not responding earlier. I meant to check if this is needed elsewhere too, because I think there's more than one place that creates a parser.

@PeterFeicht

Owner

Please also change the occurrences in preprocess_cssless.py, index2ddg.py, test/test_preprocess.py, and test/test_preprocess_cssless.py.

It would also be great if you could add a test to test/test_preprocess.py, but that's not a blocker for me.
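
One way to find the remaining occurrences across the listed files is a small AST scan. This is only a sketch: it assumes the calls of interest are `open(...)` and a parser constructor named `HTMLParser` (for example lxml's `etree.HTMLParser`, which accepts an `encoding` keyword), and the helper name `calls_missing_encoding` is made up for illustration.

```python
import ast

def calls_missing_encoding(source, func_names=("open", "HTMLParser")):
    """Return sorted (line, name) pairs for calls that pass no encoding= keyword."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Handle both bare names (open) and attribute access (etree.HTMLParser).
        func = node.func
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name in func_names and not any(kw.arg == "encoding" for kw in node.keywords):
            hits.append((node.lineno, name))
    return sorted(hits)

# Hypothetical sample source; the real files would be read from disk instead.
sample = (
    "parser = etree.HTMLParser()\n"
    "ok = etree.HTMLParser(encoding='utf-8')\n"
    "with open(path) as f:\n"
    "    pass\n"
)
print(calls_missing_encoding(sample))  # [(1, 'HTMLParser'), (3, 'open')]
```

The scan also flags binary-mode `open` calls, which do not take an encoding, so each hit still needs a manual look.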

refack commented Sep 23, 2025

Author

> Please also change the occurrences in...

👍

> It would also be great if you could add a test to

I'm out of the flow, so I'm not sure how to write a test for this...
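
A regression test for this could look roughly like the sketch below. It is not tied to the project's actual test helpers: `read_source` is a hypothetical stand-in for whichever function reads the input file, and the sample text is arbitrary. The idea is to write known UTF-8 bytes to a temporary file and check that non-ASCII characters come back unchanged.

```python
import os
import tempfile
import unittest

def read_source(path):
    # Hypothetical stand-in for the project's real file-reading code.
    with open(path, "r", encoding="utf-8") as a_file:
        return a_file.read()

class TestUtf8Input(unittest.TestCase):
    def test_non_ascii_survives(self):
        text = "<p>naïve → café</p>"
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "sample.html")
            # Write raw bytes so the fixture does not depend on the locale.
            with open(path, "wb") as out:
                out.write(text.encode("utf-8"))
            self.assertEqual(read_source(path), text)
```

Saved as a test module, this can be run with `python -m unittest`. In the real suite, `read_source` would be replaced by the preprocessing function under test.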

refack commented Sep 23, 2025

Author

I added a second commit that also opens the files explicitly as UTF-8:

with open(src_path, 'r', encoding='utf-8') as a_file:
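
For context, a minimal sketch of why the explicit argument matters: without `encoding=`, Python's `open` in text mode falls back to a locale-dependent default. The example simulates a legacy default by reading the same bytes as cp1252; the file name and sample text are made up for illustration.

```python
import os
import tempfile

text = "café"
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "page.html")
    # Store the text as UTF-8 bytes, as the input files are expected to be.
    with open(src_path, "wb") as out:
        out.write(text.encode("utf-8"))

    # Explicit UTF-8: the text round-trips unchanged.
    with open(src_path, "r", encoding="utf-8") as a_file:
        explicit = a_file.read()

    # Simulated legacy default (cp1252): each UTF-8 byte becomes its own character.
    with open(src_path, "r", encoding="cp1252") as a_file:
        legacy = a_file.read()

print(explicit == text)  # True
print(legacy)            # cafÃ©
```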

Development

Successfully merging this pull request may close these issues.

Incorrect handling of UTF-8 encoding during preprocessing

2 participants