I am fetching data from a Windows FTP server whose directory listing contains non-ASCII characters, and opening it fails:
Traceback (most recent call last):
  File "/home/frafra/.cache/pypoetry/virtualenvs/pyfilesystem-sync-qQEmY_5I-py3.8/lib/python3.8/site-packages/fs/errors.py", line 125, in new_func
    return func(*args, **kwargs)
  File "/home/frafra/.cache/pypoetry/virtualenvs/pyfilesystem-sync-qQEmY_5I-py3.8/lib/python3.8/site-packages/fs/opener/ftpfs.py", line 56, in open_fs
    return ftp_fs.opendir(dir_path, factory=ClosingSubFS)
  File "/home/frafra/.cache/pypoetry/virtualenvs/pyfilesystem-sync-qQEmY_5I-py3.8/lib/python3.8/site-packages/fs/base.py", line 1247, in opendir
    if not self.getinfo(path).is_dir:
  File "/home/frafra/.cache/pypoetry/virtualenvs/pyfilesystem-sync-qQEmY_5I-py3.8/lib/python3.8/site-packages/fs/ftpfs.py", line 682, in getinfo
    directory = self._read_dir(dir_name)
  File "/home/frafra/.cache/pypoetry/virtualenvs/pyfilesystem-sync-qQEmY_5I-py3.8/lib/python3.8/site-packages/fs/ftpfs.py", line 559, in _read_dir
    self.ftp.retrlines(
  File "/usr/lib64/python3.8/ftplib.py", line 461, in retrlines
    line = fp.readline(self.maxline + 1)
  File "/usr/lib64/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 49: invalid continuation byte
ipdb session:
ipdb> data
b'10-01-2021 11:00PM <DIR> Bilder V\xe4stra G\xf6taland\r\n10-06-2021 10:03AM <DIR> SeNorge\r\n'
ipdb> data.decode('utf8')
*** UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 49: invalid continuation byte
ipdb> data.decode('windows-1252')
'10-01-2021 11:00PM <DIR> Bilder Västra Götaland\r\n10-06-2021 10:03AM <DIR> SeNorge\r\n'
Python's built-in ftplib can use a different encoding: https://docs.python.org/3/library/ftplib.html#ftplib.FTP
class ftplib.FTP(host='', user='', passwd='', acct='', timeout=None, source_address=None, *, encoding='utf-8')
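As a minimal sketch of what this looks like with plain ftplib: the constructor signature quoted above accepts `encoding` (Python 3.9 and later), while assigning the `encoding` attribute on the instance before connecting should also work on older versions such as the 3.8 in the traceback. The helper name `make_ftp_client` is hypothetical and only for illustration; nothing is connected until `connect()` is called.

```python
import ftplib


def make_ftp_client(encoding="utf-8"):
    """Return an unconnected ftplib.FTP that will use the given encoding."""
    # No host is passed, so the constructor does not open a connection.
    ftp = ftplib.FTP()
    # Set the encoding before connect(): ftplib reads this attribute when
    # it opens the control channel and when it decodes listing lines.
    ftp.encoding = encoding
    return ftp


client = make_ftp_client("windows-1252")
# Later, against a real server (host and credentials are placeholders):
# client.connect("ftpserver")
# client.login("user", "password")
```

With this, `retrlines` would decode the listing shown in the ipdb session as windows-1252 instead of UTF-8.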
ftpfs does not take "encoding" as a parameter, neither in the opener nor in the FTPFS constructor:
ftp_fs = FTPFS(
    ftp_host,
    port=ftp_port,
    user=parse_result.username,
    passwd=parse_result.password,
    proxy=parse_result.params.get("proxy"),
    timeout=int(parse_result.params.get("timeout", "10")),
    tls=bool(parse_result.protocol == "ftps"),
)

def __init__(
    self,
    host,  # type: Text
    user="anonymous",  # type: Text
    passwd="",  # type: Text
    acct="",  # type: Text
    timeout=10,  # type: int
    port=21,  # type: int
    proxy=None,  # type: Optional[Text]
    tls=False,  # type: bool
):
I propose accepting encoding as an optional parameter, which should then be passed to the ftplib.FTP constructor.
It would then be possible to connect to resources like: ftp://user:password@ftpserver/path?encoding=windows-1252
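To illustrate the idea, here is a rough sketch of how an `encoding` query parameter could be read from such a URL and validated before being handed to the FTP client. It uses only the standard library rather than pyfilesystem's own URL parsing, and the function name `encoding_from_url` and the "utf-8" default are assumptions for the example, not existing pyfilesystem API.

```python
import codecs
from urllib.parse import parse_qs, urlsplit


def encoding_from_url(url, default="utf-8"):
    """Extract the 'encoding' query parameter from an FTP URL.

    Hypothetical helper: falls back to `default` when the parameter is
    absent and raises LookupError if Python does not know the codec.
    """
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps each key to a list of values; take the first one.
    encoding = query.get("encoding", [default])[0]
    # Fail early on a misspelled codec name instead of at listing time.
    codecs.lookup(encoding)
    return encoding


url = "ftp://user:password@ftpserver/path?encoding=windows-1252"
print(encoding_from_url(url))
```

In the opener, the equivalent value would presumably come from parse_result.params, in the same way as "proxy" and "timeout" are read today.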
The two snippets above are from pyfilesystem2/fs/opener/ftpfs.py (lines 44 to 52 in baa0560) and pyfilesystem2/fs/ftpfs.py (lines 399 to 409 in baa0560).