My last comment is on the .encode handling, which is still broken. There are well-known methods for handling bytes and unicode in Python 2. The general rule is to *always* convert at the boundaries: decode bytes coming in, encode text going out. The outside world is bytes (str in Python 2); inside the program, text is unicode.
You can try this:
- assume all the api query functions return unicode strings
- add from __future__ import unicode_literals
- you will get everything as unicode internally
- every file open needs an explicit mode: either text with the encoding= keyword argument (io.open in Python 2, since the built-in open has no encoding argument) or binary with 'b', and binary data is never treated as text.
This gives you a clear upgrade path to Python 3. It also means you never have to call .encode by hand in internal code; the conversion happens only at the IO boundaries.
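A minimal sketch of what the steps above could look like, runnable on both Python 2 and Python 3. The names fetch_title, save_text, load_text, load_bytes and round_trip are hypothetical and only stand in for your own API query and IO code; UTF-8 is an assumed encoding, not something the patch specifies.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import io
import os
import tempfile


def fetch_title():
    # Hypothetical stand-in for an API query function, assumed to return text.
    return "naïve café"


def save_text(path, text):
    # Text boundary: io.open encodes on write, so no .encode() call is needed.
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


def load_text(path):
    # Text boundary: io.open decodes on read and returns unicode text.
    with io.open(path, "r", encoding="utf-8") as handle:
        return handle.read()


def load_bytes(path):
    # Binary mode: returns raw bytes, which are never treated as text.
    with io.open(path, "rb") as handle:
        return handle.read()


def round_trip(text):
    # Write text to a temporary file, then read it back as text and as bytes.
    path = os.path.join(tempfile.mkdtemp(), "title.txt")
    save_text(path, text)
    return load_text(path), load_bytes(path)
```

With this layout, the only places that know about the encoding are the io.open calls; everything between them passes unicode around unchanged.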