UTF-8, collapse the two together. Simplifies the code at the cost of higher memory use (which can probably be reduced again later).