No, I'm not paranoid enough yet. It's not sufficient to say only
that the HTML is encoded as UTF-8 (see below).

David-Sarah Hopwood wrote:
> The HTML or XHTML document starts with a correct <!DOCTYPE or
> <?xml declaration respectively,

I meant, the document starts with <!DOCTYPE HTML> in the case
of HTML, or <?xml version="1.0"?><!DOCTYPE HTML> in the case of
XHTML.

(This will also put the parser into sane^H^H^H^Hstandards mode.)

> and is encoded as well-formed UTF-8.

The document must also start with a UTF-8 BOM, *and* must not
contain a META directive that changes the charset, *and* in the
case of HTML, must either be retrieved from a local file or over
HTTP with the header "Content-Type: text/html; charset=UTF-8".
This is because the method of determining the encoding is chosen
based on the phase of the moon.
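
For concreteness, here is a rough Python sketch of the conditions just
listed. It is only an illustration under stated assumptions: the
function name utf8_problems and its parameters are hypothetical, the
META detection is a crude regular expression rather than a real HTML
parser, and "changes the charset" is read as "names any charset other
than UTF-8".

```python
import codecs
import re
from typing import List, Optional

# Crude byte-level pattern for a META element that names a charset.
# Illustration only; a real HTML parser would be more robust.
META_CHARSET_RE = re.compile(
    rb"<meta\b[^>]*\bcharset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)",
    re.IGNORECASE,
)


def utf8_problems(body: bytes,
                  content_type: Optional[str] = None,
                  is_html: bool = True,
                  local_file: bool = False) -> List[str]:
    """Return reasons the document might not be read as UTF-8."""
    problems = []

    # The document must start with a UTF-8 BOM (EF BB BF).
    if not body.startswith(codecs.BOM_UTF8):
        problems.append("missing UTF-8 BOM")

    # The document must be well-formed UTF-8.
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not well-formed UTF-8")

    # No META directive may name a different charset (assumed reading).
    for match in META_CHARSET_RE.finditer(body):
        if match.group(1).lower() != b"utf-8":
            problems.append("META changes charset")
            break

    # HTML fetched over HTTP needs the explicit Content-Type header;
    # local files and XHTML are exempt from this particular check.
    if is_html and not local_file:
        media_type, _, params = (content_type or "").partition(";")
        charsets = [
            p.partition("=")[2].strip().strip('"').lower()
            for p in params.split(";")
            if p.partition("=")[0].strip().lower() == "charset"
        ]
        if media_type.strip().lower() != "text/html" or charsets != ["utf-8"]:
            problems.append("HTTP Content-Type is not text/html; charset=UTF-8")

    return problems
```

Calling utf8_problems(body, content_type) on the raw bytes of a
response returns an empty list only when every condition holds; each
string in the list names one condition that failed.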

Any other problems?

David-Sarah Hopwood ⚥

