This is not the current version of the class.

EthiCS: UTF-8

In this lecture, we discuss how harms and technical systems can interact, using character encodings (the systems computers use to represent written language as bytes) as an example.

This material is by Eddie Kohler, Eliza Wells, and William Cochran.

Background reading

Harm

Kinds of harm

Reasonable accommodation and undue burden

Ethical questions for a design decision

  1. Who will likely benefit from this decision?
  2. Who could it harm?
  3. In what ways, specifically, could it benefit or harm them?
  4. What will it take to avoid or mitigate such harm(s)?
  5. Does the work to avoid the harm constitute a reasonable accommodation or an undue burden? For instance,
    • Who has the resources to shoulder this burden?
    • Who has the responsibility to shoulder it?

History of character encoding

Encoding requirements

Encoding history

Harms and one-byte encodings
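
The section's argument isn't spelled out here, but one well-known harm of one-byte encodings is easy to show. With only 256 byte values, each regional encoding assigns the upper half to different characters. Text written in one encoding and read in another turns to garbage, and scripts with thousands of characters cannot be represented at all. This minimal sketch uses Python's built-in `latin-1` (Western European) and `cp1251` (Windows Cyrillic) codecs as two example one-byte encodings. `reinterpret` and `fits_in` are hypothetical helper names.

```python
def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one codec, then decode the same bytes with another."""
    return text.encode(written_as).decode(read_as)


def fits_in(text: str, codec: str) -> bool:
    """Return True if every character of text has a byte in this codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # Byte 0xF4 means "ô" in Latin-1 but "ф" in Windows-1251.
    print(reinterpret("Allô", "latin-1", "cp1251"))  # Allф
    # A 256-entry table has no room for Chinese characters.
    print(fits_in("你好", "latin-1"))  # False
```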

Enter Unicode

Harms and Unicode

Problems with 16- and 32-bit encodings for Unicode
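
Two frequently cited problems with fixed 16- or 32-bit code units show up when you encode plain ASCII text. First, the text doubles or quadruples in size. Second, it acquires zero bytes, which C-style string functions treat as terminators. The byte order (little- vs. big-endian) also becomes something a reader must guess or be told. This sketch uses Python's `utf-16-le`, `utf-16-be`, and `utf-32-le` codecs as stand-ins for such encodings. `c_string_view` is a hypothetical helper that mimics what `strlen` would see.

```python
def encoded_stats(text: str, codec: str) -> tuple[int, int]:
    """Return (total bytes, zero bytes) for text encoded with codec."""
    data = text.encode(codec)
    return len(data), data.count(0)


def c_string_view(data: bytes) -> bytes:
    """Return the prefix a C string function would see: bytes before the first NUL."""
    return data.split(b"\0", 1)[0]


if __name__ == "__main__":
    for codec in ("ascii", "utf-16-le", "utf-16-be", "utf-32-le"):
        data = "Hi!".encode(codec)
        size, zeros = encoded_stats("Hi!", codec)
        print(f"{codec:10} {size:2} bytes, {zeros} zero bytes: "
              f"{data.hex(' ')}  C sees {c_string_view(data)!r}")
```

For the little-endian forms, a C program would see only `H`. For the big-endian form, it would see an empty string.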

Desiderata

Straw-man solution

Text | Code points | Encoding
Hi! | U+0048 U+0069 U+0021 | 0x48 0x69 0x21
«Allô !» | U+00AB U+0041 U+006C U+006C U+00F4 U+0020 U+0021 U+00BB | 0xAB 0x41 0x6C 0x6C 0xF4 0x20 0x21 0xBB
你好 | U+4F60 U+597D | 0xFE 0xFF 0xFF 0xFF … 0xFF 0xFE 0xFF 0xFF 0xFF … (thousands of bytes)
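
The first two rows can be checked mechanically. This sketch formats code points and applies the rule the table appears to use for small code points: each one is stored as a single byte equal to its value. The table does not show enough to recover the rule for large code points, so the hypothetical `one_byte_part` simply refuses them. Treating 0xFE and 0xFF as reserved is an assumption inferred from the third row.

```python
def code_points(text: str) -> str:
    """Format each character of text as U+XXXX."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def one_byte_part(text: str) -> bytes:
    """Straw-man rule for small code points: one byte equal to the code point.

    ASSUMPTION: bytes 0xFE and 0xFF are reserved for larger code points,
    so only code points below 0xFE are handled here.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0xFE:
            raise ValueError(f"{code_points(ch)} needs the large-code-point rule")
        out.append(cp)
    return bytes(out)


if __name__ == "__main__":
    print(code_points("Hi!"))                  # U+0048 U+0069 U+0021
    print(one_byte_part("«Allô !»").hex(" "))  # ab 41 6c 6c f4 20 21 bb
    print([ord(ch) for ch in "你好"])           # [20320, 22909]
```

The last line shows why the third row is different: 你 and 好 have code points 20,320 and 22,909, far above anything a single byte can hold.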

Harm, undue burden, reasonable accommodation

Second straw man (surrogate pairs)
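
This sketch assumes the straw man is the surrogate-pair scheme that UTF-16 uses; the heading suggests so, but the body isn't shown here. Under that scheme, a code point above U+FFFF is split into two 16-bit units drawn from the reserved range U+D800 through U+DFFF. The example uses U+1F48F (KISS). The function names are hypothetical.

```python
def to_utf16_units(cp: int) -> list[int]:
    """Split a code point into UTF-16 code units: one unit, or a surrogate pair."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} cannot be encoded")
    if cp < 0x10000:
        return [cp]                  # fits in a single 16-bit unit
    v = cp - 0x10000                 # leaves a 20-bit value
    high = 0xD800 + (v >> 10)        # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)       # bottom 10 bits -> low surrogate
    return [high, low]


def from_surrogates(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into its code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    print([f"{u:04X}" for u in to_utf16_units(0x4F60)])   # ['4F60']
    print([f"{u:04X}" for u in to_utf16_units(0x1F48F)])  # ['D83D', 'DC8F']
```

One cost of this design is that U+D800 through U+DFFF can never be characters on their own. Another is that a program counting 16-bit units instead of code points will count 💏 as two characters.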

Third straw man

UTF-8
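
UTF-8's bit layout, as specified in RFC 3629, fits in a short encoder. The number of leading 1 bits in the first byte gives the sequence length, and every continuation byte starts with the bits 10. This is an illustrative sketch, and `utf8_encode` is a hypothetical name. In practice, Python's built-in `str.encode("utf-8")` does the same job.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as 1 to 4 UTF-8 bytes."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} cannot be encoded")
    if cp < 0x80:                              # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                             # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                           # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),           # 11110xxx + three 10xxxxxx bytes
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf8_encode_text(text: str) -> bytes:
    """Encode every code point of text and concatenate the results."""
    return b"".join(utf8_encode(ord(ch)) for ch in text)


if __name__ == "__main__":
    print(utf8_encode_text("你好").hex(" "))  # e4 bd a0 e5 a5 bd
```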

Advantages of UTF-8
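
Several advantages commonly cited for UTF-8 can be checked directly:

- ASCII text is byte-for-byte unchanged.
- No byte is zero except in the encoding of U+0000 itself.
- Continuation bytes (10xxxxxx) never look like lead bytes, so a program dropped at an arbitrary byte offset can find the start of the current character.

`char_start` is a hypothetical helper name used for this sketch.

```python
def is_continuation(byte: int) -> bool:
    """UTF-8 continuation bytes always have the bit pattern 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def char_start(data: bytes, i: int) -> int:
    """Back up from byte index i to the first byte of the character containing it."""
    while i > 0 and is_continuation(data[i]):
        i -= 1
    return i


if __name__ == "__main__":
    # 1. ASCII compatibility: ASCII text is byte-for-byte unchanged.
    print("Hi!".encode("utf-8") == "Hi!".encode("ascii"))  # True
    # 2. No zero bytes unless the text contains U+0000 itself.
    print(0 in "«Allô !» 你好".encode("utf-8"))  # False
    # 3. Self-synchronization: from the middle of 你, find where it starts.
    data = "a你b".encode("utf-8")  # 61 e4 bd a0 62
    print(char_start(data, 3))     # 1
```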

UTF-8 deployment

Figures: Use of UTF-8 on the Web to 2012; Use of UTF-8 on the Web since 2011.


Harms in UTF-8
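
One asymmetry often raised about UTF-8 is that its cost per character depends on the script. This may be among the harms this section has in mind, but the body isn't shown here. ASCII characters take one byte, most accented Latin letters take two, and most Chinese characters take three, whereas UTF-16 spends two bytes on each of these. `bytes_per_char` is a hypothetical helper for this sketch.

```python
def bytes_per_char(text: str, codec: str = "utf-8") -> float:
    """Average number of encoded bytes per code point of text."""
    return len(text.encode(codec)) / len(text)


if __name__ == "__main__":
    for sample in ("Hi!", "«Allô !»", "你好"):
        print(f"{sample:10} UTF-8: {bytes_per_char(sample):.3f}  "
              f"UTF-16: {bytes_per_char(sample, 'utf-16-le'):.3f}")
```

With these samples, the English text costs 1 byte per character in UTF-8, the French sample 1.375, and the Chinese sample 3. UTF-16 charges all three 2 bytes per character.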

Case studies

Emoji

Emoji history

Emoji and Unicode

Will a fixed emoji set suffice?

Emoji and culture

Reducing undesirable signification

Representing more cultures
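
This sketch assumes the section concerns the skin-tone modifiers Unicode added for emoji, since the body isn't shown here. The mechanism is simple: a modifier code point from U+1F3FB through U+1F3FF follows a base emoji, and supporting software renders the pair as one image. `with_skin_tone` and `describe` are hypothetical helper names.

```python
import unicodedata

# The five emoji skin-tone modifiers, U+1F3FB through U+1F3FF.
SKIN_TONES = [chr(cp) for cp in range(0x1F3FB, 0x1F400)]


def with_skin_tone(base: str, tone: int) -> str:
    """Append skin-tone modifier number `tone` (0-4) to a base emoji."""
    return base + SKIN_TONES[tone]


def describe(text: str) -> list[str]:
    """List each code point of text with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}" for ch in text]


if __name__ == "__main__":
    thumbs_up = "\U0001F44D"
    for line in describe(with_skin_tone(thumbs_up, 2)):
        print(line)
```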

Example: Kiss
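
The KISS emoji can be written as the single code point U+1F48F. Unicode also defines gendered kiss variants as sequences joined by ZERO WIDTH JOINER (U+200D). Whether the lecture uses this exact sequence is an assumption. This sketch builds the "kiss: woman, man" sequence and counts its code points and UTF-8 bytes; `summarize` is a hypothetical helper.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

KISS = "\U0001F48F"  # one code point: U+1F48F KISS

# "Kiss: woman, man" as a ZWJ sequence:
# WOMAN, HEAVY BLACK HEART + VARIATION SELECTOR-16, KISS MARK, MAN
KISS_WOMAN_MAN = ZWJ.join(["\U0001F469", "\u2764\ufe0f", "\U0001F48B", "\U0001F468"])


def summarize(text: str) -> tuple[int, int]:
    """Return (code points, UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


if __name__ == "__main__":
    print(summarize(KISS))            # (1, 4)
    print(summarize(KISS_WOMAN_MAN))  # (8, 27)
    for ch in KISS_WOMAN_MAN:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Software that does not recognize a ZWJ sequence typically shows the component emoji side by side. As a result, what a reader sees can depend on their platform.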

Presented without comment

From the Unicode Emoji technical standard (Unicode Technical Standard #51):

Gas pump