EthiCS: UTF-8

In this lecture, we discuss the ways harms and technical systems can interact, using language coding systems as an example.

This material is by Eddie Kohler, Eliza Wells, and William Cochran.

Background reading

Harm

Kinds of harm

Reasonable accommodation and undue burden

Ethical questions for a design decision

  1. Who will likely benefit from this decision?
  2. Who could it harm?
  3. In what ways, specifically, could it benefit or harm them?
  4. What will it take to avoid or mitigate such harm(s)?
  5. Does the work to avoid the harm constitute a reasonable accommodation or an undue burden? For instance,
    • Who has the resources to shoulder this burden?
    • Who has the responsibility to shoulder it?

An example at Harvard

Accessible videos

Data representation: Human language

History of character encoding

Encoding requirements

Encoding history

Harms and one-byte encodings

Enter Unicode

Harms and Unicode

Problems with 16- and 32-bit encodings for Unicode

Desiderata for an 8-bit encoding for Unicode

Idea #1: Unary

Harm, undue burden, reasonable accommodation

Idea #2: Byte pairs

Idea #3: Byte triples

Idea #4: Self-synchronizing byte quadruplets

UTF-8
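
As a rough illustration of the encoding this slide introduces, here is a minimal sketch of a UTF-8 encoder for a single code point. The function name `utf8_encode` is ours, not from the lecture, and in practice Python's built-in `str.encode("utf-8")` does this work. The sketch follows the standard layout: 1 byte for code points below 0x80, then 2, 3, or 4 bytes, with every byte after the first carrying the bit pattern `10xxxxxx`.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (hand-rolled, for illustration only)."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not valid in UTF-8")
    if code_point < 0x80:
        # 0xxxxxxx: identical to ASCII
        return bytes([code_point])
    if code_point < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# Cross-check the sketch against Python's built-in encoder.
for ch in "Aé€😀":
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
```

The cross-check loop covers one character from each length class (1, 2, 3, and 4 bytes).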

Advantages of UTF-8
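
Two commonly cited properties of UTF-8 are ASCII compatibility and self-synchronization; whether these are exactly the advantages listed on this slide is an assumption on our part. The sketch below checks both using Python's built-in encoder. The helper names `is_continuation` and `char_start` are hypothetical, chosen only for this example.

```python
def is_continuation(byte: int) -> bool:
    """True if the byte has the form 10xxxxxx, i.e. it cannot start a character."""
    return (byte & 0xC0) == 0x80


def char_start(data: bytes, i: int) -> int:
    """Back up from byte index i to the first byte of the character containing it."""
    while i > 0 and is_continuation(data[i]):
        i -= 1
    return i


# ASCII compatibility: pure-ASCII text has the same bytes in both encodings.
assert "plain text".encode("utf-8") == "plain text".encode("ascii")

# Self-synchronization: from any byte offset, the start of the enclosing
# character is found by skipping backward over continuation bytes.
data = "naïve 😀".encode("utf-8")
assert char_start(data, 3) == 2    # second byte of "ï" maps back to its lead byte
assert char_start(data, 10) == 7   # last byte of the emoji maps back to its lead byte
```

Because lead bytes and continuation bytes never share a bit pattern, a reader dropped into the middle of a stream needs to back up at most three bytes to find a character boundary.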

UTF-8 deployment

Use of UTF-8 on the Web to 2012

Use of UTF-8 on the Web since 2011


Harms in UTF-8

Case studies

Emoji

Emoji history

Emoji and Unicode

Will a fixed emoji set suffice?

Emoji and culture

Reducing undesirable signification

Representing more cultures

Example: Kiss

Presented without comment

From The Unicode Emoji technical standard:

Gas pump