Language Design: Unions
TL;DR: Tagged unions whose variants do not require syntactic wrappers.
Introduction
A “traditional” enum (ADT) definition as it exists in various languages defines both the enum itself (`Pet`) and its variants (`Cat` and `Dog`):
enum Pet {
  Cat(name: String, lives: Int),
  Dog(name: String, age: Int)
}
let pet: Pet = Cat("Molly", 9)
Some languages like Rust, C or C++ provide untagged unions, where the chosen variant has to be specified on creation and access:
union Pet {
  cat: Cat,
  dog: Dog
}
let pet = Pet { cat: Cat("Molly", 9) }
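To make the "specified on creation and access" point concrete, here is a minimal sketch using Python's `ctypes.Union`, which mirrors a C-style untagged union. It is an illustration under simplifying assumptions, not the syntax of any language discussed here: the `name` fields are omitted to keep the memory layout trivial, and the field names `lives` and `age` are borrowed from the enum example above.

```python
import ctypes

# Simplified payloads: the name fields from the example above are omitted
# so that both variants have the same trivial layout.
class Cat(ctypes.Structure):
    _fields_ = [("lives", ctypes.c_int32)]

class Dog(ctypes.Structure):
    _fields_ = [("age", ctypes.c_int32)]

# An untagged union: both variants share the same storage,
# and nothing records which one was written.
class Pet(ctypes.Union):
    _fields_ = [("cat", Cat), ("dog", Dog)]

# The variant has to be named on creation ...
pet = Pet(cat=Cat(lives=9))

# ... and again on access. Reading the "wrong" variant is not detected;
# it simply reinterprets the same bytes.
lives = pet.cat.lives
age = pet.dog.age
```

Both reads succeed because the union stores no tag; keeping track of the active variant is left entirely to the programmer.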
Other languages provide untagged union types where the union type itself (`Pet`) is defined, and its variants (`Cat` and `Dog`) refer to existing types in scope that may or may not allow detecting the chosen variant[1]:
type Pet = Cat | Dog
let pet: Pet = Cat("Molly", 9)
Observation
- ADTs are generally tagged unions (their variants can be told apart, even if they contain the same values) and come with wrappers (`Cat`, `Dog`) around their payloads.
- Untagged unions do not contain metadata (runtime tags) to distinguish variants, but require that every access is qualified with variant information.
- Union types do not contain metadata (runtime tags) to distinguish variants and do not use syntactic wrappers.
| | Syntactic Wrapping | No Syntactic Wrapping |
|---|---|---|
| Runtime Tagging | tagged union/ADT/enum | ? |
| No Runtime Tagging | untagged union (Rust, C, C++) | union type |
Filling in the upper right quadrant
Let’s think about the combination of tagged union without syntactic wrapping in the upper right quadrant:
class Cat(name: String, lives: Int)
class Dog(name: String, age: Int)
union Pet of Cat, Dog
let pet: Pet = Cat("Molly", 9)
This defines the union `Pet`, refers to existing types `Cat` and `Dog`, and assigns an instance of `Cat` to a binding `pet` of type `Pet` without syntactic wrapping.
Intuitively, this works similarly to the `permits` clauses of sealed interfaces in Java, in the sense that
sealed interface Pet permits Cat, Dog { ... }
does not define `Cat` or `Dog`, but refers to existing `Cat` and `Dog` types in scope.[2]
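For readers who want something runnable, the idea can be roughly approximated in Python. This is only a sketch under stated assumptions: Python's `Union` is an annotation for static type checkers and is not enforced as a closed set at runtime, the runtime "tag" is simply each object's class, and the helper `describe` is a hypothetical name introduced for illustration.

```python
from dataclasses import dataclass
from typing import Union

# Cat and Dog are ordinary, independently declared types.
@dataclass(frozen=True)
class Cat:
    name: str
    lives: int

@dataclass(frozen=True)
class Dog:
    name: str
    age: int

# The union only refers to the existing types; it does not define them.
Pet = Union[Cat, Dog]

def describe(pet: Pet) -> str:
    # Each object carries its class at runtime, which plays the role of the tag.
    if isinstance(pet, Cat):
        return f"{pet.name} is a cat with {pet.lives} lives"
    if isinstance(pet, Dog):
        return f"{pet.name} is a dog aged {pet.age}"
    raise TypeError(f"not a Pet: {pet!r}")

# No wrapper is needed to turn a Cat into a Pet.
pet: Pet = Cat("Molly", 9)
```

Calling `describe(pet)` tells the variants apart even though no `Pet`-specific wrapper was applied at the creation site.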
Benefits of such unions
1. Union variants have types, because they have a “real” class/struct/… declaration. (This fixes a mistake that some languages like Rust or Haskell made with their enums/ADTs.[3][4])
2. Variants can be reference types or value types (as they refer to “real” `class` or `value` definitions).
3. No “stutter”, where variant names have to be invented to wrap existing types. (Rust has this issue.)
4. Union values can be passed/created more easily, as no syntactic wrapping is required.
5. Variants can be re-used in different unions.
6. The ability to build ad-hoc unions out of existing types obviates the need for a separate type alias feature.
Example for 1., 2., 3.
enum Option[T] { Some(value: T), None }
… would receive little benefit from being written as …
union Option[T] of Some[T], None
value Some[T](value: T)
module None
…, but even trivial ADTs like a JSON representation would benefit.
Instead of …
enum JsonValue {
  JsonObject(Map[String, JsonValue]),
  JsonArray (Array[JsonValue]),
  JsonString(String),
  JsonNumber(Float64),
  JsonBool  (Bool),
  JsonNull,
  ...
}
… one would write (with `Array`, `Float64` and `String` being existing types in the language):
union JsonValue of
  Map[String, JsonValue],
  Array[JsonValue],
  String,
  Float64,
  Bool,
  JsonNull,
  ...
module JsonNull
Example for 4.
No wrapping required when passing arguments (unlike “traditional” enum approaches):
fun someValue(value: JsonValue) = ...
someValue(JsonString("test")) // "traditional" approach
someValue("test") // with non-definitional unions
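The difference at the call site can be sketched in Python as follows. The names `some_value_wrapped` and `some_value` are hypothetical, the object and array payloads are simplified to the built-in `dict` and `list`, and `JsonNull` is approximated by `None`; the point is only to contrast wrapped and unwrapped argument passing.

```python
from dataclasses import dataclass
from typing import Union

# "Traditional" approach: a wrapper type has to be invented for each payload.
@dataclass(frozen=True)
class JsonString:
    value: str

def some_value_wrapped(value: JsonString) -> str:
    return f"string of length {len(value.value)}"

# Non-definitional approach: the union refers to existing types directly.
# Object and array payloads are simplified to dict and list here.
JsonValue = Union[dict, list, str, float, bool, None]

def some_value(value: JsonValue) -> str:
    if isinstance(value, str):
        return f"string of length {len(value)}"
    return f"other JSON value: {value!r}"

wrapped = some_value_wrapped(JsonString("test"))  # caller must wrap
direct = some_value("test")                       # caller passes the string as-is
```

Both calls compute the same result; the second one just spares every caller the `JsonString(...)` wrapper.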
Example for 5.
Consider this class definition:
class Name(name: String)
With non-definitional unions, `Name` can be used multiple times – in different unions (and elsewhere):
union PersonIdentifier of
Name,
... // other identifiers like TaxId, Description, PhoneNumber etc.
union DogTag of
Name,
... // other identifiers like RegId, ...
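A small Python sketch of the same reuse, again only as an approximation: `TaxId` and `RegId` are mentioned above by name only, so their `number` fields are assumptions, and `person_label` and `dog_label` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Name:
    name: str

# TaxId and RegId are mentioned above only by name; their fields are assumptions.
@dataclass(frozen=True)
class TaxId:
    number: str

@dataclass(frozen=True)
class RegId:
    number: str

# The same Name type takes part in two unrelated unions.
PersonIdentifier = Union[Name, TaxId]
DogTag = Union[Name, RegId]

def person_label(ident: PersonIdentifier) -> str:
    return ident.name if isinstance(ident, Name) else f"tax id {ident.number}"

def dog_label(tag: DogTag) -> str:
    return tag.name if isinstance(tag, Name) else f"reg id {tag.number}"

# One value, usable in both places without conversion or re-wrapping.
shared = Name("Molly")
```

With wrapper-based enums, each union would need its own `Name(...)` variant around the shared type, and a value would have to be re-wrapped when moving between them.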
Non-definitional unions reduce indirection at use-sites and can be used in more scenarios (compared to more “traditional” enums), while not changing their runtime costs or representation.