Sunday, April 9, 2017

...and the glorious subst to come

If you’re interested in design with zero-cost type tagging, or some cases of AnyVal I didn’t cover in the first article, or you’re looking for something else I missed, check here. There’s a lot more I didn’t have room for in the first article. Consider this “bonus content”.

Unidirectional subst

We saw earlier that though subst appears to substitute in only one direction, that direction can easily be reversed. This is due to the symmetry of type equality—if A = B, then surely also B = A.

Suppose that apply implemented some per-String validation logic. In that case, you wouldn’t want users of the Label API to be able to circumvent this validation, wholesale; this is easy to do with the subst I have shown, and we saw it already when we tagged a whole list and function, both designed only for plain Strings!

We can get an idea of how to fix this by comparing Leibniz and Liskov. Looking at the signature of Liskov.subst, you decide to introduce widen, replacing subst.

// in LabelImpl
  def widen[F[+_]](ft: F[T]): F[String]

// in val Label
  override def widen[F[+_]](ft: F[T]) = ft

With this design, you can untag a tagged list.

scala> Label.widen(taggedList)
res0: List[String] = List(hello, world)

You can tag a function that takes an untagged list as parameter.

scala> def report(xs: List[String]): Unit = ()
report: (xs: List[String])Unit

scala> def cwiden[F[-_]](fs: F[String]): F[Label] =
         Label.widen[Lambda[`+x` => F[x] => F[Label]]](identity)(fs)
cwiden: [F[-_]](fs: F[String])F[Label]

scala> cwiden[Lambda[`-x` => List[x] => Unit]](report)
res1: List[Label] => Unit = $$Lambda$3263/1163097357@7e4f65b7

However, logically, this kind of “tagging” is just a delayed “untagging” of the Ts involved, so your validation rules are preserved.

What’s happening? With subst, we selectively revealed a type equality. widen is deliberately less revealing; it selectively reveals a subtyping relationship, namely, T <: String.

scala> import scalaz.Liskov, Liskov.<~<

scala> Label.widen[Lambda[`+x` => (Label <~< x)]](Liskov.refl)
res2: scalaz.Liskov[Label.T,String] = scalaz.Liskov$$anon$3@58e8db18

Cheap tagging with validation

You can think of + or - in the signatures of widen and cwiden above as a kind of constraint on the F that those functions take; by contrast, subst took any F without bounds on its argument.

There are other interesting choices of constraint, like Foldable.

import scalaz.{Failure, Foldable, Success, ValidationNel}
import scalaz.syntax.std.option._
import scalaz.syntax.foldable._

// in LabelImpl, alongside def widen:
  def narrow[F[_]: Foldable](fs: F[String])
    : ValidationNel[Err, F[T]]

// in val Label
  override def narrow[F[_]: Foldable](fs: F[String]) =
    fs.foldMap{string =>
      // return errors if not OK, INil() if OK
    }.toNel cata (Failure(_), Success(fs))

This is interesting because if you pass anything and get back a Success, the succeeding value is just the argument you passed in, no reallocation necessary. (To reallocate, we would need Traverse instead of Foldable.)

Unidirectional without subtyping

If you prefer to avoid subtyping, you can also constrain subst variants with typeclasses indicating directionality. For Scalaz or Cats, providing both of these would be a sufficient substitute for the widen[F[+_]] introduced above.

  def widen[F[_]: Functor](ft: F[T]): F[String]
  def cwiden[F[_]: Contravariant](fs: F[String]): F[T]

T = String translucency

subst and widen are very powerful, but maybe you’re bothered by the fact that T erases to Object, and you would rather “untagging” happen automatically.

Thus far, you’ve been selectively revealing aspects of the type relationship between T and String. What if you were to globally reveal part of it?

To be clear, we must not globally reveal T = String; then there would be no usable distinction. But you can reveal weaker properties.

// in LabelImpl
  type T <: String

Now, widening happens automatically.

scala> taggedList: List[String]
res0: List[String] = List(hello, world)

scala> report: (List[Label] => Unit)
res1: List[Label] => Unit = $$Lambda$3348/1710049434@4320749b

Narrowing is still forbidden; T and String are still separate.

scala> (taggedList: List[String]): List[Label]
<console>:23: error: type mismatch;
 found   : List[String]
 required: List[hcavsc.translucent.Labels.Label]
    (which expands to)  List[hcavsc.translucent.Labels.Label.T]
       (taggedList: List[String]): List[Label]
                  ^

Moreover, erasure looks like AnyVal subclassing erasure again.

// javap -c -cp target/scala-2.12/classes hcavsc.translucent.MyFirstTests

  public java.lang.String combineLabels(java.lang.String, java.lang.String);

However, this makes it very difficult for typeclass resolution to reliably distinguish String and T. It’s also easy to accidentally untag. That’s why we took this out of Scalaz’s Tags; discriminating typeclass instances is a very useful feature of tags. If these aren’t concerns for you, globally revealed tag subtyping may be the most convenient for you.

Boxing Ints

AnyVal might seem to have better, more justifiable boxing behavior in the cast of primitive types like Int. When putting than AnyVal wrapper around Int, the custom box replaces the plain Integer box, rather than adding another layer.

final class MagicInt(val x: Int) extends AnyVal

val x = 42
val y = 84

// javap -c -cp target/scala-2.12/classes hcavsc.intsav.BytecodeTests

List(x, y)
      // skipping some setup bytecode
      13: newarray       int
      15: dup
      16: iconst_0
      17: iload_1
      18: iastore
      19: dup
      20: iconst_1
      21: iload_2
      22: iastore
      23: invokevirtual #25                 // Method scala/Predef$.wrapIntArray:([I)Lscala/collection/mutable/WrappedArray;
      26: invokevirtual #29                 // Method scala/collection/immutable/List$.apply:(Lscala/collection/Seq;)Lscala/collection/immutable/List;

List(new MagicInt(x), new MagicInt(y))
      // skipping more setup
      37: anewarray     #31                 // class hcavsc/intsav/MagicInt
      40: dup
      41: iconst_0
      42: new           #31                 // class hcavsc/intsav/MagicInt
      45: dup
      46: iload_1
      47: invokespecial #35                 // Method hcavsc/intsav/MagicInt."<init>":(I)V
      50: aastore
      51: dup
      52: iconst_1
      53: new           #31                 // class hcavsc/intsav/MagicInt
      56: dup
      57: iload_2
      58: invokespecial #35                 // Method hcavsc/intsav/MagicInt."<init>":(I)V
      61: aastore
      62: invokevirtual #39                 // Method scala/Predef$.genericWrapArray:(Ljava/lang/Object;)Lscala/collection/mutable/WrappedArray;
      65: invokevirtual #29                 // Method scala/collection/immutable/List$.apply:(Lscala/collection/Seq;)Lscala/collection/immutable/List;

By contrast, the opaque T to Integer when we apply(i: Int): T. It then remains in that box until we deliberately get the Int back.

// MagicInt is defined like Label,
// but over Int instead of String
val x = MagicInt(42)
// javap -c -cp target/scala-2.12/classes hcavsc.ints.OtherTests
       0: getstatic     #21                 // Field hcavsc/ints/MagicInts$.MODULE$:Lhcavsc/ints/MagicInts$;
       3: invokevirtual #25                 // Method hcavsc/ints/MagicInts$.MagicInt:()Lhcavsc/ints/MagicInts$MagicIntImpl;
       6: bipush        42
       8: invokevirtual #29                 // Method hcavsc/ints/MagicInts$MagicIntImpl.apply:(I)Ljava/lang/Object;

// javap -c -cp target/scala-2.12/classes 'hcavsc.ints.MagicInts$$anon$1'
  public java.lang.Object apply(int);
    Code:
       0: aload_0
       1: iload_1
       2: invokevirtual #23                 // Method apply:(I)I
       5: invokestatic  #29                 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
       8: areturn

List(x, x)
      // skipping setup as before
      19: anewarray     #4                  // class java/lang/Object
      22: dup
      23: iconst_0
      24: aload_1
      25: aastore
      26: dup
      27: iconst_1
      28: aload_1
      29: aastore
      30: invokevirtual #43                 // Method scala/Predef$.genericWrapArray:(Ljava/lang/Object;)Lscala/collection/mutable/WrappedArray;
      33: invokevirtual #46                 // Method scala/collection/immutable/List$.apply:(Lscala/collection/Seq;)Lscala/collection/immutable/List;

While the boxing in the above example happened in MagicInt.apply, there’s nothing special about that function’s boxing; the standard Int boxing serves just as well.

// javap -c -cp target/scala-2.12/classes hcavsc.ints.OtherTests

val xs = List(42)
      44: newarray       int
      46: dup
      47: iconst_0
      48: bipush        42
      50: iastore
      51: invokevirtual #50                 // Method scala/Predef$.wrapIntArray:([I)Lscala/collection/mutable/WrappedArray;
      54: invokevirtual #46                 // Method scala/collection/immutable/List$.apply:(Lscala/collection/Seq;)Lscala/collection/immutable/List;
      57: astore_2

val mxs = MagicInt.subst(xs)
      58: getstatic     #21                 // Field hcavsc/ints/MagicInts$.MODULE$:Lhcavsc/ints/MagicInts$;
      61: invokevirtual #25                 // Method hcavsc/ints/MagicInts$.MagicInt:()Lhcavsc/ints/MagicInts$MagicIntImpl;
      64: aload_2
      65: invokevirtual #54                 // Method hcavsc/ints/MagicInts$MagicIntImpl.subst:(Ljava/lang/Object;)Ljava/lang/Object;

val y: MagicInt = mxs.head
      73: invokevirtual #60                 // Method scala/collection/immutable/List.head:()Ljava/lang/Object;
      76: astore        4

This is nice for two reasons:

  1. subst still doesn’t imply any additional boxing beyond what the underlying primitive type implies.
  2. Where the primitive boxing is optimized, you get to keep those optimizations; AnyVal subclass boxing effectively turns off these optimizations. For example, Integer boxing is optimized, but MagicInt’s AnyVal class is not.

The one remaining problem with the tag version of MagicInt is that its erasure is still Object.

def myId(x: MagicInt): MagicInt
// javap -c -cp target/scala-2.12/classes hcavsc.ints.OtherTests
  public abstract java.lang.Object myId(java.lang.Object);

However, if you use the “translucent” variant where it is always known that type T <: Int, the erasure is the same as Int itself.

// javap -c -cp target/scala-2.12/classes hcavsc.translucentints.OtherTests
  public abstract int myId(int);

(The boxing/unboxing of MagicInt changes to match.) Unfortunately, there’s no way to tell Scala what the erasure ought to be without exposing that extra type information, which may be quite inconvenient.

Would you box a JavaScript string?

Maybe if we weren’t working with types. Since we are working with types, we don’t have to box our strings in JavaScript in order to keep track of what sort of strings they are. But Scala might want to, anyway.

val x = new Label("hi")
js.Array(x, x)

// sbt fastOptJS output
  [new $c_Lhcavsc_av_Label().init___T("hi"),
   new $c_Lhcavsc_av_Label().init___T("hi")];

Surely it doesn’t have to for our tag-like Label. And indeed it doesn’t.

val h = Label("hi")
  // compiles to
  var h = "hi";
  // fastOptJS is smart enough to know
  // that apply can be elided

val hs = js.Array(h, h)
  // compiles to
  var hs = [h, h];

val strs = Label.subst[Lambda[x => js.Array[x] => js.Array[String]]](identity)(hs)
strs(0) + strs(1)
  // compiles to
  (("" + $as_T(hs[0])) + hs[1])
  // fastOptJS is smart enough to know
  // that subst, too, can be elided

The possible existence of subst tells us something about the deeper meaning of our abstract type definition, type T = String, that holds true no matter how much of this equality we hide behind existential layers. It is this: the compiler cannot predict when the fact that T = String will be visible, and when it will not be. It must therefore not generate code that would “go wrong” in contexts where this is revealed.

For example, at one point, we saw that

Label.subst(Monoid[String])

would yield indeed produce a suitable Monoid[Label]. This means not only is the value’s type reinterpreted, but also, by consequence, its members.

scala> val labelMonoid = Label.subst(Monoid[String])
labelMonoid: scalaz.Monoid[Label.T] = scalaz.std.StringInstances$stringInstance$@6f612117

scala> labelMonoid.zero
res0: hcavsc.subst.Labels.Label.T = ""

scala> labelMonoid.append _
res1: (Label.T, => Label.T) => Label.T = $$Lambda$3184/987934553@3af2619b

However, in subst, we have charged the compiler with doing this arbitrarily complex substitution with 100% accuracy and in constant time. There are no opportunities to generate “wrappers”, not for these structures that merely employ Label in their types. And, by consequence, there’s nowhere to put code that would use some means to treat Label and String differently based on runtime choices.

If you wish to automatically add “wrappers”, you have a difficult problem already with parametric polymorphism. With higher-kinded types, you have an intractable problem.

Speaking of higher-kinded types…

Type tagging works perfectly well with parameterized types.

type KWConcrete[W, A, B] = Kleisli[(W, ?), A, B]

sealed abstract class KWImpl {
  type T[W, A, B]

  def subst[F[_[_, _, _]]](fk: F[KWConcrete]): F[T]
}

val KW: KWImpl = new KWImpl {
  type T[W, A, B] = KWConcrete[W, A, B]

  override def subst[F[_[_, _, _]]](fk: F[KWConcrete]) = fk
}

type KW[W, A, B] = KW.T[W, A, B]

This is nice for a few reasons.

  1. You can still “add a type parameter” to do abstraction on your tagged types.
  2. You can hide much of the complexity of a monad transformer stack, allowing it to infer more easily with Unapply or -Ypartial-unification. This is because, unlike standalone type aliases, scalac can’t dealias your abstraction away. (Warning: this doesn’t apply if you make the type T “translucent”; hide your types to keep them safe from scalac’s prying expander.)
  3. You can use subst to “GND” your Monad and other typeclass instances.
implicit def monadKW[W: Monoid, A]: Monad[KW[W, A, ?]] = {
  type MF[KWC[_, _, _]] = Monad[KWC[W, A, ?]]
  // KW.subst[MF](implicitly) with better inference
  KW.subst[MF](Kleisli.kleisliMonadReader[(W, ?), A])
}

“Tagless final effects à la Ermine Writers” develops this kind of type abstraction in another direction.

For the derivation of subst’s weird signature above, see “Higher Leibniz”.

Why is the : LabelImpl ascription so important?

Suppose that you ignored my comments and defined the concrete LabelImpl without an ascription.

val Label = new LabelImpl {
  // ...implementation continues as before

Then, the abstraction would disappear; you would no longer have a “new type”.

scala> val lbl: Label = "hi"
lbl: Label = hi

scala> lbl: String
res0: String = hi

scala> implicitly[Label =:= String]
res1: =:=[Label,String] = <function1>

Why did it break so hard? Well, the inferred type of val Label is different from the one you were ascribing.

scala> Label
res2: LabelImpl{type T = String} = hcavsc.broken.Labels$$anon$1@48cd7b32

That means that Label.T is no longer existential; it’s known, and known to be String. Accordingly, type Label also expands to String, and vice versa.

If you want it a new type, you must keep it existential.

Some background

The unboxed tagging technique is based on cast-free type tags in the upcoming Scalaz 7.3.0. That, in turn, was based on use of existential types in Ermine's implementation to hide expansions from scalac.

This is also a specialization of the type-member based MTL encoding I used in "Tagless final effects à la Ermine Writers". The essential difference is that individual program elements were universally quantified over the expansion of the abstract type, where here, the entire program is universally quantified over that expansion, because the existential quantifier is globally bound.

I’m certainly not the first person to explore this technique; for example, Julian Michael wrote about it several months before this article.

And, of course, if you are an ML (OCaml, SML, &c) fan, you’re probably thinking “yeah, so what? I do this all the time.” Sorry. We can be a little slow on the uptake in Scala world, where we greatly undervalue the ideas of the functional languages before us.

This article was tested with Scala 2.12.1, Scalaz 7.2.10, Scala.js 0.6.13, and Kind Projector 0.9.3. The code is available in compilable form for your own experiments via Bazaar.

2 comments:

Unknown said...

I have really enjoyed both articles.

One could be tempted to statically scan Scala source code for bad smells of AnyVal.
This brings the question about what are the good smells of AnyVal?

In particular, do you agree that using AnyVal for a Value class is acceptable since it helps avoid unnecessary runtime allocations as described here: http://docs.scala-lang.org/overviews/core/value-classes.html

Stephen Compall said...

It is a good choice for adding methods a la implicit class, or the old style of class + implicit def (implicit class does not work for all cases).

Otherwise, it seems like it is not a good choice.