We now apply our algorithm to the problem of determining the L1 distance between joint and product distributions as described in Problem 1.
Researchers suggest that NMT models learn sentence representations that capture meaning. We inspected whether distinct types of semantics are captured by NMT encoders. Our experiments suggest that NMT encoders might learn the most about semantic p...
There aren't more papers