Thinking about this some more: 1+x+y+min(abs(x),abs(y)), right? At the hardware level you don't really need a full bitwise comparison to find the min. You can skip most of the width, check only the one or two most significant bits, and call that good enough. That's a tiny fraction of a clock cycle.
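A minimal sketch of the top-bits comparison, assuming the mantissa fractions are stored as unsigned m_bits-wide integers (23 bits, as in IEEE 754 float32), so no abs() is needed here; `approx_min` and its `k` parameter are hypothetical names for illustration. When the top k bits differ, the answer is exact. When they tie, either operand is within 2**(m_bits - k) integer units of the true minimum.

```python
def approx_min(xm: int, ym: int, m_bits: int = 23, k: int = 2) -> int:
    """Pick the (approximately) smaller of two unsigned m_bits-wide mantissa
    fractions by comparing only their top k bits instead of all m_bits."""
    shift = m_bits - k
    x_top, y_top = xm >> shift, ym >> shift  # the k most significant bits
    if x_top < y_top:
        return xm  # top bits differ, so xm really is the smaller one
    if y_top < x_top:
        return ym
    # Top k bits tie: both values sit in the same 2**shift-wide bucket,
    # so returning either one is off from the true min by less than 2**shift.
    return ym


if __name__ == "__main__":
    M = 23
    a = int(0.30 * 2**M)  # fraction ~0.30, top 2 bits = 01
    b = int(0.70 * 2**M)  # fraction ~0.70, top 2 bits = 10
    print(approx_min(a, b, M, k=2) == a)  # prints True
```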
That's how it should work. If the hardware can cheaply swap the operands so that y is always the smaller one, then the equation from that paper, with this min term, becomes: 1+x+2*y. The 2*y part is just a one-bit left shift, which is essentially free, so this avoids adding a constant.
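A rough sketch of the 1+x+2*y idea for positive normal floats, using Python's `math.frexp`/`math.ldexp` to split and rebuild the numbers; `approx_mul` is a hypothetical name, and signs, zeros, subnormals, and overflow are not handled. Because x*y <= min(x, y) for fractions in [0, 1), this always overestimates the true product, by at most 12.5% (reached at x = y = 1/3).

```python
import math


def approx_mul(a: float, b: float) -> float:
    """Approximate a*b for positive normal floats as (1 + x + 2*y) * 2**(ea + eb),
    where x >= y are the two mantissa fractions (so y = min(x, y))."""
    # frexp returns a = f * 2**e with f in [0.5, 1); rescale so that
    # a = (1 + x) * 2**(e - 1) with x in [0, 1).
    fa, ea = math.frexp(a)
    fb, eb = math.frexp(b)
    x, y = 2 * fa - 1, 2 * fb - 1
    if y > x:  # swap so y is always the smaller fraction
        x, y = y, x
    # In hardware 2*y is a one-bit left shift; no constant offset is added.
    mant = 1 + x + 2 * y
    return math.ldexp(mant, (ea - 1) + (eb - 1))


if __name__ == "__main__":
    print(approx_mul(1.5, 1.25))  # prints 2.0 (exact product is 1.875)
```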
Or, you could do the xm * ym multiplication (the product of the two mantissa fractions), but only on the first couple of bits. Schoolbook multiplication is O(N²) in the number of bits, so if you only look at a couple of bits, it's a very fast op.
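A sketch of the truncated-multiply alternative, again assuming unsigned m_bits-wide mantissa fractions; `truncated_mantissa_product`, `approx_mantissa`, and `k` (standing in for "a couple bits") are hypothetical names. Only the top k bits of each fraction go into a k-by-k multiply, so the partial-product array shrinks from m_bits² to k² bits, and the result is shifted back into the m_bits fixed-point scale. Truncation never overshoots x*y and undershoots it by less than 2**(1 - k).

```python
def truncated_mantissa_product(xm: int, ym: int, m_bits: int = 23, k: int = 2) -> int:
    """Approximate x*y, where x = xm / 2**m_bits and y = ym / 2**m_bits,
    using only the top k bits of each operand. Returns the product in the
    same m_bits fixed-point scale. Assumes m_bits >= 2*k."""
    shift = m_bits - k
    x_top, y_top = xm >> shift, ym >> shift  # k-bit operands
    prod = x_top * y_top                     # k-by-k multiply, at most 2k bits
    # x ~ x_top / 2**k and y ~ y_top / 2**k, so x*y ~ prod / 2**(2k);
    # rescale to m_bits fractional bits.
    return prod << (m_bits - 2 * k)


def approx_mantissa(xm: int, ym: int, m_bits: int = 23, k: int = 2) -> int:
    """(1 + x)(1 + y) = 1 + x + y + x*y, with x*y replaced by the truncated product."""
    one = 1 << m_bits
    return one + xm + ym + truncated_mantissa_product(xm, ym, m_bits, k)


if __name__ == "__main__":
    half = 1 << 22  # x = y = 0.5 with 23 fraction bits
    print(approx_mantissa(half, half) / 2**23)  # prints 2.25 (1.5 * 1.5)
```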