Transformer-style (sinusoidal) position encoding

$$
\begin{cases}
p_{k,2i} = \sin\left(k / 10000^{2i/d}\right) \\
p_{k,2i+1} = \cos\left(k / 10000^{2i/d}\right)
\end{cases}
$$

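This encoding can extrapolate: because each position's vector is given by an explicit formula, it is also defined for positions beyond those seen during training.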
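As a rough illustration of the formula above, the sketch below computes the encoding for one position using only the standard library. The function name `sinusoidal_position` and the `base` parameter are hypothetical names chosen for this example, and the dimension `d` is assumed to be even.

```python
import math


def sinusoidal_position(k, d, base=10000.0):
    """Return the d-dimensional sinusoidal encoding for position k.

    Assumes d is even; `base` is the constant 10000 from the formula.
    """
    p = [0.0] * d
    for i in range(d // 2):
        angle = k / base ** (2 * i / d)
        p[2 * i] = math.sin(angle)      # component p_{k,2i}
        p[2 * i + 1] = math.cos(angle)  # component p_{k,2i+1}
    return p


# Position 0 and a position far beyond any typical training length.
p0 = sinusoidal_position(0, 8)
p_far = sinusoidal_position(100_000, 8)
```

Nothing in the function depends on a maximum length, which is the sense in which the encoding is defined for unseen positions. Whether a trained model actually behaves well at such positions is a separate question that this sketch does not address.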

Relative position encoding

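$$
\begin{cases}
q_i = (x_i + p_i) W_Q \\
k_j = (x_j + p_j) W_K \\
v_j = (x_j + p_j) W_V \\
a_{i,j} = \operatorname{softmax}\left(q_i k_j^{\top}\right) \\
o_i = \sum_j a_{i,j} v_j
\end{cases}
$$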
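The attention step with position vectors added to the inputs can be sketched in plain Python as follows. All helper names (`vec_mat`, `dot`, `softmax`, `attention_output`) and the toy values are assumptions made for this example. Following the formulas in these notes, no scaling factor is applied to the scores.

```python
import math


def vec_mat(v, W):
    """Row vector v (length n) times matrix W (n x m) -> row vector of length m."""
    return [sum(v[r] * W[r][c] for r in range(len(v))) for c in range(len(W[0]))]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_output(i, xs, ps, W_Q, W_K, W_V):
    """Return (a_i, o_i): attention weights and output for query position i."""
    inputs = [[a + b for a, b in zip(x, p)] for x, p in zip(xs, ps)]  # x_j + p_j
    q_i = vec_mat(inputs[i], W_Q)
    keys = [vec_mat(h, W_K) for h in inputs]
    values = [vec_mat(h, W_V) for h in inputs]
    a_i = softmax([dot(q_i, k_j) for k_j in keys])  # softmax over j
    o_i = [sum(a_i[j] * values[j][c] for j in range(len(values)))
           for c in range(len(values[0]))]
    return a_i, o_i


# Toy data: three positions, dimension 2, identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ps = [[0.0, 0.1], [0.1, 0.0], [0.1, 0.1]]
a_0, o_0 = attention_output(0, xs, ps, identity, identity, identity)
```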

Expand $q_i k_j^{\top}$:

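$$
q_i k_j^{\top} = (x_i + p_i) W_Q W_K^{\top} (x_j + p_j)^{\top} = \left(x_i W_Q + p_i W_Q\right)\left(W_K^{\top} x_j^{\top} + W_K^{\top} p_j^{\top}\right)
$$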
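The expansion can be checked numerically. The sketch below uses arbitrary toy values (not taken from any source) and hypothetical helper names. It compares the unexpanded score with the factored form, and with the sum of the four cross terms obtained by multiplying out the two brackets.

```python
def vec_mat(v, W):
    """Row vector v (length n) times matrix W (n x m) -> row vector of length m."""
    return [sum(v[r] * W[r][c] for r in range(len(v))) for c in range(len(W[0]))]


def dot(a, b):
    """Inner product; dot(u, x W_K) equals u W_K^T x^T for row vectors u and x."""
    return sum(x * y for x, y in zip(a, b))


def add(a, b):
    """Elementwise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


# Arbitrary toy inputs, position vectors, and projection matrices.
x_i, p_i = [1.0, 2.0], [0.5, -1.0]
x_j, p_j = [0.0, 3.0], [1.5, 0.25]
W_Q = [[1.0, 2.0], [0.0, 1.0]]
W_K = [[2.0, 0.0], [1.0, 1.0]]

# Unexpanded: ((x_i + p_i) W_Q) ((x_j + p_j) W_K)^T
lhs = dot(vec_mat(add(x_i, p_i), W_Q), vec_mat(add(x_j, p_j), W_K))

# Factored: (x_i W_Q + p_i W_Q)(W_K^T x_j^T + W_K^T p_j^T)
rhs = dot(add(vec_mat(x_i, W_Q), vec_mat(p_i, W_Q)),
          add(vec_mat(x_j, W_K), vec_mat(p_j, W_K)))

# Four cross terms: input-input, input-position, position-input, position-position.
terms = [dot(vec_mat(a, W_Q), vec_mat(b, W_K))
         for a in (x_i, p_i) for b in (x_j, p_j)]
```

All three quantities agree up to floating-point error, which is just the distributive law applied to the two brackets.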