A good background video

Transposes are inextricably linked to the concept of duality.

The transpose of a vector is its dual vector: $v^T$ essentially represents the operation “project onto $v$”, and it can be applied to some vector $x$ simply by left-multiplying, i.e. $v^T x$. Therefore, $v^T$ should primarily be thought of as a function.
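To make the "row vector as a function" view concrete, here is a minimal Python sketch. The helper name `dual` is hypothetical (it is not taken from any library), and vectors are represented as plain tuples.

```python
def dual(v):
    """Return the function x -> v^T x, i.e. the dot product with v."""
    def apply(x):
        return sum(vi * xi for vi, xi in zip(v, x))
    return apply

v_T = dual((1, 1))     # the row vector [1, 1], viewed as a function
print(v_T((3, 4)))     # 1*3 + 1*4 = 7
print(v_T((1, -1)))    # 0: the vector (1, -1) is sent to zero
```

Calling `dual(v)` once and reusing the result mirrors the idea that the row vector is an object in its own right, separate from any particular input.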

One way to visualize a row vector $v^T$ is by drawing its level curves. That is, for each integer $k$, we draw a single line representing all vectors $x$ such that $v^T x = k$. For instance, here’s how we’d represent the row vector $[1, 1]$:

[Figure: level curves of the row vector $[1, 1]$, drawn on x-y axes]

The line going through the origin corresponds to $k = 0$, and the line going through $(0, 1)$ and $(1, 0)$ corresponds to $k = 1$.

Note that scaling up the vector will simply make the level curves more dense. Shown below are the level curves for $v^T = [2, 2]$:

[Figure: level curves of the row vector $[2, 2]$, drawn on x-y axes]

Lemma

The distance between adjacent level curves of $v^T$ is $\frac{1}{|v|}$.
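The lemma is stated without proof, so here is a short sketch of one way to see it. The level curves are all perpendicular to $v$, so their spacing can be measured along $v$. The point $x_k$ below (a name introduced just for this sketch) is where the $k$-th level curve crosses the line through $v$:

```latex
\[
x_k = \frac{k}{|v|^2}\, v
\quad\Longrightarrow\quad
v^T x_k = \frac{k}{|v|^2}\, v^T v = k ,
\]
\[
|x_{k+1} - x_k| = \left| \frac{1}{|v|^2}\, v \right| = \frac{1}{|v|} .
\]
```

For example, $[1, 1]$ has curves spaced $\frac{1}{\sqrt{2}}$ apart, while $[2, 2]$ has curves spaced $\frac{1}{2\sqrt{2}}$ apart, which is why scaling the vector up packs the curves closer together.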

Now, we’ll use this visualization to prove the following identity geometrically:

Theorem

For any 2x2 matrix $A$, $\det(A) = \det(A^T)$.

As a quick refresher, $\det([x_1 \; x_2])$ is the signed area of the parallelogram defined by the column vectors $x_1$ and $x_2$. For instance, $\det\begin{pmatrix} 1 & 2 \\ 4 & 1 \end{pmatrix}$ represents the following area:

[Figure: the parallelogram spanned by the column vectors $x_1$ and $x_2$, drawn on x-y axes]

Equivalently, the determinant is the scale factor by which $A$ transforms the area of any shape.
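As a quick sanity check of the "signed area" reading, the sketch below compares the 2x2 determinant formula with the shoelace formula for the parallelogram's area. The helper names `det2` and `shoelace` are hypothetical, and the example matrix is assumed to have rows $(1, 2)$ and $(4, 1)$, so that its columns are $x_1 = (1, 4)$ and $x_2 = (2, 1)$.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    return a * d - b * c

def shoelace(points):
    """Signed area of a polygon whose vertices are listed in order."""
    n = len(points)
    total = 0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return total / 2

A = ((1, 2), (4, 1))      # assumed layout of the example matrix
x1, x2 = (1, 4), (2, 1)   # its columns

# Parallelogram with vertices 0, x1, x1 + x2, x2 (in that order)
corner = (x1[0] + x2[0], x1[1] + x2[1])
print(det2(A))                              # -7
print(shoelace([(0, 0), x1, corner, x2]))   # -7.0
```

The negative sign reflects orientation: going from $x_1$ to $x_2$ is a clockwise turn for this pair of vectors.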

Now, how do we visualize the scaling induced by $A^T$? It will actually be easier to visualize this in the backwards direction: that is, to find the area of the shape that $A^T$ maps to the unit square. If this area is $x$, then $\det(A^T) = \frac{1}{x}$, so we hope to show that $\det(A) = \frac{1}{x}$.

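Let’s draw in level curves for $A = \begin{pmatrix} 1 & 2 \\ 4 & 1 \end{pmatrix}$, that is, for the row vectors $x_1^T$ and $x_2^T$ that make up $A^T$. Then, by definition of the level curves, the shape which $A^T$ maps to the unit square is precisely the shaded region: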

[Figure: level curves for $A$ on x-y axes, with the region that $A^T$ maps to the unit square shaded]

Now, how can we relate the area of this parallelogram to our first one? Well, according to our lemma, the two altitudes of the new parallelogram are $\frac{1}{|x_1|}$ and $\frac{1}{|x_2|}$, respectively. We also know that the two altitudes of the original parallelogram are $\frac{\det(A)}{|x_1|}$ and $\frac{\det(A)}{|x_2|}$, respectively. And since the two parallelograms have the same angles, this means that they must be similar: the new one is just the original one scaled by a factor of $\frac{1}{\det(A)}$! This means the area of our new parallelogram is simply

$$\det(A) \left( \frac{1}{\det(A)} \right)^2 = \frac{1}{\det(A)}$$

as desired.

By the way, the basis which defines the second parallelogram has a special name: it is the dual basis of $A$.
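For readers who want to check the numbers, here is a rough sketch that computes the dual basis for the example and confirms the area claim. It assumes the example matrix has rows $(1, 2)$ and $(4, 1)$; the helper names are hypothetical, and `fractions.Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

def det2(m):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    return a * d - b * c

def transpose2(m):
    (a, b), (c, d) = m
    return ((a, c), (b, d))

def inverse2(m):
    """Exact inverse of a 2x2 matrix with nonzero determinant."""
    (a, b), (c, d) = m
    det = Fraction(det2(m))
    return ((d / det, -b / det), (-c / det, a / det))

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

A = ((1, 2), (4, 1))       # assumed layout of the example matrix
x1, x2 = (1, 4), (2, 1)    # columns of A

# The region that A^T maps to the unit square is spanned by the
# columns of (A^T)^{-1}; these columns are the dual basis d1, d2.
D = inverse2(transpose2(A))
d1 = (D[0][0], D[1][0])
d2 = (D[0][1], D[1][1])

print(dot(x1, d1), dot(x1, d2))       # 1 0
print(dot(x2, d1), dot(x2, d2))       # 0 1
print(det2(D))                        # -1/7, i.e. 1 / det(A)
print(det2(A), det2(transpose2(A)))   # -7 -7
```

The first two lines of output are the defining property of a dual basis: each $d_j$ pairs to 1 with its own $x_j$ and to 0 with the other one.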

Unfortunately, this similarity argument doesn’t seem to extend to higher dimensions. However, I hope it still provided a more visceral feel for what exactly the transpose does, and how it’s visually related to the original basis.