Introduction to tensors

The formalism of tensors plays a major role in the fundamental theories of physics: general relativity, quantum physics, and even classical physics (vector calculus and screw theory: the usual courses on these subjects, which pretend not to use tensors, are just awful).
This text presents a definition of tensors that looks quite different from both traditional definitions (one through the "Einstein summation convention" by physicists, the other a quite abstract definition from category theory by mathematicians), and that aims to be more intuitive and convenient to learn, for use in physics.

For this, the tensor product E⊗F between vector spaces E and F will be defined by duality with another vector space, whose definition is very close to the notion of tensor product between spaces but only equivalent to it in the case of finite-dimensional spaces. It thus requires a different notation. As I could not find it defined by other authors (who failed to see the importance of defining it in this general way), I had to choose a new notation. I hesitated between ⊙ and ⊠ (not to mention ⊛). You can tell me your preference, or whether you know an already existing convention for it.

This text, focusing on the basic definitions, may not have complete proofs yet (these would need more work and algebraic preliminaries, but I have so many other subjects to work on...).
One of the main further steps that would be needed after this (and before use in physics) would be the definitions and properties of symmetric and antisymmetric tensors. I wish to find a new, more convenient notation for symmetric and antisymmetric tensors before undertaking their presentation.

Necessary preliminary : Vector spaces in duality

Definitions of the tensor product, in duality with the space of (continuous) multilinear forms

Taking two pairs of dual vector spaces, (E,E') and (F,F'), we can define the space E'⊠F' of all "continuous bilinear forms" on E×F, that is, maps from E×F to ℝ whose two curried forms map E into F' and F into E' (every element of E defines the same map from F to ℝ as some element of F', and vice versa). This operation between spaces can be defined more generally between any two sets of functions E'⊂ℝ^E and F'⊂ℝ^F (see the sum of functions defined in 2.7):

E'⊠F' = {∐f | f∈F'^E} ∩ {ᵗ(∐g) | g∈E'^F}
= {h∈ℝ^(E×F) | (∀x∈E, h(x,·)∈F') ∧ (∀y∈F, h(·,y)∈E')}

This simple definition implicitly contains (is equivalent to) the requirement of bilinearity and (in infinite-dimensional cases) continuity of each element of E'⊠F' with respect to its two variables in E and F (this information is carried by the choice of E' as a subspace of ℝ^E, and likewise of F' as a subspace of ℝ^F). Each of the two maps, from E to F' and from F to E', is linear.
If E is finite dimensional then the continuity condition disappears (it is always satisfied): these maps from F to E' are all linear maps from F to E' (and the same exchanging E and F). Otherwise it is anyway the set of all continuous linear maps from F to E', for the topology naturally defined by the duality.
This construction is essentially unaffected by the replacement of E by one of its generating subsets. In particular, for any basis B⊂E, the space E' of all linear forms on E is identifiable with the space ℝ^B of all functions on B (without any further linearity condition), and the space E'⊠F' is essentially unchanged when replacing E by B.

Considering the natural map from (E'⊠F')×(E×F) to ℝ, the above procedure yields a natural map (x∈E, y∈F) ↦ x⊗y ∈ E⊗F, where E⊗F is the dual space to E'⊠F' generated by such x⊗y.
Thus for all t∈E'⊠F', the scalar product (x⊗y).t is defined by t(x,y).
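
As a finite-dimensional illustration of this duality, here is a minimal Python sketch. It assumes E=ℝ^2 and F=ℝ^3, each identified with its dual by the usual dot product, so that an element t of E'⊠F' is stored as a 2×3 table of coefficients. The function names (bilinear, curry_left, tensor, pair) are hypothetical, introduced only for this example.

```python
def bilinear(t, x, y):
    """Evaluate t(x, y) = sum over i, j of x[i] * t[i][j] * y[j]."""
    return sum(x[i] * t[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def curry_left(t, x):
    """Curried form t(x, .): a linear form on F, given by its coefficients."""
    return [sum(x[i] * t[i][j] for i in range(len(x)))
            for j in range(len(t[0]))]

def tensor(x, y):
    """The element x (x) y, stored as the table of products x[i] * y[j]."""
    return [[xi * yj for yj in y] for xi in x]

def pair(s, t):
    """Scalar product between a table s (in E(x)F) and a table t (in E'[x]F')."""
    return sum(s[i][j] * t[i][j]
               for i in range(len(s)) for j in range(len(s[0])))

t = [[1, 2, 0],
     [0, -1, 3]]      # an arbitrary bilinear form on E x F
x = [2, 1]            # a vector of E
y = [1, 0, 4]         # a vector of F

# (x (x) y).t is by definition t(x, y)
assert pair(tensor(x, y), t) == bilinear(t, x, y)
# the curried form t(x, .) applied to y gives the same number
assert sum(a * b for a, b in zip(curry_left(t, x), y)) == bilinear(t, x, y)
```

In this finite-dimensional setting every table of coefficients is an admissible t, so the sketch says nothing about the continuity condition that matters in infinite dimension.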

This can be generalized to the tensor product of 3 or more vector spaces: E'⊠F'⊠G' is the set of all maps from E×F×G to ℝ whose curried forms define maps from E×F to G', from E×G to F' and from F×G to E'.
Then such an object also defines a map from E to F'⊠G' (and the same in 2 other ways).
This space E'⊠F'⊠G' is identifiable with (E'⊠F')⊠G', and is in duality with a space E⊗F⊗G defined the same way as above (which is identifiable with (E⊗F)⊗G).

Tensor expressions

Tensor expressions are constructed in a different way from ordinary logical or algebraic expressions.
Let us recall how ordinary algebraic expressions are formed:
Each algebraic expression distinguishes a main symbol and a list of other well-distinguished sub-expressions (that are themselves expressions) entered as data to this symbol. In particular, an algebraic expression is made of operation symbols.
The format of the list of entries to each operation symbol is determined by the type of the operation named by this symbol: the number of entries (arity), the set that each entry must belong to (domain of each argument), and a set that the result belongs to.
The whole expression also has a type (list and nature of free variables, and nature of the result): that of the operation it defines.
The list of entries need not always be labelled by the numbers 1,...,n: any given abstract finite set A of n elements can be used instead, labelling by each index i∈A the domain Ei of the corresponding variable.
Now similarly, each tensor expression and each tensor symbol has a specific type: a set A of some finite number n of elements (the arity), and a family of vector spaces (Ei) for i∈A, indicating that this symbol or expression must take its value in E1⊠...⊠En; thus Ei is the dual of the domain of the i-th free variable.

But the same tensor symbol with arity n can be interpreted as an algebraic (operation) symbol in n+1 ways: either as an n-ary operation with arguments in E'1,...,E'n and values in ℝ, or in n other ways: for each k∈A, as an (n-1)-ary operation with arguments in the E'i for i≠k, giving a result in Ek.
For example a tensor symbol with arity 3, belongs to some space E⊠F⊠G. It can be either read as an operation between E',F',G' with values in ℝ, or as an operation between E' and F' with values in G, and so on for the two other arguments.
Nullary symbols (n=0) represent scalars (in ℝ).
The type of a unary symbol or expression (n=1) directly gives the vector space it belongs to when seen as a constant (which is why we choose the convention to define the type of a tensor by the duals of the domains of its arguments).
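
The different readings of a symbol of arity 3 can be checked numerically. The following Python sketch assumes E=F=G=ℝ^2, identified with their duals by the dot product, and stores the tensor as a 2×2×2 table of components; the names full, to_G and to_E are hypothetical helpers for this example only.

```python
def full(T, a, b, c):
    """Read T as a ternary operation on E', F', G' with values in R."""
    return sum(T[i][j][k] * a[i] * b[j] * c[k]
               for i in range(len(a))
               for j in range(len(b))
               for k in range(len(c)))

def to_G(T, a, b):
    """Read T as a binary operation on E', F' with values in G."""
    return [sum(T[i][j][k] * a[i] * b[j]
                for i in range(len(a)) for j in range(len(b)))
            for k in range(len(T[0][0]))]

def to_E(T, b, c):
    """Read T as a binary operation on F', G' with values in E."""
    return [sum(T[i][j][k] * b[j] * c[k]
                for j in range(len(b)) for k in range(len(c)))
            for i in range(len(T))]

def dot(u, v):
    """Scalar product between a vector and a linear form."""
    return sum(p * q for p, q in zip(u, v))

T = [[[1, 0], [2, 1]],
     [[0, 3], [1, -1]]]          # arbitrary components T[i][j][k]
a, b, c = [1, 2], [3, 1], [2, -1]

# All readings agree once the remaining argument is contracted.
assert full(T, a, b, c) == dot(to_G(T, a, b), c) == dot(to_E(T, b, c), a)
```

The third vector-valued reading (on E' and G' with values in F) is obtained the same way and is omitted for brevity.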

Each tensor expression of type A is a linear combination of monomial tensor expressions of type A, which consist of graphs structured in the following way.

The symbols in the graph occur in bulk, without any distinguished main symbol, thus without the resulting kind of order between them.
Note that contrary to algebraic expressions, tensor monomial expressions do not allow any repetition of a free variable (but each free variable appears exactly once in every monomial component of the linear combination).
But we need to examine the problem: do tensor expressions really make sense (give a well-defined value in the intended space E1⊠...⊠En)?

Meaning of a tree-like tensor expression

A monomial tensor expression is a tree if between any 2 vertices there exists one and only one path that does not repeat any vertex (so as to ignore paths that can obviously be shortened by cutting out some edges). In other words, it is a graph that is connected and has no loop.

Such an expression can be interpreted as an algebraic expression in as many ways as there are vertices, edges and free ends in the graph; and all these ways give the same result in the expected space.
Indeed, every vertex, or edge, or free end, can be chosen to be seen as the main symbol of the expression (if a vertex it is an operation with values in ℝ; if an edge it is a scalar product; if a free end it is an operation symbol with vectorial value); every edge is marked by the orientation of the unique path to this "main symbol", and every vertex is interpreted as the operation symbol that follows these orientations.
All these interpretations give the same result because between any two choices of main symbol there is a path, and the result is preserved at every step of this path (from each edge to each next edge), thanks to the identities between the different algebraic interpretations of each symbol (vertex).

Meaning of tensor expressions without loop

Any graph that is not connected is divided into several separate connected components in a unique way.

The result of such an expression is given by separately making the computation in each tree in the above way, then multiplying the results. This can either be seen as a multiplication between numbers, or between numbers and one vector, depending on the interpretation of the expression.
This possibility to interpret disconnected graphs can also be expressed as follows:

The injection from E⊗F to E⊠F

There is a natural map from E⊗F to E⊠F, defined by mapping each (x,y) in E×F to the map sending each (x',y') in E'×F' to the real number obtained by multiplying in ℝ the scalar products: (x.x')(y.y'). Indeed, for every (x,y)∈E×F, this map from E'×F' to ℝ has its curried forms given by x'↦(x.x')y from E' to F, and y'↦(y.y')x from F' to E; thus it indeed belongs to E⊠F.

The rank

Every element t of a tensor product E⊗F has finite rank (in the sense of the rank of a matrix, not to be confused with the traditional use of this word for tensors, which we call here the arity or degree, and which here equals 2), defined as the minimum number of elements of the form x⊗y (the rank 1 elements) whose sum gives t.
This rank is also equal to the dimension of the image of each of the two maps that t defines as an element of E⊠F, from E' to F and from F' to E.

Proof: The image in F of the map defined by x1⊗y1+...+xn⊗yn from E' to F, is contained in the subspace of F generated by y1,...,yn, thus its dimension is no larger than n.
If the y1,...,yn were not linearly independent, they could be decomposed in another basis with a smaller number of elements (made of some of them), providing another expression of that element of E⊗F as a sum of a smaller number of rank 1 elements. The same for the x1,...,xn. The linear independence of the x1,...,xn ensures that the image is equal to the subspace of F generated by y1,...,yn, that has dimension n.
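
The argument of this proof can be followed on a small example. The Python sketch below assumes E=F=ℝ^3 and represents an element of E⊗F by its 3×3 table of components, so that its rank is the usual matrix rank, computed here by exact Gaussian elimination; the helper names (tensor, add, rank) are hypothetical.

```python
from fractions import Fraction

def tensor(x, y):
    """The rank 1 element x (x) y as a table of products."""
    return [[xi * yj for yj in y] for xi in x]

def add(s, t):
    """Entrywise sum of two tables of the same shape."""
    return [[p + q for p, q in zip(rs, rt)] for rs, rt in zip(s, t)]

def rank(m):
    """Matrix rank by Gaussian elimination with exact rationals."""
    rows = [[Fraction(v) for v in row] for row in m]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [u - f * v for u, v in zip(rows[i], rows[r])]
        r += 1
    return r

x1, x2, x3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]   # linearly independent
y1, y2 = [1, 0, 0], [0, 1, 0]
y3 = [1, 1, 0]                                 # y3 = y1 + y2: dependent

t = add(add(tensor(x1, y1), tensor(x2, y2)), tensor(x3, y3))

# Since y3 = y1 + y2, the same t is a sum of only two rank 1 elements.
x13 = [p + q for p, q in zip(x1, x3)]
x23 = [p + q for p, q in zip(x2, x3)]
shorter = add(tensor(x13, y1), tensor(x23, y2))

assert t == shorter
assert rank(t) == 2
```

With three independent y's instead (for instance y3 = [0, 0, 1]) the same computation gives rank 3, matching the dimension of the subspace generated by y1, y2, y3.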

Note that the proof of this equality between expressions in E⊗F is carried out in the classical concept of E⊗F (from universal algebra: the quotient of the set of formal combinations of elements of E×F by the linearity relations in each of E and F) rather than as a dual to E'⊠F'. From this we can deduce that the map from the classical E⊗F to E⊠F is injective, and therefore both definitions of E⊗F coincide. We shall identify E⊗F with its image in E⊠F.

Now let us directly define the rank of an element of E⊠F as the dimension of each of its images in E and in F (and thus cannot exceed the smallest of the dimensions of E and F); it is also the dimension of the dual vector spaces it defines in the role of a scalar product between (the quotiented) E' and F'.
An element t of E⊠F belongs to E⊗F if and only if its rank is finite. In this case, its two images A in E and B in F are in duality with each other, by defining for every x∈A and y∈B their scalar product as x.x' = y.y' (= x'ty') for any elements y'∈F' and x'∈E' such that ty'=x and x't=y.
The choice of a basis x1,...,xn of A and its dual basis y1,...,yn of B gives a decomposition of t as t = x1⊗y1+...+xn⊗yn

Thus E⊗F = E⊠F when E or F is finite dimensional, but generally not otherwise. Anyway both E⊗F and E⊠F are duals to E'⊗F' .

The identity element, the trace

Consider the case F=E'. Then we have the natural element I in E⊠E' (also usually denoted δ (small delta) and named the "Kronecker symbol") that gives the scalar product itself (as a map from E'×E to ℝ) and behaves as the identity from E to itself, and as the identity from E' to itself. Seen as a map from E'⊗E to ℝ, this element is also called the trace. Its rank is equal to the common dimension of E and E', thus it belongs to E⊗E' only if this dimension is finite.

Now if E has finite dimension n, then we have a basis of n vectors e1,...,en and its dual basis e'1,...,e'n, by which the element I can be written as the image in E⊠E' of the element e1⊗e'1+...+en⊗e'n of E⊗E'.
Let us apply the trace function to this element I of E⊗E' itself. This transforms each tensor product into a scalar product, thus giving (e1.e'1)+...+(en.e'n)= 1+...+1 = n. This cannot be done in an infinite dimensional space, as the result would be infinite. So we have this rule : for every vector space, its dimension equals the trace of the identity in this space.
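
This rule can be checked on a basis that is not the canonical one. The Python sketch below assumes E=ℝ^3 with linear forms written as coefficient triples; the particular basis and its dual basis were chosen by hand for the example, and the helper names are hypothetical.

```python
def dot(u, v):
    """Scalar product between a vector and a linear form."""
    return sum(p * q for p, q in zip(u, v))

def tensor(x, y):
    """The element x (x) y as a table of products."""
    return [[xi * yj for yj in y] for xi in x]

basis = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]      # e1, e2, e3
dual = [[1, 0, 0], [-1, 1, 0], [1, -1, 1]]     # e'1, e'2, e'3

# Duality check: ei.e'j is 1 when i = j and 0 otherwise.
for i, e in enumerate(basis):
    for j, d in enumerate(dual):
        assert dot(e, d) == (1 if i == j else 0)

# I = e1 (x) e'1 + e2 (x) e'2 + e3 (x) e'3
n = len(basis)
identity = [[0] * n for _ in range(n)]
for e, d in zip(basis, dual):
    term = tensor(e, d)
    identity = [[identity[i][j] + term[i][j] for j in range(n)]
                for i in range(n)]

# The trace turns each tensor product into a scalar product.
trace = sum(dot(e, d) for e, d in zip(basis, dual))

assert identity == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
assert trace == n
```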

Meaning of any tensor expression without infinite-dimensional loop

Tensor expressions have the following properties, that we can verify in previous cases (without loop), and that will be postulated as general rules:
We saw above (with the trace of the identity) that it is not generally possible to make sense of a tensor expression containing an infinite-dimensional loop, that is a loop (path in the graph that comes back to itself) where all edges are labelled with infinite-dimensional spaces, and vertices have infinite rank. (At least in an algebraic manner ; let us not mention constructions where they could be defined by infinite sums that converge to a limit, that could be used to generalize the concept).
But let us show that it makes sense in all other cases, that is
For this, choose a way to "cut the graph" at least once at some finite-dimensional part of every loop : either
As each of these vertices is replaced by a linear combination of disconnected graphs, this produces a big linear combination of graphs where all loops are cut (this combination does not depend on the order between vertices to which the decomposition is applied, thanks to the commutativity and associativity of addition). This gives a well-defined result according to the above constructions.

This result does not depend on the choice of decomposition. Indeed, if you have 2 decompositions applied to edges (or at least not applying 2 different decompositions to the same vertex), then let us consider making both decompositions together (if an edge is decomposed in two ways, let us see it as a long edge through 2 copies of I, and apply each decomposition to a different copy of I). Then we can verify that the result of the double decomposition equals that of each of the two decompositions.

Another way of seeing it, is to consider that an element of E⊠F with finite rank is identified to an element of E⊗F, which is in duality with E'⊠F'. The use of this duality gives meaning to the expression.

This formalism provides the computation: dim(E⊗F) = (dim E)(dim F)
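
One way to spell out this computation, given here as a sketch rather than as the author's own derivation, uses the rule that the dimension of a space equals the trace of its identity. It assumes E and F have finite dimensions m and n, with bases (e_i), (f_j) and dual bases (e'_i), (f'_j), and that the products e_i⊗f_j form a basis of E⊗F whose dual basis is made of the e'_i⊗f'_j:

```latex
\dim(E\otimes F)
  = \operatorname{Tr}(I_{E\otimes F})
  = \sum_{i=1}^{m}\sum_{j=1}^{n} (e_i\otimes f_j).(e'_i\otimes f'_j)
  = \Big(\sum_{i=1}^{m} e_i.e'_i\Big)\Big(\sum_{j=1}^{n} f_j.f'_j\Big)
  = (\dim E)(\dim F)
```

In graph terms, the identity of E⊗F is a disconnected graph made of two loops, one labelled by E and one by F, and the value of a disconnected graph is the product of the values of its components.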

How transformation formulas are obtained

The expression of how components of tensors are transformed during changes of coordinates (or transformations of the composing spaces), can be naturally obtained by the following tools:
Any family of n vectors in a space E can be formalized as an element of ℝ^n⊗E. It is a basis when it is invertible. In this case its inverse, which belongs to ℝ^n⊗E', defines the dual basis.
In many cases, the trick to avoid any risk of mistake is to introduce vector spaces with special names to label edges in the tensor expressions, such as "ℝ^n as a representation of E in this basis", thus distinct from its dual "ℝ^n as a representation of E' in the dual basis".
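
These tools can be illustrated by a small computation. The Python sketch below assumes E=ℝ^3 and stores a family of 3 vectors as a table B with B[k][i] the i-th component of the k-th vector; its inverse table D then has the dual basis forms as its columns. The Gauss-Jordan helper inverse and the other names are hypothetical, written only for this example.

```python
from fractions import Fraction

def inverse(m):
    """Gauss-Jordan inverse of a square table, with exact rationals.
    Raises StopIteration if the table is not invertible."""
    n = len(m)
    a = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if a[i][c] != 0)
        a[c], a[pivot] = a[pivot], a[c]
        p = a[c][c]
        a[c] = [v / p for v in a[c]]
        for i in range(n):
            if i != c:
                f = a[i][c]
                a[i] = [u - f * v for u, v in zip(a[i], a[c])]
    return [row[n:] for row in a]

B = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]   # the family, as an element of R^n (x) E
D = inverse(B)                          # its inverse, in R^n (x) E'
n = len(B)

# The k-th dual basis form is the k-th column of D.
dual = [[D[i][k] for i in range(n)] for k in range(n)]

# Components of a vector v in the basis: apply each dual form to v.
v = [2, 3, 5]
components = [sum(v[i] * dual[k][i] for i in range(n)) for k in range(n)]

# Going back: v is the combination of the basis vectors with these components.
rebuilt = [sum(components[k] * B[k][i] for k in range(n)) for i in range(n)]

assert rebuilt == v
```

Keeping B (indices "number of the vector, component in E") and D (indices "component in E', number of the form") as two distinctly typed tables is one way to apply the naming trick described above.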

(to be continued)

Interesting papers

Kindergarten Quantum Mechanics is an application of the Penrose diagrams notation for tensors, to the case of quantum mechanics.
