Why tensors? A beginner's perspective

Why tensors? A beginner's perspective (mfaizan.github.io)

173 points by mfn 4 years ago | 105 comments

xyzzyz 4 years ago |

That was an explanation from the perspective of someone acquainted with modern physics. As such, it will make sense to physicist, but no sense to most everyone else, including mathematicians who don’t know modern physics.

For example, in the beginning, author describes tensors as things behaving according to tensor transformation formula. This is already very much a physicist's kind of thinking: it assumes that there is some object out there, and we’re trying to understand what it is in terms of how it behaves. It also uses the summation notation, which is rather foreign to non-physicist mathematicians. Then, when it finally reaches the point where it is all related to tensors in the TensorFlow sense, we find that there is no reference made to the transformation formula, purportedly so crucial to understanding tensors. How come?

The solution here is quite simple: what the author (and physicists) call tensors is not what TensorFlow (and mathematicians) call tensors. Instead, author describes what mathematicians call “a tensor bundle”, which is a correspondence that assigns each point of space a unique tensor. That’s where the transformation rule comes from: if we describe this mapping in terms of some coordinate system (as physicists universally do), the transformation rule tells you how this description changes under a change of coordinates. This setup, of course, has little to do with TensorFlow, because there is no space that its tensors are attached to; they are just standalone entities.

So what are the mathematician’s (and TensorFlow’s) tensors? They’re basically what the author says, after a very confusing and irrelevant introduction about changes of coordinates of the underlying space. It is irrelevant because TensorFlow tensors are not attached as a bundle to some space (manifold) as they are in physics, so no change of space coordinates ever happens. Roughly, tensors are a sort of universal objects representing multi linear maps: bilinear maps V x W -> R correspond canonically one-to-one to regular linear maps V (x) W -> R, where V (x) W is a vector space called tensor product of V and W, and tensors are simply vectors in this tensor product space.

Basically, the idea is to replace weird multi linear objects with normal linear objects (vectors), that we know how to deal with, using matrix multiplication and stuff. That’s all there is to it.
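A small numerical sketch of that correspondence, in plain Python. The matrix M, the vectors, and the helper names (bilinear, tensor_product, linear_on_product) are made up for illustration and do not come from the article. The only point is that a bilinear map evaluated on a pair (v, w) gives the same number as an ordinary linear functional evaluated on the single vector v (x) w.

```python
# Sketch: a bilinear map V x W -> R versus the matching linear map V (x) W -> R.
# All names and numbers are illustrative assumptions.

def bilinear(M, v, w):
    """B(v, w) = sum over i, j of v[i] * M[i][j] * w[j]; linear in v and in w separately."""
    return sum(v[i] * M[i][j] * w[j]
               for i in range(len(v))
               for j in range(len(w)))

def tensor_product(v, w):
    """Components of v (x) w, flattened row-major into one vector of length len(v) * len(w)."""
    return [vi * wj for vi in v for wj in w]

def linear_on_product(M, t):
    """The linear functional on V (x) W induced by M: just a dot product with M flattened."""
    flat = [m for row in M for m in row]
    return sum(f * x for f, x in zip(flat, t))

M = [[1, 2, 0],
     [0, -1, 3]]   # coefficients of a bilinear map on R^2 x R^3
v = [2, 5]
w = [1, 0, 4]

direct = bilinear(M, v, w)
via_product = linear_on_product(M, tensor_product(v, w))
print(direct, via_product)  # the two numbers agree
```

Most vectors in V (x) W are not of the form v (x) w, but they are sums of such products, so linearity extends the functional to all of them.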

mbbutler 4 years ago | |

Why are you complaining that the author didn't talk about tensors as they are used in tensorflow? Tensorflow is never even mentioned in the piece.

The author is perfectly clear in the first sentence that the piece's focus is about the usefulness of tensors in a physics context.

xyzzyz 4 years ago | | |

Huh, you’re right, not sure why I thought it’s TensorFlow related.

ABeeSea 4 years ago | |

> Instead, author describes what mathematicians call “a tensor bundle”, which is a correspondence that assigns each point of space a unique tensor.

Technically, that’s a tensor field which is a section of the tensor bundle. Similarly, a vector field is a section of the tangent bundle (the collection of all the tangent spaces of the points on the manifold). A vector field is just a choice of a tangent vector for each point from that point’s tangent space.

hinkley 4 years ago | |

> author describes tensors as things behaving according to tensor transformation formula

In grade school it drove me nuts when the homework required us to describe a word without using the word (or its Latinate siblings). And yet as an adult there are few enough weeks that go by where some grownup doesn’t try to pull that same trick.

If you think developers are guilty of circular logic, check out some of the math pages on Wikipedia. You can get lost in moments.

jhokanson 4 years ago | | |

Speaking of math pages on Wikipedia ... and math text more generally

Is it just me or are we horrible at teaching advanced math? Where are the examples (with actual numbers)? Where is the motivation? Where are the pictures?

chobytes 4 years ago | | |

I think a lot of circularity occurs in mathematics because we don't typically qualify our utterances when it can be implicitly understood.

Eg "Numbers (formal) are those objects which behave like numbers (informal)."

CatsAreCool 4 years ago | | |

I'm working on a language MathLingua (www.mathlingua.org) whose goal is to precisely describe mathematics using a format that is easy to read and understand to help address ambiguity in mathematical texts written using natural language.

It is still a work in progress, but does it help address some of the problems you see in learning mathematics? Any feedback is greatly appreciated. Thanks.

tagrun 4 years ago | |

It has nothing to do with tensor fields, uniform/constant tensors still obey the proper coordinate transformations, that's the defining property of any tensor. (With non-uniform tensor fields, covariant derivatives also pick up a correction, but that's a separate thing.)
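To make the transformation rule concrete for a constant tensor, here is a rough sketch in plain Python using a bilinear form g (two lower indices). The numbers, the basis-change matrix P, and the helper names are assumptions chosen for illustration. Vector components transform with the inverse of P, the components of g transform with P on both indices, and the scalar g(v, w) comes out the same in either basis.

```python
# Sketch: components change with the basis, the value g(v, w) does not.
# Everything here is an illustrative assumption, not taken from the article.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

g = [[2, 1],
     [1, 3]]            # components of a bilinear form in the old basis
v = [1, 2]
w = [3, -1]

P = [[1, 1],
     [0, 1]]            # columns: new basis vectors written in the old basis
P_inv = [[1, -1],
         [0, 1]]        # inverse of P, written out by hand

v_new = matvec(P_inv, v)                     # vector components use P^-1
w_new = matvec(P_inv, w)
g_new = matmul(transpose(P), matmul(g, P))   # form components use P^T g P

old_value = dot(v, matvec(g, w))
new_value = dot(v_new, matvec(g_new, w_new))
print(g_new, old_value, new_value)
```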

A TensorFlow "tensor" (like most other uses of "tensor" in programmer jargon) is not a tensor at all; it's just a multidimensional array.

contravariant 4 years ago | | |

Mathematicians would disagree with you there. There are no coordinates to transform in an ordinary tensor space and therefore no way for a tensor to be affected by such a transformation.

Matrices (or linear transformations in general) are important examples of tensors. There's a nice adjunction between tensor spaces A(x)B and the space of linear transformations B=>C given by:

Hom(A(x)B, C) = Hom(A, B=>C)

In the case of Tensorflow I think they do actually still talk about linear transformations of some kind so it's perfectly fine to call them tensors.
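The adjunction above is essentially currying, and it can be checked numerically in finite dimensions. The following is only a sketch in plain Python: the coefficient array F and the helper names are invented for the example. A linear map f from A (x) B to C is stored as F[c][a][b]; fixing a vector a in A produces an ordinary matrix from B to C, and applying that matrix to b agrees with applying f to a (x) b.

```python
# Sketch of Hom(A (x) B, C) = Hom(A, B => C) as currying, with made-up numbers.

def outer(a, b):
    """Components of a (x) b as a len(a) by len(b) grid."""
    return [[ai * bj for bj in b] for ai in a]

def apply_on_product(F, t):
    """Apply f : A (x) B -> C to an element t of A (x) B given by components t[a][b]."""
    return [sum(F[c][i][j] * t[i][j]
                for i in range(len(t))
                for j in range(len(t[0])))
            for c in range(len(F))]

def curry(F, a):
    """Fix a in A; return the matrix G[c][b] of the resulting linear map B -> C."""
    return [[sum(F[c][i][j] * a[i] for i in range(len(a)))
             for j in range(len(F[c][0]))]
            for c in range(len(F))]

def matvec(G, b):
    return [sum(g * x for g, x in zip(row, b)) for row in G]

# dim C = 2, dim A = 2, dim B = 3
F = [[[1, 0, 2], [0, 1, 0]],
     [[3, 1, 0], [1, 0, 1]]]
a = [1, 2]
b = [0, 1, -1]

lhs = apply_on_product(F, outer(a, b))   # f(a (x) b)
rhs = matvec(curry(F, a), b)             # (curried f)(a)(b)
print(lhs, rhs)
```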

catgary 4 years ago | | |

What do you think the tensor product of finite dimensional vector spaces looks like?

ummonk 4 years ago | |

The post was also a poor explanation for someone doing modern physics. [edit: not true actually I should have read the rest of the post - it’s a good post]

Wald's approach in General Relativity is much better - he treats Tensors as a multilinear map from vectors and dual vectors to scalars.

He then derives the underlying coordinate transformation rules, for the vector spaces used in differential geometry. But

mfn 4 years ago | | |

That’s the approach I used as well in the second half of the article - I just mentioned the transformation law in the beginning since that’s what most physics students encounter first.

Most of the article tries to provide some intuition behind why multilinear maps, which sound like a fairly abstract concept, might be relevant in physics. The key link being the importance of coordinate invariance.

I didn’t go into deriving the coordinate transforms from the multilinear map definition as I didn’t feel that it’d provide much better intuition, but I did mention the equivalence near the end.

kaashif 4 years ago | |

> author describes tensors as things behaving according to tensor transformation formula

Yeah, the idea that there are pre-existing things that we're trying to describe is somewhat weird to me when we're trying to come up with a definition of a tensor. The whole point of mathematics is that you come up with the definitions and theorems fall out.

In particular, this comment is funny and speaks to some difference in how I and the author view what we're doing when defining a tensor:

> But why that specific transformation law - why must tensors transform in that way in order to preserve whatever object the tensor represents?

Because we defined it like that! When you make the definition "a tensor is a thing that follows X laws", you don't get to ask why, you just defined it!

Just a funny bit of phrasing, I get what is meant :)

edflsafoiewq 4 years ago | | |

> The whole point of mathematics is that you come up with the definitions and theorems fall out.

That's just how it's presented in textbooks. It's obviously not how math is actually done.

hansen 4 years ago | |

To be a bit pedantic: the identification of tensors with multilinear forms requires finite dimensions (or reflexive topological spaces).

l33t2328 4 years ago | |

I think it’s a little funny you said that

>[the explanation in the OP] will make sense to physicist, but no sense to most everyone else, including mathematicians

And then went on to describe tensors in a way that is unfriendly to non-mathematicians by saying

> tensors are a sort of universal objects representing multi linear maps: bilinear maps V x W -> R correspond canonically one-to-one to regular linear maps V (x) W -> R, where V (x) W is a vector space called tensor product of V and W, and tensors are simply vectors in this tensor product space.

bigger_cheese 4 years ago | |

Engineering usage seems to match the physics usage. In classic engineering fashion, however, we were always taught just to 'plug them in' without learning all the minutiae that go with them.

For example, the stress and strain calculations used for calculating deformation (say, if you were rolling a sheet of steel in a mill) make use of tensors and also something called an "invariant". I assume this also comes from the physics/mathematics world.

mr_mitm 4 years ago | |

Thanks for this summary.

Even as a physicist I found it highly confusing when I got told in physics classes that a tensor is "just a thing (or object) that behaves like so under coordinate transformation". Like, what do you mean by "thing"? I have no intuition for this yet, I need concise definitions! Fortunately I took a differential geometry class at the same time, which was really helpful.

ericphanson 4 years ago |

I was happy to see that this article is actually talking about tensors, not just multidimensional arrays (which for some reason are often called tensors by machine learning folks).

725686 4 years ago |

A wonderful little video to understand what tensors are, by Daniel Fleisch:

https://www.youtube.com/watch?v=f5liqUk0ZTw

Very simple and basic.

Edit: incorrectly wrote vectors instead of tensors.

mettamage 4 years ago | |

Wow, that's such a good video. Thanks! Haha, mind blown really. Other than graph theory, I never took a college-level math course (I artfully skipped almost all math during my CS degree). I'm doing pre-calculus at the moment because I want to get better at it.

saberience 4 years ago |

This doesn't seem like it's for beginners.

VeninVidiaVicii 4 years ago | |

> Most commonly, a tensor is defined as being anything that transforms like a tensor.

Definitely not beginner level.

brummm 4 years ago | |

Hmm, this is stuff physicists learn in their first year undergrad classes for mathematical foundations. Seems to me it's the very definition of beginner.

cyber_kinetist 4 years ago | | |

I don't know what undergraduate program you have gone through, but this is definitely second-year or third-year course material for most physics degrees in universities. Maybe if you've already taken lots of AP classes in high school then you might be able to skip some stuff, but we're talking about the standard curriculum here.

Normally, you first study the distinction between vectors (which can be expanded to tensors) and scalars in second-year Analytical Mechanics class. You also get a taste of tensors toward the later material in Electromagnetism (which is also probably second-year). And you finally arrive at a rigorous definition of tensors when you take Mathematical Physics (second-year or third-year depending on your skills).

bmitc 4 years ago |

Anyone interested in a visual exploration should check out Geometrical Vectors by Gabriel Weinreich.

https://www.maa.org/press/maa-reviews/geometrical-vectors

billfruit 4 years ago | |

Is there any book that treats the whole of geometry using vectors?

bmitc 4 years ago | | |

I’m not sure I understand the question enough to answer. Do you mean something like differential geometry? There, the theory is built upon vectors and covectors (i.e., differential forms) that are associated with tangent spaces and cotangent spaces, respectively. But that is modern differential geometry and not classical geometry.

beaconstudios 4 years ago |

OK that helps me to understand why tensorflow is called what it is - if a tensor turns a set of vectors into a scalar that's exactly what an artificial neuron does with weights and inputs, and they are linked up to form a data flow graph.

Koshkin 4 years ago |

Here is a really good resource for a beginner:

https://grinfeld.org/books/An-Introduction-To-Tensor-Calculu...

ok123456 4 years ago | |

Is that you Pavel?

Beldin 4 years ago |

The way I think of it: you have 0-dimensional arrays of numbers (plain numbers or scalars). You have 1-dimensional arrays of numbers (a list of N numbers or an N-vector). You have 2-dimensional arrays of numbers (an NxM matrix). We can extend this concept to 3- and 4-dimensional arrays and even further.

The kicker? All of them are tensors. Tensor is just a generalisation of the concept.

I am no licensed mathematician, so this could be off. However, every time I dive into this topic, I have to wade through way too complex mathnobabble to arrive at that notion. So let's keep it simple: tensors are a mathematician's template for arrays of any dimension.
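As a sketch of this "arrays of any dimension" view, in plain Python with nested lists: the helper name shape is made up for the example and assumes non-empty, rectangular nesting. Other replies in the thread argue that such an array is only the list of a tensor's components in one particular basis, so treat this as an illustration of the array picture and nothing more.

```python
# Sketch: scalars, vectors, matrices and beyond as nested lists of numbers.
# `shape` is a hypothetical helper; it assumes non-empty, rectangular nesting.

def shape(a):
    """Return the shape of a nested list; a bare number has shape ()."""
    if not isinstance(a, list):
        return ()
    return (len(a),) + shape(a[0])

scalar = 7
vector = [1, 2, 3]
matrix = [[1, 2, 3],
          [4, 5, 6]]
cube = [[[0] * 4 for _ in range(3)] for _ in range(2)]

for name, value in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("cube", cube)]:
    s = shape(value)
    print(name, "dimensions:", len(s), "shape:", s)
```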

mkehrt 4 years ago |

A (d_0 * d_1 * ... * d_{k-1} * d_k) tensor is just a linear map from a (d_0 * d_1 * ... * d_{m-1} * d_{m+1} * ... * d_{k-1} * d_k) tensor to a (d_0 * d_1 * ... * d_{n-1} * d_{n+1} * ... * d_{k-1} * d_k) tensor, where a () tensor is a scalar, right?

(I kid, but I think this is true, right?)

chobytes 4 years ago |

My version is just: Tensors allow us to write data and operations on data in a way which does not depend on how we chose to represent them.

For example, if I have a vector x in V and a map T from V to W, then I would like the truth of T(x)=y to be independent of how I represent T and x.
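A minimal sketch of that statement in plain Python, under simplifying assumptions that are mine: V and W are both taken to be R^2, both use the same change of basis P, and the numbers are arbitrary. The components of x, y and T all change, yet T(x) = y holds in the new representation exactly as it did in the old one.

```python
# Sketch: the truth of T(x) = y survives a change of basis.
# Assumptions for illustration: V = W = R^2, one basis change P for both.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

T = [[2, 0],
     [1, 3]]
x = [1, 2]
y = matvec(T, x)          # so T(x) = y holds in the old basis

P = [[1, 1],
     [0, 1]]              # columns: new basis vectors in old coordinates
P_inv = [[1, -1],
         [0, 1]]          # inverse of P, written out by hand

x_new = matvec(P_inv, x)                 # vector components use P^-1
y_new = matvec(P_inv, y)
T_new = matmul(P_inv, matmul(T, P))      # map components use P^-1 T P

print(T_new, x_new, y_new)
print(matvec(T_new, x_new) == y_new)     # the equation still holds
```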

zardo 4 years ago | |

I like the concrete example from when I first used tensors in school. Stress in a block of concrete. You can choose any basis you like to represent the stresses and transform between them.

Whether or not the concrete block breaks under that stress obviously does not depend on your choice of basis or units, so your transformation rules had better reflect that reality.
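Here is a rough numerical version of the concrete-block example, in plain Python. The 2D stress values, the MPa unit, and the 30 degree rotation are assumptions picked for illustration. Rotating the axes changes the individual stress components, but the trace and determinant (and hence the principal stresses computed from them) stay the same, which is the kind of basis-independent quantity a failure criterion should depend on.

```python
import math

# Sketch: a 2D stress tensor in two different bases. Values are illustrative (MPa).

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def trace(A):
    return A[0][0] + A[1][1]

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def principal_stresses(A):
    """Eigenvalues of a symmetric 2x2 matrix, computed from its two invariants."""
    half = trace(A) / 2
    radius = math.sqrt(half * half - det(A))
    return (half + radius, half - radius)

sigma = [[50.0, 30.0],
         [30.0, -20.0]]

theta = math.radians(30)
c, s = math.cos(theta), math.sin(theta)
R = [[c, -s],
     [s, c]]

# Components of the same stress state in the rotated basis: R * sigma * R^T
sigma_rot = matmul(R, matmul(sigma, transpose(R)))

print("components changed:", sigma[0][0], "->", sigma_rot[0][0])
print("trace:", trace(sigma), trace(sigma_rot))
print("det:  ", det(sigma), det(sigma_rot))
print("principal stresses:", principal_stresses(sigma), principal_stresses(sigma_rot))
```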