Relearning Matrices as Linear Functions (dhruvonmath.com)
[1]https://datasciencetexts.com/subjects/linear_algebra.html
I didn't pass the "normal" one but I think that's because I had another 300 level math course, capstone, and four other CS courses at the same time. I'm certain I wouldn't have passed the applied one, it looked very tedious and that's usually what gets me with homework.
Personally I found the prospect of tensor algebra to be much more intuitive than either of these; with matrices thrown in mostly as a computational device. Even a vector (through the dot product) is just a linear function on other vectors, and the notion of function composition carries through to that and to higher-order tensors.
Covariance and contravariance are a little more complicated to completely grok, but for most applications in Euclidean space (where the metric is the identity function) the distinction is of more theoretical interest anyway.
I'm not sure what the parent means by the metric being the identity function, however. The Euclidean metric is basically the hypotenuse of a triangle parameterized by two vectors. The adjacent and opposite sides of the triangle are measured to be the Euclidean norm of each vector (their length), and the hypotenuse is the shortest distance between them.
The Euclidean metric is not the only metric - you can define distance however you'd like as long as it's consistent. But I'm not sure how the identity function works as a metric, because that would map a vector to another vector, not a scalar.
That was taught right after a unit on complex numbers and trigonometry so that we could see the parallels between composing polynomial functions on complex numbers and composing affine transformations.
To this day I think that was one of the most beautiful and eye opening lessons I've had in mathematics.
In hindsight, I think I got lucky that the teachers who wrote the curriculum this way were math, physics, and comp sci masters/phd's who looked at their own educations and decided that geometry class was a great Trojan horse for linear algebra.
I found the book "Practical Linear Algebra: A Geometry Toolbox" very helpful in my study.
When I was first introduced to matrices (high school), it was in the context of systems of equations. Matrices were a shorthand for writing out the equations and happened to have interesting rules for addition etc. It took me a while to think about them as functions in their own right and not just tables. This post is my attempt to relearn them as functions, which has helped me develop a much stronger intuition for linear algebra. That's my motivation for this post and why I decided to work on it. Feedback is more than welcome.
https://www.3blue1brown.com/essence-of-linear-algebra-page
Math is Fun also has a nice writeup that explains matrix multiplication using a real-world example of a bakery making pies and tracking costs:
PS. In case you didn't know, affine transformations are not linear (when b ≠ 0):
f(x) = mx + b =>
f(x+y) = m(x+y) + b ≠ (mx + b) + (my + b) = f(x) + f(y),
f(cx) = cmx + b ≠ c(mx + b) = c f(x) (for c ≠ 1)
https://www.dhruvonmath.com/2019/04/04/kernels/
The matrix/function stuff is elementary enough that I understand it intuitively (I suck at math), although it's neat to be reminded that given enough independent points you can reconstruct the function (this breaks a variety of bad ciphers, sometimes including ciphers that otherwise look strong).
The kernel post actually does some neat stuff with the kernel, which I found more intuitively accessible than (say) what Strang does with nullspaces.
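The remark about reconstructing the function from enough independent points can be made concrete. Below is a minimal sketch for the 2x2 case, assuming we can observe M v for two linearly independent probe vectors v; the names `recover_matrix`, `secret`, and `probes` are made up for illustration. Stacking the probes as columns of V and the outputs as columns of W gives W = M V, so M = W V^{-1}.

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(m):
    """Invert a 2x2 matrix; fails if its columns are dependent."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("probe vectors are not linearly independent")
    return [[d / det, -b / det], [-c / det, a / det]]

def apply(m, v):
    """Apply a 2x2 matrix to a column vector stored as a list."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def recover_matrix(inputs, outputs):
    """Recover M from two pairs (v, M v) with independent v."""
    # Stack vectors as columns: V = [v1 v2], W = [M v1  M v2] = M V.
    V = [[inputs[0][0], inputs[1][0]], [inputs[0][1], inputs[1][1]]]
    W = [[outputs[0][0], outputs[1][0]], [outputs[0][1], outputs[1][1]]]
    return matmul(W, inverse_2x2(V))  # M = W V^{-1}

# Hypothetical "unknown" linear function and two probes of it.
secret = [[Fraction(2), Fraction(1)], [Fraction(0), Fraction(3)]]
probes = [[Fraction(1), Fraction(1)], [Fraction(1), Fraction(-1)]]
observed = [apply(secret, v) for v in probes]

recovered = recover_matrix(probes, observed)
```

Exact fractions are used so the comparison `recovered == secret` is not disturbed by floating-point rounding. In n dimensions the same idea needs n independent probes.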
For a matrix M, denote f_M(x) = M * x. Then f_{A * B}(x) = f_A(f_B(x)), so that f_{(A * B) * C}(x) = f_{A * B}(f_C(x)) = f_A(f_B(f_C(x))) and also f_{A * (B * C)}(x) = f_A(f_{B * C}(x)) = f_A(f_B(f_C(x))).
So f_{(A * B) * C}(x) = f_{A * (B * C)}(x) = f_A(f_B(f_C(x))) for every x, i.e. (A * B) * C = A * (B * C).
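The composition argument above can be spot-checked numerically. This is only a sanity check on one example, not a proof; the helper `f` mirrors the f_M notation from the comment, and the particular matrices are arbitrary choices.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def f(m):
    """Return the function x -> m * x for a column vector x (a list)."""
    return lambda x: [sum(row[k] * x[k] for k in range(len(x))) for row in m]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 1]]
x = [5, -7]

left = f(matmul(matmul(A, B), C))(x)    # f_{(A * B) * C}(x)
right = f(matmul(A, matmul(B, C)))(x)   # f_{A * (B * C)}(x)
nested = f(A)(f(B)(f(C)(x)))            # f_A(f_B(f_C(x)))
```

All three values agree because function composition is associative, and matrix multiplication is just composition written in coordinates.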
http://www.reproducibility.org/RSF/book/bei/conj/paper_html/...
Esp the ray tracing/topology relationship is nuts.
Here is a video tutorial that goes through some of the same topics (build up matrix product from the general principle of a linear function with vector inputs): https://www.youtube.com/watch?v=WfrwVMTgrfc
Associated Jupyter notebook here: https://github.com/minireference/noBSLAnotebooks/blob/master...
One question that usually pops up, and that I was confused about until recently: are rank-two tensors equivalent to matrices? The answer is no, e.g. see here: https://physics.stackexchange.com/questions/20437/are-matric...
Thanks for the feedback. I go into this in the next post on eigenvectors here: https://www.dhruvonmath.com/2019/02/25/eigenvectors/. I start by discussing basis vectors which I believe is what you’re looking for in your comment.
Not that the applied approach should leave out the theory, because theoretical material like this article gives a great and intuitive understanding of linear algebra. However, the more theoretical treatments should set up things like rings, modules, and even category theory that are much less useful from an applied perspective.
For the theoretical approach I've heard good things about 'Linear Algebra Done Right'. I imagine it is less appealing for the applied approach. All I can say is be wary of the 'shut up and calculate' mindset in linear algebra. Getting the ideas behind the concepts is essentially a shortcut to understanding linear algebra without any downsides.
Essence of Linear Algebra https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2x...
Why not the category of vector spaces (morphisms are linear maps)?
The difference is that here you construct the category from a simpler premise. To construct FinVect you need to include all set objects with structure satisfying some axioms.
The category of matrices simply has the positive integers as objects, with the n x m matrices as the morphisms between two integers n and m. Composition is matrix multiplication.
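As a rough sketch of that construction: objects are just integers, a morphism is a matrix, and composition is only defined when the dimensions line up. The class name `Morphism` and the helpers are invented here, and the convention that a morphism n -> m is stored as an m x n matrix acting on column vectors is an assumption; the opposite convention works equally well.

```python
class Morphism:
    """A morphism source -> target, stored as a target x source matrix."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]
        self.target = len(self.rows)      # number of rows
        self.source = len(self.rows[0])   # number of columns

    def __eq__(self, other):
        return self.rows == other.rows

def identity(n):
    """Identity morphism on the object n."""
    return Morphism([[1 if i == j else 0 for j in range(n)] for i in range(n)])

def compose(g, f):
    """Return g after f; defined only when f's target is g's source."""
    if f.target != g.source:
        raise TypeError("morphisms are not composable")
    rows = [[sum(g.rows[i][k] * f.rows[k][j] for k in range(f.target))
             for j in range(f.source)] for i in range(g.target)]
    return Morphism(rows)

f = Morphism([[1, 2, 0], [0, 1, 1]])               # 3 -> 2
g = Morphism([[1, 1], [0, 2], [3, 0], [1, -1]])    # 2 -> 4
h = compose(g, f)                                  # 3 -> 4
```

The identity laws hold because multiplying by an identity matrix changes nothing, and associativity of `compose` is associativity of matrix multiplication.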
Here [1] is a nice overview. If you can follow what is going on there, it is worthwhile looking at II, III and IV.
[1] https://unapologetic.wordpress.com/2008/06/02/the-category-o...
I suppose that technically, the 'arrows are matrices' definition rules out infinite dimensional vector spaces, but I'd guess that OP meant to include them.
An argument against would be to keep to a small category.
When I went through university the standard set of courses was a Calculus course that was mostly about derivatives, a second one that was mostly about integrals, and a third Calculus course that was about multi-variable Calculus. That third course necessarily had to teach matrices, and taught them as rote calculations. There was a follow-up differential equations course which refreshed people's memories of matrices... as a rote calculation.
It was done this way because the multi-variable Calculus course was a prerequisite for a lot of physics+engineering courses. So a lot of students wanted to take that sequence. Differential equations were a prerequisite for some other advanced courses. Linear algebra was pretty much just for math majors.
I took Linear Algebra the same semester I took Computer Graphics, which worked out really well for me - the first half of LA taught me everything I needed to know about transformation matrices, and the second half of CG covered 3D graphics in OpenGL. The first half of CG was all 2D graphics stuff, and the second half of LA was about eigenvectors/values - I've forgotten everything from that part of the class.
I know they taught us about matrices in high school, but I don't recall them talking about any applications at all. I think the topic was pretty drained of context, just rote application of the rules for add/subtract/multiply/etc.
For example: What is a tensor?
Wrong way to answer it: Well, the number 5 is a tensor. So's a row vector. So's a column vector. So's the dot product and the cross product. So's a two-dimensional matrix. So's a four-dimensional matrix, just... don't ask me to write one on the board, eh? So's this Greek letter with smaller Greek letters arranged on its top right and bottom right. Literally anything you can think of is a tensor, now... try to find some conceptual unity.
Then coordinate-free fanaticism kicked in, robbing the purported explanations of any explanatory power in terms of practical applications of tensors. The only thing they could do was shift indices around.
What finally made it stick is decomposing every mathematical concept into three parts:
1. Intuition, or why we have the concept to begin with.
2. Definitions, or the axioms which "are" the concept in some formal sense.
3. Implementations, or how we write specific instances of the concept down, including things like the source code of software which implements the concept.
A tensor is a function that takes an ordered set of N covariant vectors (i.e. row vectors) and M contravariant vectors (i.e. column vectors) and spits out a real number. It has to be linear in each of its arguments.
I'm pretty sure all the complicated transforms follow from that definition (though you may have to assume the Leibniz rule - I can't remember), and from ordinary calculus.
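To make the definition above concrete, here is a small sketch of the simplest mixed case, with N = 1 and M = 1: a function of one row vector w and one column vector v, built from a table of components. The names `make_tensor`, `add`, and `scale` are illustrative only, and vectors are plain Python lists.

```python
def make_tensor(components):
    """Return T(w, v) = sum over i, j of components[i][j] * w[i] * v[j]."""
    def T(w, v):
        return sum(components[i][j] * w[i] * v[j]
                   for i in range(len(w)) for j in range(len(v)))
    return T

def add(u, v):
    """Component-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

def scale(c, u):
    """Multiply every component of u by the scalar c."""
    return [c * a for a in u]

T = make_tensor([[1, 2], [3, 4]])
w1, w2 = [1, 0], [2, -1]
v1, v2 = [0, 1], [5, 3]

# Linearity in the first argument, holding the second fixed.
linear_in_first = T(add(w1, scale(3, w2)), v1) == T(w1, v1) + 3 * T(w2, v1)
# Linearity in the second argument, holding the first fixed.
linear_in_second = T(w1, add(v1, scale(3, v2))) == T(w1, v1) + 3 * T(w1, v2)
```

Both checks come out true for any choice of components, which is exactly the "linear in each of its arguments" requirement. A tensor with more arguments would have more indices on its component table, but the same pattern applies.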
The other interpretation is that matrices are functions that take two arguments (a row vector and a column vector) and produce a real number. IMO this interpretation opens the door to deeper mathematics. It links in to the idea that a column vector is a functional on a row vector (and vice versa), giving you the notion of dual space, and ultimately leading on to differential forms. It also makes tensor analysis much more natural in general.
and reflections.
In elementary machine learning, you give two options. You should really include An Introduction to Statistical Learning, by the same folks who wrote ESL. It's a great book that covers the same ground as ESL but with less math.
A metric in a traditional metric space is a global distance function; you can use the metric tensor in a Riemannian manifold to allow integration to find the distance between two points.
The reason determinants are hard to teach (in my opinion) is because a rigorous derivation of their formula isn't possible without first teaching multilinear algebra and constructing the exterior algebra. Once you do those things, the natural geometric interpretation of the determinant basically falls onto your lap. But it's still very useful for e.g. computing eigenvalues and using the characteristic polynomial, so it's taught before that context can be formalized.
Professors shouldn't teach determinants in the context of matrices, at least not at first. That's heavily computation-focused, and the symbol pushing looks really unmotivated and strange to students. Instead they should teach the basis-free definition of determinants (i.e. focus on the linear map, not the matrix representing the linear map in some basis). Then the determinant is "only" the (signed) volume of the image of the unit hypercube under the linear transformation, which is where the parallelepiped comes in. If the linear transformation is invertible, the unit hypercube is transformed from an n-dimensional cube into an n-dimensional parallelepiped, from which you can geometrically see the way the linear map transforms the entire vector space it's defined over.
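The volume picture is easy to check in two dimensions, where the "hypercube" is the unit square and its image is a parallelogram. The sketch below computes the signed area of that image with the shoelace formula and compares it with the usual ad - bc formula; the function names are made up for this example.

```python
def det2(m):
    """Determinant of a 2x2 matrix via the usual formula."""
    (a, b), (c, d) = m
    return a * d - b * c

def apply(m, p):
    """Apply a 2x2 matrix to a point (x, y)."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y)

def signed_area(polygon):
    """Shoelace formula; positive for counterclockwise vertex order."""
    total = 0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return total / 2

# Unit square, listed counterclockwise.
UNIT_SQUARE = [(0, 0), (1, 0), (1, 1), (0, 1)]

def image_area(m):
    """Signed area of the image of the unit square under m."""
    return signed_area([apply(m, p) for p in UNIT_SQUARE])

stretch = [[2, 1], [1, 3]]    # invertible
collapse = [[1, 2], [2, 4]]   # columns are dependent
flip = [[0, 1], [1, 0]]       # swaps the axes
```

For each of the three matrices, `image_area(m)` equals `det2(m)`. The singular matrix flattens the square onto a line, so the area is zero, and the axis swap reverses orientation, which shows up as a negative sign.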
3Blue1Brown has a very good video on the geometry underlying the determinant[2]. For a more rigorous presentation which constructs the exterior algebra and derives the determinant formula using the wedge product, Noam Elkies has notes[3][4] for when he teaches Math 55A at Harvard. Incidentally Noam Elkies uses Axler's book, and while he obviously approves of it he's pretty upfront in asserting that the determinant should be taught anyway[5].
________________________
2. https://www.youtube.com/watch?v=Ip3X9LOh2dk
3. http://www.math.harvard.edu/~elkies/M55a.10/p8.pdf
This means it makes sense that det(A) = 0 means A is non-invertible. It also makes a lot of sense when the Jacobian pops up in the multi-dimensional chain rule.
Given the above, and the Cayley–Hamilton theorem, I never really had to know why the determinant was calculated the way it is. These properties give enough of an interface to work with it.
First think about row and column vectors. A row vector and a column vector can be combined via standard matrix multiplication to produce a real number. From that perspective, a row vector is a function that takes a column vector and returns a real number. Similarly, column vectors take row vectors as arguments and produce real numbers.
It turns out that row (column) vectors are the only linear functions on column (row) vectors. This result is known as the Riesz representation theorem. If I give you a linear function on a row vector, you can find a column vector so that computing my function is equivalent to calculating a matrix multiply with your column vector.
Now on to matrices. Matrices take one row vector and one column vector and produce a real number. I can feed a matrix a single argument - the row vector, say - so that it becomes a function that takes one more argument (the column vector) before it returns a real number. Sort of like currying in functional programming. But as we said, the only linear functions that map column vectors into real numbers are row vectors. So by feeding our matrix one row vector, we've produced another row vector. This is the "matrices transform vectors" perspective in the OP's article. But I think the "Matrices are linear functions" perspective is more general and more powerful.
This perspective of vectors, matrices, etc... as functions might seem needlessly convoluted. But I think it's the right way to think about these objects. Tricky concepts like the tensor product and vector space duality become relatively trivial once you come to see all these objects as functions.
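The currying analogy above translates almost directly into code. In this sketch (all names are illustrative) a matrix becomes a function that first takes a row vector and then a column vector; feeding it only the row leaves a function on column vectors, which is the same thing as the row vector `row * M`.

```python
def row_times_matrix(row, M):
    """Compute the row vector row * M."""
    return [sum(row[i] * M[i][j] for i in range(len(row)))
            for j in range(len(M[0]))]

def pair(row, col):
    """A row vector acting on a column vector: an ordinary dot product."""
    return sum(r * c for r, c in zip(row, col))

def curry_matrix(M):
    """Turn M into a function row -> (col -> real number)."""
    def take_row(row):
        new_row = row_times_matrix(row, M)   # partial application
        def take_col(col):
            return pair(new_row, col)
        return take_col
    return take_row

M = [[1, 2], [3, 4]]
row = [2, -1]
col = [3, 5]

as_function = curry_matrix(M)
on_columns = as_function(row)    # still waiting for a column vector
value = on_columns(col)          # row * M * col
```

Here `on_columns` is a linear function from column vectors to numbers, so by the representation result mentioned above it must be some row vector, and `row_times_matrix(row, M)` is that vector.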
Reminds me of the time an algebraist mentioned to me that he was working on profinite group theory. I asked what a profinite group was, and he immediately replied 'an inverse limit of an inverse system', with no follow up. Well thanks buddy, that really opened my eyes.
It would ordinarily be weird to represent shear transformations using rotations and scalings because shear matrices are elementary. But it checks out.
EDIT: To state my point more clearly: in textbooks, "scaling" is the linear map that is induced by the "scalar multiplication" in the definition of the vector space (that is why both terms start with "scal").
The idea being that a shear is much faster on weaker CPUs than doing a "proper" (reverse mapping) rotation.
A nice write-up can be found here: https://www.ocf.berkeley.edu/~fricke/projects/israel/paeth/r...
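For reference, the decomposition usually attributed to Paeth writes a rotation by theta as three shears: a horizontal shear by -tan(theta/2), a vertical shear by sin(theta), and the same horizontal shear again. I'm assuming that is what the linked write-up covers. The sketch below only checks the matrix identity; it does not implement the image-processing side, and the function names are my own.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def shear_x(s):
    """Horizontal shear: x picks up s times y."""
    return [[1.0, s], [0.0, 1.0]]

def shear_y(s):
    """Vertical shear: y picks up s times x."""
    return [[1.0, 0.0], [s, 1.0]]

def rotation(theta):
    """Standard counterclockwise rotation matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def rotation_from_shears(theta):
    """Rotation built as shear_x * shear_y * shear_x."""
    a = -math.tan(theta / 2.0)
    b = math.sin(theta)
    return matmul(shear_x(a), matmul(shear_y(b), shear_x(a)))

def close(m1, m2, tol=1e-9):
    """Entry-wise comparison of two 2x2 matrices up to a tolerance."""
    return all(math.isclose(m1[i][j], m2[i][j], abs_tol=tol)
               for i in range(2) for j in range(2))
```

The identity breaks down as theta approaches 180 degrees, where tan(theta/2) blows up, so this form is only practical for moderate angles.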
Singular matrices are special in the sense that they keep the matrix monoid from being a group. My category theory isn't strong enough to characterize it, but this probably also has a name.
Edit: I think the singular matrices are the 'kernel' of the right adjoint of the forgetful functor from the category of groups to the category of monoids. Though I must admit a lot of that sentence is my stringing together words I only vaguely know.