The Fundamental Theorems of Calculus

How I think about the so-called “Fundamental Theorems of Calculus” is a little different from how others think about them. I don’t even think of these as having to do with derivatives and integrals, as such.

Part of them is an idea from elementary school: subtraction and addition are opposites. Put another way, differences compose. The difference from A to Z is the difference from A to B, plus the difference from B to C, plus the difference from C to D, and so on through the difference from Y to Z. So, even in a discrete context, if you consider the operations “Take successive differences of a sequence of values” and “Take partial sums of a sequence of values”, these are inverse operations (taking the inputs of the former and outputs of the latter to be specified up to an additive constant, or just as well, presumed to start from zero). This part is quite familiar long before ever getting to a calculus class.
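For concreteness, here is a minimal Python sketch of the discrete version. It assumes a finite list of numbers and the convention that partial sums start from zero. The function names `differences` and `partial_sums` are hypothetical, chosen for illustration.

```python
from itertools import accumulate

def differences(values):
    """Successive differences: [v1 - v0, v2 - v1, ...]."""
    return [b - a for a, b in zip(values, values[1:])]

def partial_sums(steps):
    """Running totals, presumed to start from zero: [0, s0, s0 + s1, ...]."""
    return [0] + list(accumulate(steps))

values = [3, 5, 4, 10, 7]
steps = differences(values)      # [2, -1, 6, -3]
rebuilt = partial_sums(steps)    # [0, 2, 1, 7, 4], i.e. values minus values[0]

# Differences compose: the steps telescope to the total difference from first to last.
assert sum(steps) == values[-1] - values[0]
# Summing then differencing recovers the steps exactly.
assert differences(partial_sums(steps)) == steps
# Differencing then summing recovers the values up to the additive constant values[0].
assert [r + values[0] for r in rebuilt] == values
```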

The further content of the “Fundamental Theorems of Calculus”, the part you are made to prove in an Analysis class, is about the relationship between infinitesimal and finitesimal averages.

Again, let’s not think of derivatives and integrals, as such. Let’s think of “punctile” functions and “extensive” functions. A punctile function is an ordinary function: it assigns a value to individual input points. An “extensive” function assigns a value to ranges of input; let’s say specifically ranges with positive size.

What’s more, we’re only interested in two kinds of extensive functions: first, there are “additive-extensive” functions. These have the property that whenever a large range is split into various subranges which make it up, the value of the function on the large range is the sum of its values on the subranges (no matter how the splitting is done). For example, this is the way differences between endpoint values compose, as mentioned earlier.

Secondly, there are “averaging-extensive” functions. These have the property that whenever a large range is split into subranges, the value of the function on the large range is the weighted average of its values on the subranges (weighted proportional to their size).

Note that these two concepts are in one-to-one correspondence with each other: given two extensive functions F and G, such that the value of F on any range is the value of G on that range times the size of that range, we have that F is additive-extensive iff G is averaging-extensive. So they’re just two different ways of thinking about the same data; this kind of data can be looked at additively or averagingly, so to speak, depending on what is convenient at the moment.
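Here is a small exact check of this correspondence, as a sketch. The endpoint function `g` is an arbitrary hypothetical choice. `Fraction` is used so the equalities hold exactly rather than up to rounding. Dividing the additive-extensive function by the size of the range gives an averaging-extensive one, and both splitting properties can be tested directly on one subdivision.

```python
from fractions import Fraction as Q

def g(t):
    # Hypothetical endpoint function; any function of one variable would do.
    return t ** 3 - 2 * t

def additive(a, b):
    """Additive-extensive: the change in g across the range [a, b]."""
    return g(b) - g(a)

def averaging(a, b):
    """The corresponding averaging-extensive function: divide by the size b - a."""
    return additive(a, b) / (b - a)

a, b = Q(0), Q(2)
cuts = [a, Q(1, 3), Q(1, 2), b]        # split [0, 2] into three subranges
pieces = list(zip(cuts, cuts[1:]))

# Additive: the value on the whole is the sum of the values on the pieces.
assert additive(a, b) == sum(additive(u, v) for u, v in pieces)

# Averaging: the value on the whole is the size-weighted average of the values on the pieces.
assert averaging(a, b) == sum((v - u) / (b - a) * averaging(u, v) for u, v in pieces)
```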

In general, I think in terms of additive-extensive functions a lot, since they are very natural for algebraic calculations, but in this post it’ll be most natural to talk and think in terms of averaging-extensive functions.

It would be nice if punctile and extensive functions similarly turned out to be two different ways of looking at the same data, and this is what the “Fundamental Theorems of Calculus” say. Specifically, suppose we have a process which turns punctile functions into averaging-extensive functions, thought of as turning f into the function which assigns to any range the average value of f over that range (in short, this is the process of averaging over ranges). And suppose we have a process in the other direction, which turns averaging-extensive functions into punctile functions, thought of as turning F into the function whose value at a point is the limiting value of F on any range all of whose points are infinitesimally near that point (in short, this is the process of limiting towards points).

[Footnote: Note I do not demand here that the range actually include the indicated point; in traditional ways of talking about everything, this will turn out to correspond to “strong” differentiation \(\lim_{x_1, x_2 \to x} \frac{F(x_1) - F(x_2)}{x_1 - x_2}\), as opposed to “ordinary” differentiation which adds the constraint that \(x\) be between \(x_1\) and \(x_2\), or ultimately equivalently the constraint \(x_2 = x\). Strong differentiation is much better behaved than “ordinary” differentiation, and being strongly differentiable throughout a range is the same thing as being continuously ordinarily differentiable throughout that range. There’s really no need for traditional ordinary differentiation with all its pathologies to be taken as the default notion of differentiation, and certainly no need to expose introductory calculus students to those pathologies (or such technical formality at all, but certainly not such pathologies!).]
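A standard textbook example (my choice, not from the post) makes the footnote's contrast concrete: \(F(x) = x^2 \sin(1/x)\), with \(F(0) = 0\). Its ordinary derivative at 0 exists and is 0. It is not strongly differentiable at 0, though, because secants between pairs of points near 0 that do not contain 0 keep having slope about -1. Consistent with the footnote, its ordinary derivative \(2x \sin(1/x) - \cos(1/x)\) is discontinuous at 0. The sketch below checks this numerically; the step sizes are assumptions tuned to double precision.

```python
import math

def F(x):
    # Classic example: ordinarily differentiable at 0, but not strongly.
    return x * x * math.sin(1 / x) if x != 0 else 0.0

def secant(x1, x2):
    return (F(x1) - F(x2)) / (x1 - x2)

# Ordinary difference quotients anchored at 0 equal x*sin(1/x), which tends to 0.
ordinary_points = (1e-3, 1e-5, 1e-7)
ordinary = [secant(x, 0.0) for x in ordinary_points]

# Strong: secants between two points both near 0, neither range containing 0.
# At x_n = 1/(2*pi*n) the ordinary derivative is -cos(2*pi*n) = -1, so tiny
# secants centred there have slope about -1, however small x_n gets.
strong = []
for n in (10, 100, 1000):
    xn = 1 / (2 * math.pi * n)
    h = 1e-4 * xn * xn        # far smaller than the local oscillation scale
    strong.append(secant(xn + h, xn - h))

print(ordinary)   # shrinking towards 0
print(strong)     # each close to -1
```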

If our punctile-to-extensive and extensive-to-punctile processes satisfy certain properties that are natural to assume, then they turn out to invert each other, as expected. This is the content of the Fundamental Theorems of Calculus. It goes like so:

The first Fundamental Theorem of Calculus tells us that, if f is a continuous (or regular enough in an even weaker sense) punctile function, then the average value of f on a range entirely infinitesimally near x is infinitesimally near the value of f at x. In other words, turning the punctile function f into an averaging-extensive function (via averaging over ranges), and then turning this back into a punctile function (via limiting towards points), we get f back. (This is ordinarily phrased as “\(\frac{d}{dx} \int f = f\) for continuous \(f\)”, but let’s view it my way instead.)

Why is this fact true? It’s tautologically true precisely when f satisfies the condition that its average value over any range infinitesimally near x is infinitesimally close to its value at x, the regularity condition of note. This condition will straightforwardly follow from continuity however you care to formalize continuity, when one also makes the natural presumptions about how averaging over ranges works. In particular, we will naturally want to presume that the average of a punctile function across a range falls within the closed convex hull of the function’s values on that range. Then for continuous f, we have that a range entirely infinitesimally close to x is one on which f’s output is constrained to a space entirely infinitesimally close to f(x), whose closed convex hull is entirely infinitesimally close to f(x), and thus the average value of f over the input range is infinitesimally close to f(x), and thus in the limit equal to f(x), as desired.
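A numerical sketch of the first theorem follows. It assumes the exact average over a range is replaced by a midpoint-rule approximation. That approximation is itself a weighted average of sampled values, so it stays within their convex hull. The ranges are deliberately chosen near x without containing it, matching the strong-limit setup. `average_over`, `f`, and `x` are illustrative choices.

```python
import math

def average_over(f, a, b, samples=1000):
    """Approximate the average of f over [a, b] by the midpoint rule.

    This is a numerical stand-in (an assumption of this sketch) for the exact
    average; it is an equal-weight average of values of f on [a, b], so it
    lies within the convex hull of those values.
    """
    width = (b - a) / samples
    return sum(f(a + (k + 0.5) * width) for k in range(samples)) / samples

f = math.cos   # any continuous punctile function
x = 0.7

# Ranges near x that deliberately do not contain x: [x + eps, x + 2 * eps].
gaps = []
for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    avg = average_over(f, x + eps, x + 2 * eps)
    gaps.append(abs(avg - f(x)))

print(gaps)   # shrinks roughly in proportion to eps
```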

[Footnote: Incidentally, yes, I speak throughout using the language of infinitesimals, but this can all be converted to epsilontics or whatever you like in the standard way, if that makes you more comfortable. It all means the same thing in the end, interpreting infinitesimals via “nonstandard analysis”.]

How about the second Fundamental Theorem of Calculus? This tells us that the other direction of inversion works out as well: if F is an averaging-extensive function, then the average value across a range of the limiting value of F at each point of that range comes out to the same thing as the value of F on that range. In other words, turning the averaging-extensive function F into a punctile function (via limiting towards points), and then turning this back into an averaging-extensive function (via averaging over ranges), we get F back. (This is ordinarily phrased as “\(\int \frac{d}{dx} F = F\) for \(F\) whose derivative is Riemann-integrable, or some such technical condition”, but let’s view it my way instead. Note: Because we’re using and presuming the existence of strong derivatives, the technical condition will drop away, which is great, because why should anyone have been bothered about it to begin with?)

To prove this, it suffices to note two facts: A) the process of turning averaging-extensive functions into punctile functions (by limiting towards points) is injective, and B) furthermore, the outputs of this process are always continuous. Why does this suffice? This is an instance of a very general fact: If we already know that x followed by y = identity, and we also know that y is injective, then we also know that y followed by x = identity. [Proof: y followed by x followed by y = y followed by identity = identity followed by y; now appealing to the injectivity of y, we get that y followed by x = identity.] Here x is averaging over ranges and y is limiting towards points. The first Fundamental Theorem supplies x followed by y = identity, but only for continuous punctile functions. This is where B) comes in: it guarantees that the outputs of y are continuous, so the step “x followed by y = identity” in the proof above is legitimate when applied to them. So, we establish this second Fundamental Theorem from the first Fundamental Theorem, once we also see A) and B).
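The general lemma is small enough to check mechanically. Here is a sketch in Lean 4, assuming core Lean only (no Mathlib). `α` stands in for continuous punctile functions and `β` for averaging-extensive functions. Giving `y` the type `β → α` is how this sketch encodes fact B): limiting always lands among continuous functions, which is what lets the first theorem be applied to its outputs. The theorem and hypothesis names are hypothetical.

```lean
-- x : averaging over ranges, y : limiting towards points (names hypothetical).
theorem second_from_first {α β : Type} (x : α → β) (y : β → α)
    (hfirst : ∀ f, y (x f) = f)                 -- x followed by y = identity
    (hinj : ∀ F G, y F = y G → F = G) :         -- fact A): y is injective
    ∀ F, x (y F) = F := by                      -- y followed by x = identity
  intro F
  apply hinj                                    -- goal becomes y (x (y F)) = y F
  exact hfirst (y F)
```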

The observation of B) is straightforward enough: suppose f is the punctile function whose value at x is the limiting value of F on any range infinitesimally near x. In other words, the value of f at x is infinitesimally near the value of F on any range infinitesimally near x. Similarly, the value of f at x’ is infinitesimally near the value of F on any range infinitesimally near x’. But when x’ is infinitesimally near x, the ranges infinitesimally near x’ are also infinitesimally near x, and thus the value of f at x’ is infinitesimally near the value of f at x, making f continuous.

[Footnote: This result is the benefit of our “strong” derivative, as opposed to the “ordinary” derivative; alternatively, we could just add by fiat to our second Fundamental Theorem a technical presumption like that our original function is “continuously (ordinarily) differentiable”. Or similarly, we could define strong differentiation as given by continuous extensions of secant functions in two variables, so that strong derivatives are continuous by definition.

Note that the above argument requires a bit more care to formalize in nonstandard analysis, as it involves two different scales of infinitesimals (a coarse scale of infinitesimals on the order of the difference between x and x’, and a finer scale of values which are infinitesimal even with respect to that), so that some instances of “infinitesimal” above refer to the coarser scale and some to the finer scale. But this can readily be done. And indeed, this reveals precisely the sense in which this argument would fail without using the strong derivative; we cannot simply consider the range which stretches from x to x’ above, as the relationship between f at x’ and F near x’ is only with respect to the finer scale of infinitesimals. We must consider a finely infinitesimal range near x’ that does not actually contain x.]

Now, we must note that the process of turning averaging-extensive functions into punctile functions (by taking the limit towards points) is injective. In fact, we will prove a stronger statement: suppose the difference between the limits of F and G at any point always falls within C, where C is some open convex set. Then so does the difference between F and G themselves, on any range. (Then by considering infinitesimal such C around 0, we get our desired result; indeed, more generally, every closed convex set is the intersection of the open convex sets containing it, granting us the same theorem with “open” replaced by “closed”.) [In traditional language, this all shows that if the derivative of a function has bounded size, then the function has similarly bounded rate of change across any interval, and in particular, if the derivative of a function is constantly zero, the function is itself constant.]

Proof of our stronger statement: Contrapositively, we shall show that if F and G have a difference outside of C on some range, then there is some point towards which the difference of their limits falls outside of C. By the linearity of everything, letting D be the difference of F and G, we are showing that if D ever assumes a value outside of C on some range, there is some point towards which the limit of D falls outside of C. To show this, take the starting range on which D takes a value outside of C and split it into subranges however you like. By convexity of averages (the value on the whole range is a weighted average of the values on the subranges, and C is convex), one of the subranges is such that D takes a value outside of C on this subrange as well. Continuing in this way, by repeated bisection (or whatever such kind of splitting you like), choosing at each stage an appropriate subrange to recurse with, we form a sequence of quickly shrinking nested intervals on which D takes values outside of C. The limiting point of these intervals will be our desired point: the limit of D towards it is a limit of values outside of C, and since C is open, its complement is closed, so this limit also falls outside of C.
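The bisection argument can be run literally in one dimension. This Python sketch is only an illustration. D is the averaging-extensive function of the hypothetical \(G(t) = t^3\), and C is the open interval \((-1, 9/10)\). Exact `Fraction` arithmetic avoids rounding at tiny scales. At each step, the hypothetical `find_bad_point` keeps a half on which D falls outside C; the convexity argument guarantees such a half exists.

```python
from fractions import Fraction as Q

def G(t):
    # Hypothetical example; D below is its averaging-extensive function.
    return t ** 3

def D(a, b):
    """Average rate of change of G over [a, b]; for t**3 this is a*a + a*b + b*b."""
    return (G(b) - G(a)) / (b - a)

LO, HI = Q(-1), Q(9, 10)   # C is the open interval (LO, HI)

def in_C(v):
    return LO < v < HI

def find_bad_point(a, b, steps=40):
    """Bisect from a range where D is outside C towards a point where its limit is too."""
    assert not in_C(D(a, b)), "start from a range on which D falls outside C"
    for _ in range(steps):
        m = (a + b) / 2
        # D(a, b) is the equal-weight average of D(a, m) and D(m, b); since C is
        # convex, at least one half must also fall outside C.
        a, b = (a, m) if not in_C(D(a, m)) else (m, b)
        assert not in_C(D(a, b))
    return (a + b) / 2

p = find_bad_point(Q(0), Q(1))
print(float(p), float(3 * p * p))   # 3p^2 is at least 9/10 up to the tiny final width
```

The limit of D towards the returned point is \(G'(p) = 3p^2\). That value lands outside C, as the argument predicts, up to a negligible error from the width of the final interval.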

Great. That’s it. We’re done now. This is how I think of the fundamental theorems of calculus.

Note: None of this presumes one-dimensionality; it all works just the same in as many dimensions of input and output as you like, suitably interpreted.

[This is unlike, say, the Mean Value Theorem (that a function’s average value across a range must match its derivative at some point in that range), which fails once the output is more than one-dimensional (consider uniformly rotating around a circle over time; across one full revolution the average velocity is zero, though the instantaneous velocity is never zero). Proofs of the fundamental theorems of calculus are sometimes presented in such a way as to call upon this Mean Value Theorem, but they should not. As for proving the Mean Value Theorem in the case of one-dimensional output, one way to do it is like so: suppose F has average value d over some range. We know, by our final lemma above, that since F’s average value on the whole range is >= d, there must be some point at which F’s derivative is >= d; symmetrically, there is some point at which F’s derivative is <= d. By continuity of the derivative, we also have the intermediate value (i.e., Darboux) property for the derivative, and thus there is some point at which the derivative matches d exactly.]
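Both halves of this note can be sketched numerically in Python. The first is the circle counterexample. The second is the final step of the one-dimensional proof, which locates a point where the derivative equals the average by bisection on the continuous derivative. The functions used (the unit circle, and \(F(t) = t^3\) on [0, 1]) are my own illustrative choices.

```python
import math

# Two output dimensions: uniform motion around the unit circle over one revolution.
def position(t):
    return (math.cos(t), math.sin(t))

def velocity(t):
    return (-math.sin(t), math.cos(t))

a, b = 0.0, 2 * math.pi
avg_velocity = [(pb - pa) / (b - a) for pa, pb in zip(position(a), position(b))]
speeds = [math.hypot(*velocity(a + k * (b - a) / 100)) for k in range(101)]
# avg_velocity is (0, 0) up to rounding, yet every sampled speed is 1.

# One output dimension: find where the derivative equals the average rate of change.
def F(t):
    return t ** 3

def dF(t):
    return 3 * t ** 2

lo, hi = 0.0, 1.0
d = (F(hi) - F(lo)) / (hi - lo)   # average rate of change over [0, 1]
# Here dF(lo) < d <= dF(hi); in general the bracketing points are the ones the
# lemma supplies (derivative >= d at one, <= d at the other).
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if dF(mid) < d else (lo, mid)
c = (lo + hi) / 2                  # approximately 1/sqrt(3)
```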

Great. That’s really it for now. This is probably unreadable to everyone else, and I may clean it up later, but it’s how Sridhar thinks.

Within the zoo of concepts that are used in calculus, we actually have the following:

We have the notion of an oriented region, and the signed boundary of an oriented region.

One-dimensional regions are curves, and the signed boundary of an oriented curve is a pair of points, one treated as positive and one as negative. In a one-dimensional connected ambient space, this boundary map is an isomorphism: every such pair of points is the signed boundary of precisely one oriented curve, up to reparametrization and treating movement forward and then back along a curve as equivalent to no movement. In other contexts, this is not an isomorphism (in the plane, for instance, many different curves share the same endpoints).

Given a punctile function F, we can turn it into a summing-extensive (that is, additive-extensive) function which takes in a region and yields the total value of F on the signed boundary of this region. In the familiar context, this sends F to the function which sends [a, b] to F(b) - F(a). In the familiar context, this map is surjective, and its failure to be injective is precisely that it sends constant functions to zero. Thus, we have an isomorphism between punctile functions up to additive constants and summing-extensive functions on endpoints of regions. If we want to choose canonical representatives of punctile functions up to additive constants, we can always pick a particular input point and demand that it be sent to a particular output value. We can sometimes, but not always, similarly request a particular output value at a particular limiting input point. Sometimes it is useful to say, for example, that the limiting value at negative infinity is zero, or that the value at zero is zero. But generally, there is no great call to standardize representatives.
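Here is a short Python sketch of this quotient, under illustrative assumptions. `boundary_total` sends F to \((a, b) \mapsto F(b) - F(a)\). Two hypothetical functions that differ by a constant have identical images. Pinning the value at the input point 0 recovers one canonical representative. Exact `Fraction` inputs keep the equalities exact.

```python
from fractions import Fraction as Q

def boundary_total(F):
    """Summing-extensive function: [a, b] goes to F(b) - F(a), F's total on the signed boundary."""
    return lambda a, b: F(b) - F(a)

def undo_differences(S, base, base_value):
    """The representative whose value at the input point `base` is `base_value`."""
    return lambda t: base_value + S(base, t)

def F1(t):
    return t * t + 5

def F2(t):
    return t * t - 3                    # differs from F1 by a constant

S1, S2 = boundary_total(F1), boundary_total(F2)
points = [Q(k, 3) for k in range(-6, 7)]

# The boundary map cannot see additive constants.
assert all(S1(a, b) == S2(a, b) for a in points for b in points)

# Pinning one input point's value picks out a single representative.
H = undo_differences(S1, Q(0), Q(0))    # the representative with H(0) = 0
assert all(H(t) == t * t for t in points)
assert all(F1(t) - H(t) == 5 for t in points)
```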

Given a summing-extensive function on regions, we can turn it into an averaging-extensive function, or vice versa, by dividing or multiplying by the size of the region. This is a bijection, when we restrict attention to regions of positive size.

Given a punctile function f, we can turn it into an averaging-extensive function which takes in a region and yields the average value of f across the region. This map’s failure to be injective (when we restrict attention to positive size regions) is precisely that it sends measure-theoretically negligible functions (those equal to zero almost everywhere) to zero. It’s not quite surjective either. Under suitable conditions, though, we can pick canonical representatives, where every input point’s output is the infinitesimal average of the outputs near it; that is, we can take limits to canonically invert this averaging process. We do actually make a lot of use of these canonical representatives, when possible.

When the punctile function f is such that its average or total across any region R is the same as the average rate of change or total difference, respectively, of punctile function F across the boundary of R, then we say F is an integral of f, and could say f is a derivative of F (up to adding a negligible function; i.e., f is almost everywhere equal to F’ in standard terminology, or we could take the nonstandard position that this f still counts in itself as a derivative of F).

Differentiation is the composition of these actions: turning a punctile function into a summing-extensive function (by taking differences), then into an averaging-extensive function (dividing by region size), then into a punctile function (limiting towards points). Integration consists of the opposite steps: turning a punctile function into an averaging-extensive function (averaging across regions), then into a summing-extensive function (multiplying by region size), then into a punctile function (undoing the taking of differences).
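Here is a rough numerical sketch of these two pipelines, with each step as a separate Python function. Two assumptions apply. The limit towards a point is approximated by evaluating on a single small range around it. Averaging is approximated by the midpoint rule. The function names are hypothetical.

```python
import math

# Differentiation: punctile -> summing-extensive -> averaging-extensive -> punctile.
def take_differences(F):
    return lambda a, b: F(b) - F(a)

def divide_by_size(S):
    return lambda a, b: S(a, b) / (b - a)

def limit_toward(A, x, width=1e-6):
    # Numerical stand-in (an assumption) for the limit: evaluate on one small range around x.
    return A(x - width / 2, x + width / 2)

# Integration: punctile -> averaging-extensive -> summing-extensive -> punctile.
def average_over(f, samples=1000):
    def A(a, b):
        step = (b - a) / samples
        return sum(f(a + (k + 0.5) * step) for k in range(samples)) / samples
    return A

def multiply_by_size(A):
    return lambda a, b: A(a, b) * (b - a)

def undo_differences(S, base=0.0, base_value=0.0):
    # Representative pinned to base_value at base.
    return lambda t: base_value + S(base, t)

derivative_at = limit_toward(divide_by_size(take_differences(math.sin)), 0.3)
integral = undo_differences(multiply_by_size(average_over(math.cos)))

print(derivative_at, math.cos(0.3))   # these should nearly agree
print(integral(1.0), math.sin(1.0))   # and so should these
```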

Thus, the fundamental theorems of calculus arise from the combination of multiple things being bijective: punctile functions (up to a negligible function) to/from extensive functions by averaging/limits, and punctile functions (up to a constant) to/from extensive functions by taking differences/undoing differences, in addition to summing-extensive and averaging-extensive functions being bijective by rescaling by region size. That the latter two of these are bijective is already available in discrete calculus (that is, the calculus of finite differences; that is, grade school adding and subtracting) and obvious, so it is the former of these that I fixate on as the non-elementary content of the fundamental theorems of calculus.

When working with curves in multiple dimensions, it’s perhaps less easy to think of a line integral of f(x) dx as a kind of average of f(x), because of the dependence on the direction of dx; our weights are vector-valued, with nontrivial direction, so we can’t think of them as all positive or as summing to a unitless 1. So, what does this amount to there? This must reflect a general obstruction: when dealing with regions of lower dimension than the ambient space, we cannot in general turn summing-extensive functions into averaging-extensive functions, because the “sizes” of such regions can be multi-dimensional vectors, which we cannot divide by.
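Here is a numerical illustration of this obstruction, under assumptions of my own. It uses the hypothetical vector field \(f(x, y) = (-y, x)\) on the unit circle, discretized into chords with midpoint evaluation. The line integral itself comes out near \(2\pi\). The vector weights dx, however, sum to roughly the zero vector around the closed loop, so there is no scalar total size to divide by to form an average.

```python
import math

N = 10_000
ts = [2 * math.pi * k / N for k in range(N + 1)]
pts = [(math.cos(t), math.sin(t)) for t in ts]   # closed loop: pts[N] is (about) pts[0]

def f(x, y):
    # Hypothetical vector field; its line integral around the unit circle is 2*pi.
    return (-y, x)

total = 0.0            # sum of f(midpoint) . dx over the chords
net_dx = [0.0, 0.0]    # sum of the vector "sizes" dx themselves
for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
    dx, dy = x1 - x0, y1 - y0
    fx, fy = f((x0 + x1) / 2, (y0 + y1) / 2)
    total += fx * dx + fy * dy
    net_dx[0] += dx
    net_dx[1] += dy

print(total)    # close to 2*pi
print(net_dx)   # close to (0, 0): no scalar total weight to normalize by
```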