Second-order logic

Let us continue our exploration of the main kinds of formalisms for theories:

Second-order theories (or: theories of second-order logic) are theories with one type X, called the base type, and one or more others as follows, depending on the precise kind of second-order logic :

More generally from several base types we can consider the sets of all structures of any given kinds between them (numbers of arguments of each type, type of results). So the general idea of second-order theories is that they can express, for any types X,Y of objects, formulas of the form «For every subset of X...», and thus also (equivalently) of the form «For every function from X to Y...».

Second-order theories can be interpreted as first-order theories, where each type K of structures receives its role by one structure symbol of that kind with one additional argument of type K, as explained in previous sections. Indeed, currying this symbol gives a function from K to the intended set of all such structures (the powerset or exponentiation). The obstacle against the full formalization of second-order theories as first-order theories, is that we cannot generally express the bijectivity of this function by axioms: its injectivity can be easily expressed in first-order logic in the form of the below separation axiom, but its surjectivity can't.

Weak second-order theories, separation axioms

In a first-order theory, a type K will play the role of a set of structures of some kind with arity n over some other types (called base types), i.e. an arbitrary subset of the set of all structures of that kind, if we have The separation axiom means that its curried form with domain K is injective, so that it gives K the role of a set of operations or relations among base types. Then, the notation S(k, x1,...,xn) may be replaced by (x1,...,xn) ∈ k if a relation, or k(x1,...,xn) if an operation.
Theories with such types (sets of structures) will be called weak second-order theories.

Second-order structures

In a second-order theory, let us call first-order structures the structures (in the sense of first-order logic: operations and relations with finite arity) that are directly between base types.
Then the second-order structures are also structures like in first-order logic (operations or relations) but involving one or more second-order types (i.e. of structures between base types).

Let us focus on the case of a unary relation A on a second-order type Y. Like any unary relation, it can be represented by a subset K of Y. But then, forgetting about the type Y (with its idea that it contains «all the...»), to only keep the type K (with no such need of exhaustivity), what remains is a first-order theory, as expressed above with the separation axiom.
Of course we can only do it if the rest of Y is not involved in the theory. Usual interesting theories behave in this asymmetric way of only involving one side of a unary relation on some set Y of all structures of a given kind : one set K of structures, with no use of its complement (Y\K, corresponding to ¬A). So, most useful theories with a second-order structure, are just weak second-order theories. In other words, such a second-order structure is the reinterpretation of a type in a first-order theory that satisfies the separation axiom.

Such a type «plays the role of a structure» in the sense of how we described how a theory is modified by the addition of an extra symbol (structure) without any extra type :

In a theory, an additional type is constructed if it goes with enough extra structures (to ensure the uniqueness of the extension of an isomorphism to the new type) and enough extra axioms (to ensure that the extension, as if it was uniquely defined, is determined from base types up to isomorphism, so that the isomorphism on base types can be extended to it, cancelling the effect of the extra structures which may forbid the existence of any extension of the isomorphism if they are not «invariant»).
If the new type only has «enough extra structures», but may not have «enough extra axioms» so that each isomorphism between base types of 2 systems, has at most one extension (but maybe none) by a bijection between both versions of the new type, then the case is similar to that of adding a mere structure not specified by a definition (which may or may not be preserved by an isomorphism but does not split an isomorphism into several possible extensions). Then, we can say that the whole additional stuff plays the role of a structure.
In particular, the separation axiom in a weak second-order theory ensures that an extension is «structuring enough» in this sense (but does not otherwise uniquely determine the extension up to isomorphism). This is why an extension by a new type with separation axiom plays the role of a structure, justifying its qualification as a second-order structure.

Now, developments of theories by constructions, come with not only enough structures relating the new type to the old type(s) to let it play the role of a structure, but also «enough axioms» to «define» it.
In particular, the last kind of constructions we listed (of the set of all structures defined by an expression with all possible values of its parameters), is a particular set of (non-invariant first-order) structures, that is itself an invariant second-order structure (well-defined using that formula, and image of the set of all tuples of values of the parameters). Its property of «invariance by automorphisms» is expressed as follows : each element of this constructed type, is a first-order structure that automorphisms usually do not preserve (as this first-order structure is not invariant), but transform into other elements of that type (preserving this type as a whole).

Translating second-order theories into first-order ones

Provability was only found well-defined in first-order logic. To get a concept (rules, algorithm) of proof for a second-order theory, we need to translate it into a first-order theory, where the rules of first-order logic are applied. The problem is, first-order logic cannot fully express (determine) the concept of which is the true powerset of a given set.
But different ways to translate a second-order theory into first-order ones, can express weaker or stronger requirements on the interpretations of powersets, and thus eliminate more «wrong models» from consideration. We shall list 3 of them, from the weakest (that requires «few» subsets, and thus can prove «few» theorems) to the strongest (that requires «many» subsets and thus can prove «many» theorems); they will be applied to the case of arithmetic in the next section.

The first-order expression by a schema of axioms (weakest method)

This only applies to theories involving the second-order type, not in any structure, but only in axioms starting with a universal quantifier on it, i.e. «For all sets...» (or «For all operations...»). The idea is to interpret the variable structure as «only» ranging over all structures definable with parameters, by an expression in the first-order language that only uses the base type(s).
The method is to eliminate the second-order type in such an axiom, by replacing this axiom by a schema of axioms, each with a different defining expression to replace that variable.

The theory with stable power type (Henkin semantics)

Consider the case of a monadic second-order theory (for example; other cases are similar). We shall express it as a first-order theory with 2 types X, Y. In addition to the separation axiom, the idea that Y = P(X) will be expressed by a schema of axioms, called the comprehension axioms, expressing that all classes of elements of X (i.e. subsets of X defined by formulas with parameters in this theory, as in sections 1.5 and 1.7), belong to Y. They are written, for any formula F with at least one free variable of type X:

∀(parameters), ∃AY, ∀xX , xAF(x, parameters).

These axioms are essentially equivalent to those that defined the set-builder we introduced for set theory (see 1.8, 1.11), except that this symbol is not needed (it can be seen as an abbreviation).

In fact, all comprehension axioms without quantifiers on Y but only using bound variables in X and parameters in both X and Y, are satisfied by taking as Y the set of all classes of elements of X defined with parameters in X only. (What about those using another structure on Y in their defining formula ?)
So, what makes this construction more powerful than the previous one, apart from the ability to use variables of type Y in structures and other axioms, is the requirement to also admit in Y all subsets of X defined using quantifiers on variables of type Y. This goes beyond any concept of «definable subsets of X», and cannot even be qualified as a property of these subsets, since its use of Y in defining formulas makes this condition of definability circular.

Set theoretical interpretation (strongest method)

The expression of any theory inside set theory, initially introduced in 1.3 for first-order theories, can be extended to second-order theories, interpreting the notions of «all subsets» and «all operations» by the powerset and exponentiation operators of set theory. Then the rules of proof are those of set theory once expressed as a first-order theory (see 1.9, 1.10, 1.11).
This framework is the one expressing the intended meaning of second-order theories as deterministically involving the full set of all structures of each given kind between given sets: the tools of set theory ensure that this powerset will include all structures that «can be found»; they interpret it in a model of set theory which they implicitly regard as fixed (assuming that we are «living in it»). However, since set theory is itself a first-order theory, this formal appearance of uniquely determining these powersets is not real but simulated, as follows: from any bijection between two sets X and X' we can define the image in X' of any subset of X, which obliges P(X) and P(X') to conform to each other. Then, the view of diverse models of second-order theories as encompassed in a fixed model of set theory, obliges their second-order types to conform to each other when their base types are in bijection. This correspondence between all considered models gives the appearance of determination of second-order types from base types.
In this context, for a fixed X, any set Y with a relation between X and Y, can be seen as a relational system with language the set X of unary relation symbols. In the category of such systems but only accepting the embeddings as morphisms, the final objects are those Y which are copies of the "true powerset" of X.

This method is the most powerful approach, as the available variables and tools for the defining formulas that can find subsets of X to bring into P(X), are extended much beyond the mere sets X and P(X), up to whatever range of other sets that this model of set theory may contain. However this is still not any kind of specific «most powerful formalization» because it is itself not fixed, as it depends on the choice of an axiomatic system for set theory, while there actually cannot be such a thing as «the most powerful axiomatic system for set theory» (we can always find a stronger one).

Semantic completeness, logical incompleteness

A second-order theory will be called semantically complete if, from a set theoretical viewpoint, all the models that «correctly interpret the powerset» are isomorphic to each other.
According to the Löwenheim-Skolem theorem, first-order theories describing infinite systems cannot be semantically complete (first-order logic cannot prevent the coexistence of multiple non-isomorphic models that «incorrectly interpret the powerset», and thus would have different second-order properties if interpreted by the external «true powersets»).

A first-order theory will be called logically complete if all its closed formulas are logically decidable (provable or refutable). This is naturally meaningful for first-order theories where the Completeness Theorem makes the concept of provability unambiguous, but not for second-order theories where the rules of proof remain to be specified. However it is still a posteriori meaningful there, in the following sense.

Some semantically complete second-order theories (geometry, arithmetic with only addition and no multiplication) are already logically complete by the weakest method, as all models of this weakest first-order expression are elementarily equivalent (they have the same first-order expressible properties) though they are not isomorphic.
For other theories (full arithmetic, set theory), stronger methods will contribute to the provability of more formulas. However then, such a quest is usually endless: the logical incompleteness of these theories is fundamental in the sense that no fixed algorithm can ever suffice to decide all their formulas either (may this algorithm anyhow come from the rules of first-order logic, e.g. by a translation into some set theory, or not).

Higher-order theories

Higher-order theories generalize second-order theories, by having a list of types that can be enumerated starting from a given list of base types, and successively making other types by «second-order constructions» from previous types, i.e. constructions that may include not only products (as allowed by the constructions of first-order logic), but also powerset («Y is the set of all subsets of X») and exponentiation («Z is the set of all functions from X to Y»). Saying that they are successive constructions, means that there is an order between types such that each type of objects said to be made of «all the..» comes after those of which it is the powerset or exponentiation.
Higher-order theories can be translated into second-order theories as follows. Take as the list of base types, all types that are used in any construction of powerset or exponentiation. Then in the second-order theory over this list of base types, a good list of structures and first-order axioms will express all intended relations between base types. Namely, the relation Y = P(X) will be formalized by a structure of bijection between Y and P(X).

On the double meaning of "invariance"

Basically, the concept of "invariant structure" as "definable by a formula without parameters" is more restrictive than as "preserved by any automorphism".

Even a structure that is both invariant and definable with a parameter, is not necessarily definable without parameter. For example, in the theory of real numbers, not all real numbers are definable (since the set of possible definitions is in bijection with ℕ but the set of all real numbers is not), but Aut(ℝ, 0,1,+,⋅) = {Id}. The same for sets of integers in the theory of second-order arithmetic.
In either case, we have objects not definable by any finite formula without parameter, but still "definable with a parameter" (itself), and preserved by automorphisms. They can only be always "defined" by their infinite sequence of decimals, which is not a proper definition because it is an infinite amount of data.

There are still 2 possible ways to make both concepts of invariance somehow coincide :

Second-order arithmetic

This is the name of the first-order theory we described as "with stable power type" to describe arithmetic. However the list of basic structures needs to be specified.

The following 3 presentations of second-order arithmetic as first-order theories (with axioms lists implicitly determined by the previous explanations) are equivalent (definable from each other):

  1. With types ℕ and P(ℕ×ℕ) and with structures 0 and S ;
  2. With types ℕ and ℕ and with structures 0 and S ;
  3. With types ℕ and P(ℕ) and with structures 0, + and ⋅ .
Proof :
We get ℕ as the set of functional graphs (subset of P(ℕ×ℕ))
Using ℕ we can define sequences of integers by recursion ; addition and multiplications come as particular cases.
Using addition and multiplication we can define a bijection between ℕ and ℕ×ℕ by (x,y) ↦ x + (x + y)⋅(x + y+1)/2, so that P(ℕ) can be used in the role of P(ℕ×ℕ).

The need of the powerset

The Zermelo-Frankel axiomatic system of set theory not only accepts P (ℕ) as a set, but the whole infinite series P (P (.....P (N)..)) also. And it even goes further after this, to more infinite sequences of higher and higher powersets. In fact, the hierarchy of powersets that it requires of its universe, goes very far beyond any possible imagination or description.

Most of mathematics (with some exceptions such as geometry, which will be discussed later) uses one or both of

With some more work, the "ordinarily interesting" uses of quantifiers on P(ℝ) (thus indirectly of ∀ XP(ℕ)) and other uses of P(P(ℕ)) may usually be encoded as elaborate uses of P(ℕ).
In fact, most of ordinary mathematics can be indirectly developed using only ℕ and P(ℕ), that is, in the theory of second-order arithmetic. Further uses of the powerset (the P(P(P(ℕ))) and more) usually appear useless in ordinary mathematics. Their main uses may be the following :
Still, while quite less necessary, the powersets of uncountable infinite sets may be also considered for convenience. It is just formally simpler in set theory to assume the powerset tool to apply to all sets, as it is involved in some important concepts (descriptions of sets and systems).
Ultimately, the precise initial axiomatic system of set theory that will suffice to comfortably develop all mathematics (a simple but powerful enough one), will consist of the components (symbols and axioms) that we mentioned up to the powerset, plus the axiom of infinity, which equivalently states the existence of ℕ or of some infinite set (these conditions will be precisely formalized later, when needed).

Second-order arithmetic as a possible foundation for mathematics

Most ordinary mathematics and all physics can be expressed in the theory of second order arithmetic expressed above.
The canonical bijection P(ℕ×ℕ)≅ P(ℕ), allows to define recursive sequences in P(ℕ).
But other encodings can be much more cumbersome (unnatural), so that this theoretical possibility to encode all ordinary mathematics into second-order arithmetic, while possible, is rather far-fetched and not practical as a way to study mathematics.

For example, arbitrary continuous function f from ℝ to ℝ can be represented in P(ℕ×ℕ×ℕ×ℕ), and thus in P(ℕ), by the set of 4-tuples of strictly positive natural numbers a,b,c,d such that f(a/bb/a) > c/dd/c.

Or, as another example of possible encoding, such that f(ab2) > cd2.

Another inconvenient is that, in mathematics, many objects studied by a diversity of useful theories, do no aim to be anything else than pure elements. Choices to build mathematics on a theory where all objects are either a natural number or a set of natural numbers, while being "right" in the fact that every set of interest has an injection into P(ℕ), obscures this purpose.

Set Theory and Foundations of Mathematics