Structures of mathematical ontology from relativism to growing block time

Abstract :
Some of the most philosophically relevant facts of mathematical logic are gathered and popularized, sometimes humorously, into a coherent philosophical vision of an open-ended mathematical reality: from the relativity of semantics, through the Löwenheim-Skolem and Completeness theorems, to the systems of second order arithmetic and set theory with their ontological justifications. Some seemingly different theses, including relativist and potentialist concepts, can be read as different aspects, approximations or presentations of a common general understanding. Ordinal analysis describes the growing block time structure of mathematical ontology, while power sets admit a relativist interpretation.

This article is still incomplete and under development.

Introduction

The literature on mathematical ontology (I mean, the ontology of mathematical entities) appears filled with a diversity of competing views. Leaving aside non-realistic views, which essentially deny the very existence of this debate topic, diverse options of mathematical realism have appeared on the market.

A first approach to mathematical ontology is given by the famous iterative conception of set. It can indeed be praised as a good introduction and approximation, but has shortcomings : roughly speaking, it works well for hereditarily finite sets, but becomes unsatisfactory in its details to account for the existence of infinite sets, whose delimitation (from any range of objects whose existence is admitted at some "time") begs for definitions; this default view amounts to admitting the power set axiom, which is questionable.

Hence the need for more specific or elaborate views. Some views, sticking to this conception regardless, are versions of set theoretical realism, also called universism, meaning there would be a single correct ideal universe (i.e. model of set theory) ; this leaves open the question of its exact properties in the endless range of statements left undecidable by our officially accepted axioms.

Other views recognize a range of multiple legitimate universes, forming a multiverse, across which the facts may or may not be expressed in modal formalism ; but multiple versions of such a multiverse have been proposed, where universes may vary by height, by width or both, and may be strongly or weakly linked together.

The main account of height potentialism I could find was [Linnebo, Ø. (2013). The potential hierarchy of sets.], which has the merit of proposing a potentialist justification of the axiom schema of replacement; but this exposition remains unsatisfactory, as pointed out by ["Why Is the Universe of Sets Not a Set?", Zeynep Soysal] (which unfortunately ends up throwing the baby out with the bathwater). Without precisely analyzing their presentation and arguments, we shall offer a justification of this schema along roughly the same idea, but with different details. Other accounts include (...)

Such a state of profusion of competing theses, with no clear tools or reasonable hope to decide which one is "true", may be usual in philosophy, but is much less common in hard sciences. At least, most scientific controversies only last for a while before getting resolved by clear evidence (while mathematical conjectures, which may long remain unsolved, still have clear and widely agreed criteria of evidence, so that any resolution remains a matter of fact, namely of providing such evidence, rather than a matter of taste). Persisting controversies of a philosophical kind remain an anomaly in science. As explained in my previous article, the other famous case of this kind, namely the measurement problem in quantum mechanics, should not have occurred in this way, since a clearly best solution is actually there and is only kept ignored by a kind of sociological accident.

The present article will argue for a similar conclusion in the case of mathematical ontology, but for a very different reason : that behind visible discrepancies, the main candidates should rather be seen as genuine aspects of a single necessary, unified view (as happened in physics, most famously on whether light is a wave or a particle). This follows the famous allegory of the blind men and an elephant, yet without requiring the blindness of all participants, as it may be a kind of joke or light writing style for some authors to offer seemingly different views and present them as diverging from each other, especially when it is seen as normal for several such views to be developed by the same authors. Anyway, we shall take care to clarify the means of unifying diverse perspectives into a coherent general understanding.

The precise range of the main valid views (aspects of the big picture) may not yet be perfectly reflected by the list of most often mentioned views in the current literature on this topic; but, beyond the lack of any strict division to be made between right and wrong foundational theories or philosophies of mathematical ontology, the further a theory or view departs from the unified understanding, the more objectively defective it is likely to be found as a candidate foundation for mathematics. Of course, if anyone expected the ultimate picture of mathematical ontology to be a static one expressed by a fixed formal system, then the question of specifying such a system would admit no right ultimate answer, because the correct view is rather a dynamical one escaping any fixed formalization.

We shall explain the features of its dynamism by reviewing the mathematical facts, with the deep articulations they provide between superficially divergent but valid perspectives. Beyond the iterative conception of set, a more robust version of the role of ordinals in mapping the ontological order of mathematical infinities is given by ordinal analysis (the measure of the strength of theories), which takes some work to introduce (familiarity with ordinals, including the Cantor normal form, will be presumed in this exposition). We shall see that "width potentialism" and "height potentialism", describing the expansion of the set theoretical universe, naturally go together, with no need to formally postulate this as if it were an arbitrary assumption; the proper view of the width dynamism transcends the one operated by forcing, which is a relatively limited move. Comparing formal theories, an important focus will be the unity of ontological meaning between set theory and second-order arithmetic, as both can be strengthened by similar large cardinal axioms behind the surface of their huge literal divergence.

Let us first zoom out by splitting the relevant general topics into the following list, ordered from the least to the most mathematical ones:

(a) General ontology;
(b) The precise status of mathematical ontology relatively to general ontology;
(c) The internal structures of mathematical ontology;
(d) Purely mathematical works and results from relevant formal theories of mathematical foundations.

(a) was the focus of my previous article with detailed description of the nature of conscious time, taking inspiration from the time structure of mathematical ontology.

(b) would be rigorously unnecessary for addressing (c), except that some philosophers may be hindered in their approach to (c) by unfortunate preconceptions or unresolved issues about (b). A short reply to this possible obstacle could be to refer to the view of William Tait, which may fit as a first approximation, despite its shortcomings. But to be much more precise in some ways, the first section here will offer, without argument, some possibly original answers to this topic (b) which seems to have kept some philosophers busy, in complement to my previous work on (a). These may actually be of some use as a preliminary for (c), by altogether justifying the relevance of some kinds of metaphors which will be used in the expression of (c), and including implicit disclaimers against over-interpretations of these.

Then, the remaining sections will focus on (c), largely based on an extensive review of diverse facts of mathematical logic (d) which are essentially already well known to specialists, though many of them are often neglected by the spotlight of usual courses and philosophical accounts. The general method will be to let answers to (c) naturally emerge by intuition when "reading between the lines" of relevant facts of (d) and their many connections. It will thus mainly be a work of popularization of mathematical logic, gathering and describing the necessary facts in a coherent philosophical order, forming a clear, dynamical and somewhat encompassing picture of mathematical ontology. Being intended as a work of popularization of mathematics rather than one of strictly mathematical research or lecture notes (and as I may lack technical expertise in some reviewed concepts), it will skip many of the technical details and proofs which usually fill other works on these topics, but which are not crucial to the philosophical understanding ; and it unfortunately cannot claim full accuracy or completeness on all possibly relevant details at this stage. Thus, the point is rather to serve as a pioneering sketch of the big picture of this vast topic; and I welcome any feedback for correction in case of any inaccuracy.

1. The status of mathematical ontology

The thesis of general ontology expressed and defended in the previous article can be summed up by these claims:

Let us now address the following question which was left unanswered there : how can the possibility of such links between both substances be theoretically reconciled with the claim that they are indeed two distinct substances, with effectively separate ontologies (concepts and flows of existence) ?

This question, and the usual lack of explicit answer to it, is often presented by philosophers as an objection against mathematical Platonism. While it may be dismissed as a pseudo-problem, let us be somewhat explicit in answering it, accounting for how both substances effectively differ, and still remain connected in these familiar ways. The perception of this question as a problem, and even as a possible objection against mathematical Platonism as well as against substance dualism, may have taken strength from physicalist preconceptions, namely the temptation of conceiving any two realities or substances on the model of two physical objects coexisting in a common physical space, and interacting inside it by a kind of physical interaction. One big defect of this physical interaction model is its failure to account for the asymmetry of the link: the transcendence of consciousness over mathematics, by which the mind can understand mathematics but mathematics cannot describe the mind.

Dismissing the physical interaction model dispenses us from searching for an interaction process by way of an account of the link, but opens the question of offering a better conception instead. It is likely impossible to express this link in perfectly accurate terms, for precise reasons coming from the nature of the topic, as will be explained below; but we shall offer candidate approximations of how it works.

Let us start with a topological metaphor. The idea is to figure the mind as an open set in a topological space, and the mathematical stuff as the boundary of this set. This metaphor succeeds in combining the following features, which might have been felt as incompatible, therefore illustrating that such a feeling of incompatibility was unmotivated:

It also interestingly reflects further features of the link :

One may even refer to a property of measure theory as a metaphor of how these two last features happen to be combined: that a countable union of sets with measure zero also has measure zero. So, since the existence of any finite mathematical set has no cost for consciousness, an "action" of creating an infinite set, as a limit case of creations of finite mathematical sets, is also costless, at least for a concept of "cost" which pertains to consciousness.
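In symbols, the property invoked is countable subadditivity (stated here in standard measure-theoretic notation, as a mere illustration of the metaphor):

```latex
% Countable subadditivity: a countable union of null sets is null.
\mu(A_n) = 0 \ \text{for all}\ n
\;\Longrightarrow\;
\mu\Big(\bigcup_{n\in\mathbb{N}} A_n\Big)
\;\le\; \sum_{n\in\mathbb{N}} \mu(A_n) \;=\; 0
```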

For more accuracy we need to correct the above claim of empty intersection, which has an important exception in a sense: the mind can also be said to actually contain finite mathematical systems (some of them in actuality, the rest in potentiality); it is just not reducible to them (the mathematical concept of cardinality is inapplicable to conscious stuff considered in its depth, rather than in some specific mathematical limit aspect). Moreover, to say that the mind contains finite mathematical systems is just a way of speaking, since these mathematical systems are only aspects, limits of what is in the mind, and not rigorously of the same nature as any objects actually in the mind (the intersection of their natures must indeed be empty, since mathematical objects already have an empty nature, by definition of what a mathematical object is).

This exception is actually crucial, since among finite mathematical systems there is the central case of formal systems, that is, the syntactic side of the dipole (syntax/semantics) of mathematical ontology: formal systems constitute theories which can describe any other mathematical systems (or at least those we can study by our somehow finite means, namely excluding the use of any "syntax" with infinitary operations or formulas of infinite size). Now, mathematicians' abilities to explore mathematics are largely reflected by the mathematical features of finitary formal systems, whose rules and dynamics (of formal definitions and proofs) are a reduced form or subsystem of these abilities, playing similar descriptive roles.

In such cases of similarity between features of conscious and mathematical ontologies, the mathematical version is a limit case of the conscious version, namely where the ineffable depth of qualia is removed or ignored; let us call this the zombification of a concept or feature of ontology. This zombification may be a trap or a tool, depending on how it is handled.

Authors trying to build physical or mathematical theories of the mind under some currently more widespread metaphysical preconception will probably try to offer some sketch of descriptions aiming to become as formally precise as possible; but regardless of the degree to which they themselves happen to make their descriptions precise, the attitude of raising some kind of formal accuracy as the target horizon of their research actually implies the ultimate zombification of its object, i.e. what they will effectively describe cannot be minds anymore, whether or not they like to admit it. The trap, then, would be to ignore this substantial difference, with the illusion that they are describing minds when they actually aren't.

On the other hand, the proper use of zombification as a positive tool of ontological research, as I see it, proceeds as follows. It consists in developing mathematical concepts, or concepts about mathematics, somewhat similar to some aspects of the mind, in a way open to intuitions from aspects of consciousness beyond the resources of given formal means, but keeping the focus on trying in this way to develop understandings of mathematics with its own ontology, instead of attempting to specifically represent and analyze consciousness. The reason is the coexistence of both similarities and dissimilarities between substances, which needs to be accepted; the discovery of hidden similarities can eventually only come after the acceptance of dissimilarities, rather than against them. As an unavoidable effect of any detailed rational analysis, zombification needs to be accepted and recognized rather than denied or forcefully avoided: attempts to avoid it, by forcing similarity through some mathematical postulation of features seemingly similar to aspects of the mind, would be vain. Pushed somehow to its endpoint, zombification produces results whose mathematical nature permits a rational, rigorous, objective and successful in-depth analysis, which may not have been possible on the conscious side, due to the possibly ineffable character of the introspection which examinations of the mind for itself would need to involve. But, for it to reach such a worthy destination, any artificial constraint of trying to keep superficial similarities should be replaced by the more natural requirement of confidently and liberally letting the math speak for itself: listening to the most intrinsic and universal necessities of mathematical ontology beyond the limits of any prior familiarity or any arbitrary postulates about how ontology should be working. This is required for the consistency of this picture of parallel substances : the idea that, behind the possibly contingent aspects of its currently familiar form, mind is a primitive substance as much as mathematics is, so that its deeper foundations to be discovered remain mind-like, with just purer features of fundamentality, universality and necessity, and can only be compared with similarly deeper, universal and necessary aspects of the foundations of mathematics. So, the deep similarities between mathematical and conscious ontology can only be found by digging into what can be the most fundamental, universal and necessary features of mathematical ontology, without other prejudice or agenda, such as the contrived pursuit of superficially similar but non-fundamental arbitrary mathematical constructs.

Some works, such as Edward Ferrier's "Against the Iterative Conception of Set", object to the picture of time flow in set theory by projecting preconceptions about how the flow of time should work, based on our specific conscious experience of it, and then complaining that set theory does not match them well. The proper solution, of course, is to abstain from assuming such a fully pre-packaged conception of time, and to pay more attention to how set theory and other foundational theories really and naturally work, so as to draw from them the correct picture of the mathematical time flow (which may not always coincide with the conscious one).

As we shall see, the scope of possible zombification is wider than often assumed, as some key features from conscious ontology, often presumed to be specific to the mind, actually persist in some form in the mathematical limit, such as subjectivity, (partial) self-reference, and time passage.

In this picture we can notice the rather central position of common language, which happens to be at a crossroad relating all the components at stake: it is a partially zombified expression of conscious understanding, somewhere along the way to formalization since it admits numerical encodings, yet not completely formal, but flexible enough in its rules of expression and intended semantics to express rather well all the thoughts we seemingly need to express. We shall of course keep using it as the main and only adequate tool to communicate much of the concepts at stake in the topic of the present work, namely the ontological issues of pure mathematics. This topic may be the main case of our possible conceptual abilities combining the qualities of being entirely focused on purely mathematical objects, and of still not being themselves completely zombifiable concepts. As such, unlike with any fixed axiomatic theory, it would be absurd to expect such expressions to always be properly readable in a completely literal manner.

We shall thus sometimes benefit from the crucial input of metaphorical modes of expression, whose actually intended meanings can only be brought to life by each reader for oneself, due to the impossibility of complete formalization. This will include referring to our ordinary intuitions of time and other ontological concepts, and yet, in doing so, departing from some of the usual but specific features of conscious or physical ontology, which will be completely ignored in our discussion, so as to directly focus descriptions in conformity with the features of mathematical ontology, as they emerge from expert understanding of mathematical logic. Thus, such expressions may sometimes surprise non-expert readers of this article at first sight, but will hopefully look more acceptable after some serious effort to follow this work of popularization of mathematical logic.

Introduction to the strength hierarchy

Mathematics is split into an endless profusion of theories, each representing a particular context, i.e. a perspective or range of perspectives, which one considers to work in. A choice of theory is a way of choosing the objects being studied (or more precisely, which objects are seen as primitive, while others may then be considered indirectly by construction), and their status (the accepted ways to operate on them). In a sense, a choice of theory is a way of trying to stop time from flowing. Of course it would be impossible to stop time, but on the other hand we can never absolutely escape time: any perspective must somehow be that of some specific time, so that whatever is undertaken at a time may be captured and formalized as it is happening; in particular, any attempt to describe the flow of time itself can anyway be seen as restricted to one or another specific limited time interval, while another reflective move over this description of time results in reaching still further. Several philosophical works on mathematical ontology ran into imaginary difficulties by considering some specific theory as a candidate ultimate one, representing the end of time, and complaining that it cannot account for the meaning of all its own expressions or axioms, i.e. it cannot reflect over itself. Of course no theory can ever fully account for itself, as time cannot really stop.

So, there is no ultimate theory able to describe everything, but the coexistence and succession of multiple perspectives can be tracked by the articulations between diverse theories, which may give different statuses to the same objects or symbols. On the one hand, one may consider arbitrary theories, i.e. ones specialized in the description of specific systems; on the other hand, there is a main thread of foundational theories, which form the main time line of mathematical ontology: the "weak" ones describe the shorter realities (which come first), while the "strong" ones describe the later, more expanded realities.

The diversity of foundational theories may take different formal aspects: in principle, these theories may differ by their language (lists of primitive types and symbols), their axioms, and their logic (the deduction rules used to derive consequences from axioms). In practice, hardly any role is played by the choice of logic, usually reduced to only one possibility (first-order logic); the possibly suitable languages are quite few (roughly, the weakest theories use the language of first-order arithmetic ; the stronger ones rather use those of second-order arithmetic, higher-order arithmetic or set theory, but some overlap is possible in their range of use). So, the bulk of variations in strength consists in the diverse ranges of accepted axioms, where strength is added by adding more axioms, among formulas which were left undecidable by weaker theories. Metaphorically, strength is a measure of knowledge, which expands in time by the virtue of each successive act of reflection.

We shall review all this in detail.

Objects and symbols

Before entering any complex mathematical theory to painfully chase there the shadow of the time flow of mathematical ontology, let us start by staring directly into its eyes in its bare form: its way of already ordering the simplest actions, those which precede any choice of a specific theory (so to speak, actions still weaker than the weakest theory), and thus any use of logic (whose operation is relative to a choice of theory). These actions are usually overlooked, as their apparent triviality fails to give any job to researchers; yet they can crucially serve to give birth to the core intuition which then provides the thread of understanding for the whole story.

All starts by assuming that there are some objects, and undertaking to introduce symbols to name them. It matters little whether this existence assumption is "really" right or wrong: what matters instead is the action of doing that, which is largely self-sufficient. Indeed, symbols are themselves objects in their own right, so that

Moreover, any formal description work, once given, can then be reflected upon and formally described by introducing additional symbols to name any previously unnamed object. For example, given an object X, the name "X" we gave it is another object, which we can name "S". Still, we also implicitly meant to link both, namely we undertook to use S to name X, and this link is itself a third object, which may be formalized as an ordered pair (S,X), or (to let this be generalized to having multiple symbols naming multiple objects) as the function from {S} to {X}. Now, X and S need to exist first, before this link (S,X) of interpretation between them can be built.

This posteriority of the link between objects and symbols, with respect to these objects and symbols themselves, plays a more effective role in the construction of interpretations of mathematical expressions : this proceeds step by step, from the constant and variable symbols occurring there, to the operations over them, to the more external sub-expressions of the given expression.

Variables and sets

As symbols do not a priori come with their semantics attached, this leaves open the question of the possible existence and uniqueness of that semantics. Some given symbols may admit multiple possible interpretations, as a matter of perspective. Namely, a symbol once seen as a constant naming one object may also be seen as naming another object (if its value is not fully determined by the given axiomatic or other context); therefore, still another perspective consists in recognizing the plurality of its possible values, which means seeing it as a variable rather than a constant. Each such perspective is called a set : this is a particular range of values which are considered available (possible) for a given symbol in a given context, and it essentially sums up such a context of possibilities for the concerns of some given variable symbols.

But, this raises the question whether this more encompassing perspective is itself unique, or how open to variations it may be. Namely, just as any view of a given symbol as a constant expresses a specific perspective which may often be overstepped by a wider perspective seeing it as a variable, any specific view of a given symbol as a variable ranging over a fixed set may often be overstepped by a wider perspective seeing this set itself as able to vary.

A choice of a theory involves first a choice of the kinds of variable symbols which are admitted in its language, and a concept of the sets of objects which each kind of variables is intended to range in. Two main kinds of theories need to be distinguished:

This distinction may seem to disappear from a strictly formal viewpoint.
Indeed, the range of the second kind of variables, that is "the range of all sets which a given set theory may happen to name", can be formalized in the same way as those of the first kind (sets), except only that it would be another sort (say, using another alphabet to name those variables). Then, a logical grammar with variable sets (where some variables may have variable ranges) can be reduced to one without, by letting the role of variables with variable ranges be formally played by variables basically ranging in the union of these ranges, then restricted from there to specific ones by conditions inserted in formulas. By these means, set theories are made to fit the framework of first-order logic.

This reduction to first-order logic works (almost) perfectly in the case of finite set theories, which can also be expressed in the form of theories of bounded arithmetic. Indeed, any finite set, made of some finite number n of objects, may be represented, through a given choice of enumeration, as the set An of all numbers x<n (or some other set of numbers all lower than some other number, defined as the cardinal of the union of some finite list of finite sets which are involved by a theory at a given time). So, in any theory considering just finite sets as its sets of basic objects, if the role of a set of n elements is played by the number n (so, if such sets can be suitably represented by some An), then the role of the range of all finite sets possibly considered by that theory (or anyway a range containing them all) is played by the fixed set ℕ of all natural numbers (to be used separately from the finite sets, i.e. naming numbers by a different kind of variable than those naming the basic objects, because ℕ is infinite).

(Otherwise, we can still let ℕ represent the range of finite sets, by encoding any finite subset of ℕ as a natural number using the BIT predicate.)
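To make this encoding concrete, here is a minimal Python sketch (the helper names are mine): a finite subset of ℕ is encoded as the natural number whose binary digits mark membership, the BIT predicate being just digit extraction.

```python
# Encoding finite subsets of N as natural numbers via the BIT predicate.
# Helper names are illustrative, not standard.

def encode_set(s):
    """Encode a finite set of natural numbers as a sum of powers of 2."""
    return sum(1 << k for k in s)

def bit(n, k):
    """BIT predicate: does k belong to the set encoded by n?"""
    return (n >> k) & 1 == 1

n = encode_set({0, 2, 5})           # 1 + 4 + 32 = 37
assert bit(n, 2) and not bit(n, 3)
```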

But this works somehow imperfectly with theories of infinite sets, as we shall now explain.

Classes of infinite sets

For most set theories meant to describe infinite sets, the above described reduction to first-order logic still works on the syntactic side, but loses track of the relevant semantics.

Indeed, according to the normal semantics of first-order logic, each given theory admits particular interpretations; such an interpretation may not be the absolutely unique possible one, but it is still treated as fixed for the sake of the internal works of the whole given theory; and each such interpretation regards each variable symbol of the theory as ranging over a fixed set through the basic use of quantifiers (before being practically restricted to its subsets according to the formulas).

But many set theories are actually meant to be interpreted in the following quite different way. They are meant to admit neither a single nor multiple acceptable interpretations of their "class" of sets as a fixed set, but to admit a kind of interpretation in which that class of sets is not thought of as a fixed set.

Indeed, whenever a range of a variable is a set, such set theories mean to formally admit it as such; and their class of sets will always exceed it, since they recognize that no particular set can ever be the ultimate one containing every possible set.

So, the semantical point of the concept of class, by contrast to that of set, is to name a kind of range of objects of a theory (more precisely: the syntactic feature of a theory having a kind of variables formally referring to a range of possible objects), without giving it any supposedly fixed semantics as a set. It means treating variable symbols as able to take some values and even some ranges of possible values but with no claim to ever reach a point of exhaustion of these possibilities.

Now, classes may admit candidate interpretations as sets, which may form valid interpretations of set theories in the sense of first-order logic. These interpretations can serve as practical views encompassing those values which may be found along, and relevant for, a given range of works. Yet they remain somehow semantically unsatisfactory, as each of them may be later overstepped by some new, "better" ones as broader sets, including further elements yet to be created.

Potential vs actual infinity

As noted by [Linnebo & Shapiro : Potentiality and indeterminacy in mathematics], a parallel can be drawn between Aristotle's finitism, according to which the infinity of ℕ only exists in a potential sense (of the possibility to consider ever larger finite sets), and the potential nature of the universe of sets in modern set theories, which admit the infinity of ℕ as actual.

Now, an aspect which may have been overlooked is the questionable definiteness of the presumed opposition of characters: potential vs actual. This is essentially the opposition of characters, for some given event, of being future vs being past. It is really not an intrinsic property of that event, but a property of the perspective taken over it. It is just a question of what time it is, which cannot admit a fixed answer.

Something appears in potential (i.e. as future) by some successive reflective moves which will end up creating it ; then a reflection over such moves results in conceiving them in their generality, and turns the whole sequence of such moves into a present actuality ; then another reflection over this generality results in formalizing it as a theory, whose semantics effectively amounts to viewing that sequence as past.

Of course, a kind of divergence occurs here between syntax and semantics : the syntax which describes the generality remains finite, while its semantics is infinite. Making this infinity actual requires effectively carrying out the infinite sequence of actions following the described rules, and such endless processes may eventually give some results which could not be predicted from any fixed theory in smaller amounts of operations.

But while the potential infinity of the range of all finite sets can theoretically become actual once and for all, by actualizing ℕ as a set to mark the end of all possible finite ages, it is not possible to complete the actualization of the potential class of all sets, because there can be no absolute end of all possible times.

The kinds of sets

As a first approach to the content of the strength hierarchy, i.e. the main time line of mathematical ontology, one may roughly sum it up as a succession of 4 main stages, according to the kind of variation ranges (from the "smaller" to the "larger" ones) which are recognized, or better, qualified as "sets", i.e. on which quantifiers (∀ and ∃) are considered meaningful (a condition which may be somehow seen as optional and thus as splitting the latter 3 stages); it is just a matter of reflection (or abstraction) to move from each kind to the next:

Hence a succession of theories aimed to express the recognition of these successive kinds of ranges as sets:

Concepts of truth and open quantifiers

This rough measure of the strength of theories, by the kind of ranges which are recognized as sets on which quantifiers can be applied, can be refined as a measure of the class (successively larger kinds) of formulas which are accepted by a theory as meaningful (true or false). Hence the need to clarify the concept of truth. Philosophers may have been very confused on the topic of truth, with a lot of competing theses. Anyway, outside mathematics things are really a mess, making such debates quite hopeless there. Now in mathematics, the right conception of truth is somewhat non-trivial, yet rather necessary, leaving no room either for mystery or hopeless controversy.

Basically, the truth or falsity of a formula is the result of its interpretation in some supposedly fixed system of objects, and thus relative to the choice of that system. This choice may be explicit (expressed by variables in a given theory) and/or implicit (a matter of interpreting a theory as a whole), as a matter of perspective. Implicit choices may be made explicit by reflection (i.e. time passage), yet such work of explicitation may never be fully complete.

Yet, some other kinds of formulas, especially those using open quantifiers, i.e. quantifiers over a class instead of a set, are meant to follow a different concept, despite the lack of syntactic difference. The difference may be explained as a matter of tense : the basic concept above (interpretation effectively done) may be qualified as assuming a past tense for its objects, while the alternative concept admits a future tense. It would be dubious to try formalizing this difference, both because it is a mere difference of semantics, which rigorously cannot be contained by syntax anyway, and because, again, the question whether a given object is past or future is a matter of what the current time is with respect to this object, which cannot have a fixed answer.

Unlike past propositions, future propositions are not actually meant to take a definite semantics in the regular way, because the computation process which would produce their value is not assumed to be available. Rather, such propositions can be understood as either imperative (when chosen as axioms) or predictive (when obtained as theorems). Thus, in most contexts, only some of the possibly expressible future propositions are taken as true or false, while other propositions of the same or even lower complexity may still be seen as undetermined. The progress of strength of theories, that is, the time passage of mathematical ontology, may be measured by the increasing range of expressible propositions whose semantics is held as past.

One may expect further details on the concept of truth, more precisely in the past tense case, and ask whether the deflationist conception of truth works. Yet, whatever candidate formal definition of truth (semantics) may be given, it can only provide a semantics insofar as it is itself interpreted, which means that it essentially leaves unchanged the issue of the need to externally provide a semantics in a chosen world of objects outside the syntax; the only question, then, is whether some definitions can provide non-essential but still useful contributions, and indeed some can, as follows.

The "deflationist conception" of truth (also called disquotation), which effectively means standing still (trivially defining truth by itself with no further explanation), is indeed obviously valid in the precise case when the formula to be interpreted is effectively given, fixed and explicitly written down before our eyes.

But in the remaining cases, namely when a formula is presented in some other way, either by its name (a constant or variable symbol) or by some expression which begs to be interpreted as a finite mathematical object encoding the formula to be interpreted, deflationism is obviously inapplicable in a strict sense (though it can nevertheless be abusively admitted for practical reasons: giving the name or another not fully spelled out designation of a formula to mean its truth). What is needed instead is a predicate T depending on a variable F ranging over a certain class of formulas, to interpret truth in that class. The point is to effectively sum up the interpretation process for a whole range of possible formulas F into that of a single formula T only depending on a parameter F. So, a truth predicate plays the same role with respect to some general range of formulas as the concept of universal Turing machine plays with respect to the general concept of Turing machines : the role of emulating hardware by software.
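To make this analogy concrete, here is a toy Python sketch (the encoding of formulas as nested tuples is an ad hoc choice of mine, not a standard one): a single function T interprets every formula of a small class, the way a universal machine runs encoded machines.

```python
# Toy truth predicate for a small class of bounded formulas, encoded
# as nested tuples (hypothetical encoding, for illustration only):
#   ('<', t1, t2)         atomic: t1 < t2
#   ('all', bound, body)  for all x < bound, body
#   ('ex', bound, body)   there exists x < bound with body
# Terms are integers or ('var', i), the i-th quantified variable.

def val(term, env):
    """Value of a term in an environment of quantified variables."""
    return term if isinstance(term, int) else env[term[1]]

def T(formula, env=()):
    """One formula T interpreting a whole class: 'hardware by software'."""
    op = formula[0]
    if op == '<':
        return val(formula[1], env) < val(formula[2], env)
    if op == 'all':
        return all(T(formula[2], env + (x,))
                   for x in range(val(formula[1], env)))
    if op == 'ex':
        return any(T(formula[2], env + (x,))
                   for x in range(val(formula[1], env)))
    raise ValueError(op)

# "For all x < 5 there is y < 6 with x < y": one call of T decides it.
f = ('all', 5, ('ex', 6, ('<', ('var', 0), ('var', 1))))
assert T(f)
```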

Now a possible view of time (strength) consists in the successively larger and larger classes of formulas which are considered definite, and for which the required definition of the truth predicate is more and more "complex" in some sense.

Truth undefinability

A natural question, then, is to compare the complexity of a truth formula Tr with that of the formulas it aims to interpret, namely, to specify whether, for any (specific, spelled out) F in the class of formulas interpretable by Tr, the formula Tr(F) also belongs to that class.

An obvious requirement for the formula Tr is that it must be able to handle a parameter designating a formula from a certain range, and there is a minimum strength needed for this. But already at this strength (or so?), the liar paradox becomes applicable and tells us more.
Indeed, from the simplest ability of a theory to handle formulas, one can derive the ability for a formula to essentially quote itself (this quotation included): for any formula F with one variable ranging over formulas, one can write a term Q provably equal (in value) to the quote of F(Q).
Or almost: more honestly, what can be done is, rather simply, to write a formula G such that, provably in the weakest theory, G(X) is equivalent to X being the quote of (∃Y, G(Y)∧F(Y)).
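This self-quotation trick is the same one at work in programming quines; as a computational shadow of the construction (not the formal diagonal lemma itself), a minimal Python example:

```python
# A quine: the diagonal trick of applying a template to its own quotation.
# The template s plays the role of F, and s % s the role of F(Q).
s = 's = %r\nprint(s %% s)'
print(s % s)   # prints exactly this two-line program
```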

Then, if we could take as F the negation of Tr (denoted ¬Tr), the value of Q would be the formula ¬Tr(Q), that is, a formula claiming its own falsity.
Therefore, either Tr fails to describe truth in its intended range, or there is a category mistake, i.e. the liar paradox was trying to offer to the variable X of Tr(X) a value outside the class of formulas which Tr can handle. Conclusion : Tr cannot interpret formulas of the form "¬Tr(something)".

A generalized version of the truth undefinability theorem, proven in the same way, says this:

For any class C of formulas F(x) with one free variable x in some range X,
For any formula Φ(F,x,y) with 3 free variables F,x,y, where F ranges in C, and both x, y range in X,
If there exists y in X such that (∀x∈X, F(x) ⇔ Φ(F,x,y)) for every explicit F in C,
then the class C cannot contain the formula ¬Φ(F,x,x) with one free variable x, where F is replaced in Φ by any arbitrary explicit choice (which presumably does not affect the class of Φ).

(precisely, if C contains ¬Φ(F,x,x) then one can build from Φ a formula F in C for which one can prove (∀y∈X, F(y) ⇔ ¬Φ(F,y,y)) )

As we shall see, most classes of formulas in the strength hierarchy turn out to be each interpretable by a truth predicate which does belong to the same class (the main exceptions are the weakest classes and the Delta classes). Necessarily therefore, these classes are unstable by negation : the negations of formulas from one such class cannot all also belong to that class.

Hierarchy of first-order formulas

Let us now describe and classify the formulas which can be written in any first-order theory with unspecified kinds of variables, thus applicable to the description of any specific system, before applying this to the foundational theories in the main line of strengths, with the particular kinds of sets listed above. For convenience in handling the general case of an unspecified theory, we shall name the domains of variables (which are fixed and specific to given theories) using set theoretical notations, and abusively use more set theoretical tools meant to abbreviate some possible works with first-order theories. To avoid having to cope with exceptions, all considered sets are assumed to be non-empty.

Let us describe the structure of arbitrary formulas by listing the successive kinds of subexpressions they are typically made of, in the computation order of their interpretation: from the innermost subexpressions (which take values first), to the outermost (ending the computation):

Indeed, the rest of the formulas which may be written are provably equivalent to formulas of class Σn or Πn as just described, by simple reduction rules described below. We may extend the definitions of the classes Σn and Πn to include in each class all formulas provably equivalent to some formula which fits the said format, though the resulting added complexity of such definitions may raise classification issues for the weakest theories trying to describe them.
A formula is said to be of class ∆n if it is both equivalent to a formula of class Σn and to another of class Πn. Depending on context, one may either need this equivalence to be provable, or simply true for all values of a given variable in the current interpretation.
These classes are related as follows: for all n,

∆n = Σn ∩ Πn
F ∈ Σn ⇔ (¬F) ∈ Πn
Σn ∪ Πn ⊂ ∆n+1

Indeed each class Σn includes all classes Σk and Πk for k<n, and the same for Πn, because one can always re-write a formula with useless additional quantifiers.

Let us explain the reduction of other formulas into the said classes:

While it implicitly follows from the above, we can explicitly see that if F, G are ∆n+1 then
(∀x∈X, F(x)) ∧ (∃y∈Y, G(y)) is ∆n+2 as it is equivalent to both

∀x∈X, ∃y∈Y, F(x) ∧ G(y)
∃y∈Y, ∀x∈X, F(x) ∧ G(y)

Second-order formulas 

In addition to quantifiers over some sets called base sets, we may also need to use quantifiers whose range is thought of as that of "all possible structures" of a given kind over some base sets. Possible kinds of structures are the n-ary relations and the n-ary operations, for some small explicit n:

and so on.
Let us consider formulas admitting two kinds of quantifiers, according to their range:

A formula is said to be first-order if it only involves first-order quantifiers; classes of first-order formulas are denoted Σ0n or Π0n or ∆0n following the above definition, with an upper index 0 to distinguish them from the second-order formulas which can also be considered, and while free second-order variables may be used.
A second-order formula is a formula with both first-order and second-order quantifiers; once written in the canonical form described above (starting with a sequence of quantifiers), it is said to be of class Σ1n or Π1n according to its second-order quantifiers only: the kind of the first second-order quantifier in the sequence, and the number of alternations of these, ignoring all first-order quantifiers.

A fundamental fact about second-order formulas is the following.
If the axiom of choice (AC) over base sets is accepted, then any Σ11 formula is reducible to a formula of the form ∃A,∀x, F(A,x) where A is second-order, x is first-order, and F is a formula without quantifier.

Indeed this reduction proceeds by successive uses of the equivalence (expression of AC over Y):

(∀y∈Y,∃u∈U, F(y,u)) ⇔ (∃B∈U^Y,∀y∈Y, F(y,B(y)))

thus simplifying any consecutive first-order ∀ on the right:
(∀y∈Y,∃u∈U,∀x∈X, F(y,u,x)) ⇔ (∃B∈U^Y,∀(y,x)∈Y×X, F(y,B(y),x))
and similarly any consecutive second-order ∃ on the left.
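As a finite sanity check of this equivalence, here is a brute-force Python sketch over small ad hoc sets (illustrating its logical shape only, since the interesting cases are infinite):

```python
# (forall y in Y, exists u in U, F(y,u))
#   <=>  (exists B in U^Y, forall y in Y, F(y, B(y)))
from itertools import product

Y, U = range(3), range(4)
F = lambda y, u: (y + u) % 2 == 0       # an arbitrary test relation

lhs = all(any(F(y, u) for u in U) for y in Y)
# Each B in U^Y is a tuple of choices: one value of U per element of Y.
rhs = any(all(F(y, B[y]) for y in Y)
          for B in product(U, repeat=len(Y)))
assert lhs == rhs
```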

Note that if Y is first-order and U is second-order then U^Y is still second-order.
Another possible reduced form is ∃A,∀x,∃y, F(A,x,y) where A ranges in a set of relations; without AC, the previous reduced form can be reduced into this one, but not vice versa.

However, formulas of the form ∃A,∀x, F(A,x) where A ranges in a set of relations, x ranges in a base set (not a product of them) and F has no quantifier and no function or operation symbol, are actually of class Π01. Intermediate cases between those just mentioned may be hard to classify, and will be ignored here.

Formulas of bounded arithmetic

When describing finite systems (and similarly later also infinite ones), quantifiers usually range over finite sets, and are naturally translated to quantifiers of bounded arithmetic.
For example to describe a system based on a set A with a elements and a set B with b elements, the quantifier (∀x∈A, F(x)) is translated as (∀x<a, F(x)), and the same for B.
(This is itself part of first-order arithmetic, where (∀x<a, F(x)) is seen as an abbreviation of (∀x, x<a ⇒ F(x))).
More sets constructed from these can be used through relevant expressions:
This can be formalized in bounded arithmetic with the only basic operations 0,1,+ and ⋅, and a few more uses of bounded quantifiers, which we shall not detail here.
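In programming terms the translation is immediate, as in this small Python sketch (F is an arbitrary stand-in relation):

```python
# (forall x < a, exists y < b, F(x,y)) rendered directly as bounded loops.
a, b = 5, 7
F = lambda x, y: x < y              # arbitrary stand-in relation

assert all(any(F(x, y) for y in range(b)) for x in range(a))
```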

If the sizes of base sets are much larger than those of the formulas which are used, and yet somewhat comparable to each other (in the style: any used first-order bounds a,b satisfy a<b^n and b<a^n for some n much smaller than a and b), then all second-order ranges are larger than all first-order ones; this justifies the relevance of the distinction between first-order and second-order quantifiers in bounded arithmetic.

Then, the lessons from previous sections can be phrased as follows.

The strength of formulas of bounded arithmetic depends on the values of bounds, the number of quantifiers, and the number of alternations between them.
The use of consecutive quantifiers of the same kind amounts to multiplying the values of the bounds involved, which is a "small" increase of strength.
The use of alternated quantifiers (i.e. consecutive quantifiers of opposite kinds) provides, roughly speaking, a bigger increase of strength (even if the former may not be effectively reducible to it).
Yet, a still bigger strengthening comes by exponential increases of the values of bounds (such as replacing a by b^a); namely, the effect of using an additional alternated first-order quantifier with bound a in a formula, namely ∀x<a, ∃y<c, F(x,y) where F starts with a universal quantifier, is covered by the exponential jump of the value of the bound of the next quantifier from c to c^a, plus a change of F marked by the mere multiplication by a of the bound of its first quantifier.

The hierarchy of formulas of bounded arithmetic induces a hierarchy of very weak systems of bounded arithmetic, according to the class of formulas accepted as input to the induction axiom.
The lower and upper extremes of the hierarchy of such theories, when the respectively accepted classes of formulas range from the class of first-order formulas to the class of n-th order ones for any explicit n (where the bound of an n-th order quantifier is expressed by an exponential tower of height n where all n entries are first-order), are :

Such subsystems of arithmetic have been studied in particular in [Samuel R. Buss, Handbook of proof theory, ch. II].

Truth predicates of bounded arithmetic

Let us now sketch the expression, in bounded arithmetic, of a core truth predicate for formulas of bounded arithmetic, assumed to be in prenex form (the described reductions of some other formulas into this form may also be formalized with added complications).

Namely, the core difficulty is to handle the diversity of quantifier-free formulas, by addressing its two aspects:

  1. The diversity of atomic formulas, which is reducible to that of terms by distinction of cases for a theory with a small list of basic relation symbols;
  2. The diverse uses of logical connectives over a finite but not explicitly limited list of atomic formulas.

In answer to 1., the truth of an atomic formula F (here meant as abbreviating the data of its syntactic structure, formalizable by means of function symbols...) with size |F| can be written

∃i<m^|F|, (∀x<|F|, C(F,x,m,i)) ∧ R(F,i)

where

By symmetry of the problem (the same method is applicable to the negation of that atomic formula), this truth formula more precisely belongs to a Delta class; the provably equivalent symmetric expression is

∀i<m^|F|, (∀x<|F|, C(F,x,m,i)) ⇒ R(F,i)

The exact qualification of that Delta class (either ∆01 or ∆11) depends on whether |F| is supposedly "much smaller" than m, or comparable to it.

In answer to 2., we first need to reduce the use of connectives to just ∧ and ∨, by progressively moving all ¬ inwards, down to the level of atomic sub-formulas, which can thus either use a primitively admitted relation symbol or its negation.
Then, such a propositional formula, composed of just ∧ and ∨, can be seen as a finite game of going through the formula from its root (outermost connective) to one leaf (an atomic formula), where

The same truth formula, given by the concept of winning strategy for B, can be explained in other words : in terms of repeatedly using distributivity to move all ∨ outwards (to the root) and all ∧ inwards (to the leaves). Roughly speaking, the truth of a propositional formula F of size |F| can thus be written ∃s<2^|F|, ∀y<|F|, ...
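A Python sketch of this game reading (with an ad hoc tuple encoding of formulas): a ∧/∨ tree is true exactly when the verifying player (B above) has a winning strategy, choosing one branch at each ∨ node and having to survive every branch at each ∧ node.

```python
# Truth of an and/or tree as the existence of a winning strategy.

def wins(f):
    op = f[0]
    if op == 'atom':
        return f[1]                          # a leaf carries its truth value
    if op == 'or':
        return any(wins(g) for g in f[1:])   # verifier picks one good branch
    if op == 'and':
        return all(wins(g) for g in f[1:])   # must survive every branch
    raise ValueError(op)

f = ('and', ('or', ('atom', False), ('atom', True)), ('atom', True))
assert wins(f)
```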

Putting both stages of the definition of truth one into the other, both "dominating" ∃ may be treated as consecutive (as the ∀ in between is of lower order), leaving the overall class the same as it was for atomic formulas (if atomic formulas are indeed allowed to have sizes comparable to that of the propositional formula relating them).

Depending on hypotheses about the bounds (in terms of magnitudes, not exact values) on the sizes of considered formulas compared to the values of free variables, the truth formula for all such formulas from any given class Σ0n may either belong to another interpretation of Σ0n (with somewhat higher precise bounds than those first fixed), or to ∆11.

The remaining aspect of the problem of expressing truth for bounded arithmetical formulas, aside from the consideration of formulas not expressed in prenex form, is to allow variations of the number of quantifiers, namely covering, for example, all formulas of class Σ0n where n may vary beyond explicit limits. The solution to this is again expressible as a winning strategy for a finite game, with the difference that the number of possible choices at each step is some designated number to be interpreted, instead of only the number 2 or another explicit enumeration of elements. Again also, the symmetry of the problem leads to a provably equivalent expression of the opposite kind, which leads to having ∆11 in the role of the class gathering all formulas (and the truth formula for these) of bounded arithmetic of class any Σ0n, Π0n or ∆0n, whose size (and thus also n) is somehow "much smaller" than the values of the bounds involved.

Formulas of arithmetic of any order

The above classification of formulas of bounded arithmetic, can be naturally extended to formulas with quantifiers over higher kinds of sets as introduced earlier.

A first (most coarse-grained) classification of formulas is given by the data of the highest kind of set used by quantifiers. Beyond finite sets, the usually considered sets are the iterated power sets ℘^i(ℕ), where ℘^0(ℕ) = ℕ and ℘^(i+1)(ℕ) = ℘(℘^i(ℕ)). A quantifier with range ℘^i(ℕ) is said to be of order i+1. The order of a formula is defined as the highest order of its quantifiers.

A theory is respectively called a first-order arithmetic, a second-order arithmetic,... an (i+1)-th order arithmetic (collectively named higher-order arithmetic), according to the maximal order of the formulas admitted in its syntax.

Then, a formula of higher-order arithmetic in prenex form is said to be precisely of class Σin (resp. Πin) if it is of order i+1, and its chain of quantifiers contains precisely n alternated quantifiers of order i+1, starting with ∃ (resp. ∀). For example, a formula of the form ∃x∈ℕ, ∀y<x, ∃z∈ℕ, F(x,y,z) is of class Σ01 only, because both unbounded first-order quantifiers are of the same sign (not separated by an opposite one of the same order) and are thus counted as only one.

A more fully reduced form for formulas of the precise classes Σin or Πin, which constitutes the stricter definition of these classes, is to have their chain of quantifiers start with n alternated quantifiers of order i+1, followed by quantifiers of lower order only (which is not the case in the example above); then, by also successively reducing the sub-formula of lower order quantifiers, we get a formula whose chain of quantifiers is not only alternating in signs, but also decreasing in order.

Hence the question whether such a reduction is possible. Answer : this depends on the precise choice of axioms given to the considered theory. As we shall review later, the above mentioned reductions of formulas, simplifying their classification, will more often work than not in the context of the respectively relevant theories, according to whether the equivalence between a formula and its reduced version is deducible from the given axioms.
At least, whenever formulas of a given class in first-order or second-order arithmetic are considered, their use usually presumes a context strong enough to allow reducing formulas of lower class. In other words, the sub-formulas of an arithmetical formula after the first unbounded numerical quantifier, or those of a second-order arithmetical formula after the first set quantifier, can be assumed to be reduced.

In first-order arithmetic, the involved classes of formulas will be more simply denoted Σn, Πn and ∆n according to the number n of alternated unbounded numerical quantifiers, while ignoring the presence of bounded quantifiers. 

List of reduction rules and their working conditions.

First, each ℘^i(ℕ)×℘^i(ℕ) is directly reducible to ℘^i(ℕ) by an explicit bijection (only using multiplication for i=0, and even weaker means for other i; this can be contrasted with the need of AC to prove the existence of a bijection between X×X and X for unspecified infinite sets X). Moreover, there is a natural injection from ℘^i(ℕ) to ℘^j(ℕ) for i<j, and also from finite ranges to ℕ, so that any consecutive quantifiers of the same sign can be summed up into just one, of the highest order among them.
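For i=0, one classical choice of explicit bijection is the Cantor pairing, sketched here in Python (one possibility among others, built from addition and multiplication):

```python
# Cantor pairing: an explicit bijection N x N -> N, and its inverse.

def pair(x, y):
    return (x + y) * (x + y + 1) // 2 + y

def unpair(z):
    d = 0                                  # find the diagonal containing z
    while (d + 1) * (d + 2) // 2 <= z:
        d += 1
    y = z - d * (d + 1) // 2
    return d - y, y

assert all(unpair(pair(x, y)) == (x, y)
           for x in range(20) for y in range(20))
```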

This gives a formula whose chain of quantifiers has alternated signs only. The remaining step is to essentially re-order quantifiers in the chain, to let them appear by decreasing strength. This re-ordering proceeds by a succession of steps, each of which consists in essentially switching the places of two consecutive quantifiers, moving a quantifier right in exchange for one of higher order (while of course changing the content of the sub-formula), and therefore also reducing the length of the chain on each side by packing 2 consecutive quantifiers of the same sign into one.

Starting with a given formula with a disordered chain of quantifiers, in order to move all quantifiers of the highest order (in that formula) to the left of all those of lower order, a preferable step by step strategy is to first focus on the right : take the innermost non-reduced sub-formula of the same order, reduce it, then repeat the process by adding into consideration one by one the next quantifiers just left of it. This way, each reduction step consists in moving a lower order quantifier right through an already reduced higher order sub-formula.

The diversity of possible moves, according to the orders of both quantifiers which are switched and the class of the sub-formula right of these, means also a diversity of minimal strengths needed for a theory to justify each of them (this concept of strength of theories will be explained later).

The weakest of all moves belongs to bounded arithmetic: the switch of two bounded quantifiers, possibly reducing the number of bounded quantifiers at the end. It is a form of finite choice: from any formula F we can build a formula G such that, provably,

(∀x<a, ∃y<b, F(x,y)) ⇔ (∃z∈ℕ, ∀x<a, G(x,z))

where G(x,z) says (z < b^a ∧ F(x, the x-th digit of z in base b)).
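Since F here is arbitrary, the equivalence can at least be tested mechanically on decidable instances; a minimal sketch in Python, where the predicate F below is an arbitrary stand-in:

    # Sketch: checking (∀x<a, ∃y<b, F(x,y)) ⇔ (∃z<b**a, ∀x<a, G(x,z)),
    # where G(x,z) reads the choice of y for x off the x-th base-b digit of z.
    def digit(z, x, b):
        return (z // b**x) % b

    def lhs(a, b, F):
        return all(any(F(x, y) for y in range(b)) for x in range(a))

    def rhs(a, b, F):
        return any(all(F(x, digit(z, x, b)) for x in range(a)) for z in range(b**a))

    F = lambda x, y: (x + y) % 3 == 0   # arbitrary stand-in predicate
    assert all(lhs(a, b, F) == rhs(a, b, F) for a in range(1, 5) for b in range(1, 5))

The number z, whose x-th digit in base b is the chosen y for x, is exactly the finite choice function packed into one number.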

There are only two limits to these:

As we shall consider theories no weaker than EA (except otherwise stated), the number of bounded quantifiers at the end of chains of quantifiers is reducible to a fixed small number.

Once a truth predicate has been written for a certain class of bounded formulas by a method sketched above (while each such formula escapes the exact class it is meant to cover), we get a Π1 truth formula for the class of Π1 arithmetical formulas in reduced form, and then similarly (by adding quantifiers to the left of it) a truth formula for each class Σin or Πin of formulas in reduced form (for any i and any n>0), where each such truth formula itself belongs to the same class (with only an additional free variable designating the formula to be interpreted).
This makes it possible to reduce any axiom schema over a Σin or Πin class to a single axiom, obtained by applying the schema to the corresponding truth formula : this axiom implies every other instance by giving the relevant value to its parameter.
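To make this idea concrete, here is a minimal sketch (Python) of a truth evaluator for a toy class of codes of bounded formulas; the tuple encoding is an illustrative stand-in for the arithmetical coding, not the real one:

    # Sketch: one program deciding truth for a whole class of (codes of)
    # bounded formulas, given as pure data together with an environment.
    def ev(t, env):                      # a term: a constant or a variable index
        kind, v = t
        return v if kind == "const" else env[v]

    def true_(phi, env=()):
        if phi[0] == "less":             # atomic formula: term < term
            return ev(phi[1], env) < ev(phi[2], env)
        tag, bound, body = phi           # bounded quantifier: ("all"/"ex", bound, body)
        rng = range(ev(bound, env))
        if tag == "all":
            return all(true_(body, env + (x,)) for x in rng)
        return any(true_(body, env + (x,)) for x in rng)

    # code of : ∀x<5, ∃y<6, x<y
    phi = ("all", ("const", 5), ("ex", ("const", 6), ("less", ("var", 0), ("var", 1))))
    assert true_(phi)

The free variable phi plays the role of the parameter designating the formula to be interpreted.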

The next kind of moves belongs to first-order arithmetic: the reduction of

∀x<a, ∃y∈ℕ, F(x,y)

where F is an arithmetical Πn formula for some n. It depends on the arithmetical collection axioms

(∀x<a, ∃y∈ℕ, F(x,y)) ⇔ (∃b∈ℕ, ∀x<a, ∃y<b, F(x,y))

denoted BΓ for each considered class Γ which F ranges over. [Samuel R. Buss, Handbook of proof theory, ch. II] [A proof-theoretic analysis of collection, Lev D. Beklemishev]. They can also be expressed as versions of finite choice.
Collection axioms may be extended to classes Γ beyond first-order arithmetic, but then the reduction problem faces other needs of moves which require stronger axioms.

According to [Σn-Bounding and ∆n-Induction, Theodore A. Slaman], BΣn is equivalent under EA to ∆n-induction (I∆n), i.e. induction on any Σn formula F equivalent to some Πn formula G, with no provability requirement:
For any Σn formula F and any Πn formula G,
∀parameters, (∀x, F(x) ⇔ G(x)) ⇒ ((F(0) ∧ ∀x, (F(x) ⇒ F(x+1))) ⇒ ∀x, F(x))
also equivalent to the scheme:
∀parameters, ∀y, (∀x<y, F(x) ⇔ G(x)) ⇒ ((F(0) ∧ ∀x<y, (F(x) ⇒ F(x+1))) ⇒ F(y))

while BΠn is equivalent to BΣn+1 and thus to ∆n+1-Induction. Summing things up, collection axioms are related to induction axioms as follows.
∀n∈ℕ, IΣn+1 ⇒ BΣn+1 ⇔ BΠn ⇔ I∆n+1 ⇒ IΣn ⇔ IΠn

Reduction moves in second-order arithmetic start with uses of the axioms of finite choice (which are provable by induction):
(∀x<a, ∃Y∈℘(ℕ), F(x,Y)) ⇔ (∃Z∈℘(ℕ×ℕ), ∀x<a, F(x,Z|x))
where by definition, y ∈ Z|x ⇔ (x,y)∈ Z.
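As a concrete toy illustration of this slicing, a minimal Python sketch, with finite sets standing in for subsets of ℕ:

    # Sketch: merging a finite family Y_0,...,Y_{a-1} of sets into one set Z of
    # pairs, and recovering each Y_x as the slice Z|x = { y | (x,y) ∈ Z }.
    def merge(family):
        return {(x, y) for x, Y in enumerate(family) for y in Y}

    def slice_(Z, x):                    # the slice Z|x
        return {y for (i, y) in Z if i == x}

    family = [{1, 2}, {0}, {2, 5}]       # arbitrary stand-in values of Y_x
    Z = merge(family)
    assert all(slice_(Z, x) == family[x] for x in range(len(family)))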

Then come some versions of the axiom of countable choice, of the form
(∀x∈ℕ, ∃Y∈℘(ℕ), F(x,Y)) ⇔ (∃Z∈℘(ℕ×ℕ), ∀x∈ℕ, F(x,Z|x))

(so named to mean the use of AC where the domain of ∀ is countable, that is, has a bijection with ℕ)
One may think of a candidate weaker solution for reduction, which would be another kind of collection axiom :
(∀x∈ℕ, ∃Y∈℘(ℕ), F(x,Y)) ⇔ (∃Z∈℘(ℕ×ℕ), ∀x∈ℕ, ∃z∈ℕ, F(x,Z|z))
but it can be simplified into the countable choice version, by composition with the function which gives, for each x, the smallest z that fits. This method can only form a genuine weakening in stronger cases where z becomes a variable of second order or more (thus Y must be at least of third order for this to have any interest), so that the idea of picking the smallest z no longer works.

Actually the distinction between ℘(ℕ) and ℕ^ℕ, which we neglected here, may matter in some ways. The issue is that the simulation of ℕ by the set of singletons in ℘(ℕ), and thus the simulation of ℕ^ℕ by a subset of ℘(ℕ×ℕ) and thus of ℘(ℕ), requires some use of numerical quantifiers. For this reason, two optimally reduced forms for Σ11 formulas need to be distinguished:

∃X∈℘(ℕ), ∀y∈ℕ, ∃z∈ℕ, F(X,y,z)
∃X∈ℕ^ℕ, ∀y∈ℕ, F'(X,y)

where F,F' have bounded quantifiers only. Some syntactic adaptations are needed, replacing the use of ℘(ℕ) by that of ℕ^ℕ, in particular in the expression of countable choice (turning it into something like our first mention of AC), in order to fit this second form of reduction.

On the other hand, the Weak König's Lemma allows reducing formulas of the form ∃X∈℘(ℕ), ∀y∈ℕ, F(X,y) (where F has bounded quantifiers only, and only uses X through the relation (...∈X)) into Π01 formulas.

Hierarchies of sets

The above described hierarchy of formulas induces a hierarchy of definable sets in each ℘k(ℕ) for k>0.

Namely, for a given k>0, a set X∈℘k(ℕ) is said to be of class Σin (resp. Πin) if it is definable as X={x∈℘k-1(ℕ)|F(x)} for some formula F of class Σin (resp. Πin).
Since truth in a class of formulas is expressible by a formula of the same class, the set of all true formulas from a class also belongs to the class so named, as a subset of the set of all formulas of that class (identifiable with ℕ by some encoding).

The sets of class Σ0n or Π0n for some n∈ℕ are called arithmetical; those of class Σ1n or Π1n for some n∈ℕ are called analytical.

Numerical parameters (i.e. with values in ℕ) in F bring nothing, as they can be eliminated (being themselves definable by terms).
A set parameter with value in ℘(ℕ) already has deep impact on the classification of sets definable from it.

For k=1, a set parameter would trivialize the classification of definable sets (except for the classification relative to a fixed value of the set parameter used in the definition), so it needs to be banned; this gives in ℘(ℕ) a hierarchy of classes of definable sets, where each class is countable (as there are countably many formulas).
 
For k=2 we need to distinguish between the lightface hierarchy (Σ1n and Π1n), whose defining formulas have no parameter, so that each such class is also countable, and the boldface hierarchy (𝚺1n and 𝚷1n) of sets defined by a formula of the respective class but using a set parameter in ℘(ℕ), so that each such class of sets has the cardinality of the continuum (i.e. it is in bijection with ℘(ℕ)).

A set is said to be of class ∆in if it is equivalently definable by some formula of class Σin and some other formula of class Πin. This equivalence is not required to be provable; it just needs to hold for all x∈℘k-1(ℕ). The provability of such equivalences depends not only on the formulas, but also on the strength (proving power) of the foundational theory that is used, as we shall discuss later.

The class ∆11 of sets for k=1, as well as the lightface one for k=2, is the union of the classes Σ0α for all recursive ordinals α (which are all the ordinals for which Σ0α makes sense); sets in this class are called the hyperarithmetical sets. A recursive ordinal can be conceived as the isomorphism class of a recursively defined well-order on a subset of ℕ; it is therefore countable.
The boldface 𝚫11, on the other hand, is the union of the classes 𝚺0α for all countable ordinals α; it is called the class of Borel sets.

Interdependence of set complexities

This hierarchy of formulas, where each named class may have different precise definitions depending on syntax details, and whose equivalence is not always ensured, is only an introductory step towards the actual description of the strength hierarchy, which is even more complex.

Indeed the succession of named infinite power sets ℘i(ℕ) for each i>0 is only a syntactic succession representing successively stronger sets, each of which seems fixed from the viewpoint of a given theory with a supposedly fixed semantics; but this cannot, in itself, fix the successive strengths of these sets in the absolute, for lack of means to formally require them to contain sets beyond those definable by the accepted means of the given theory.

In detail, the axioms specifying that some ℘k(ℕ) contains some given classes (Σin or Πin, where k≤i+1) of definable subsets of ℘k-1(ℕ), can induce the existence of more sets than those so specified, because starting with empty content and completing it by the additionally defined sets can disturb the interpretation of some definitions. This will lead the same defining formulas to generate more sets, and thus leave orphan their previous offspring, still involved in driving other defining formulas to their respective values. Hence the difficulty of assessing the precise strength of such axioms, to which we shall come back later.

(Yet there exist universes of set theory where every set is definable, according to [Pointwise definable models of set theory, Joel David Hamkins, David Linetsky, Jonas Reitz]; but that remarkable circumstance cannot itself be expressed in the same language of set theory, as would be needed for any attempt to axiomatically claim or deny it.)

Provability and the strength hierarchy

Of course, whatever fundamentally escapes expression by syntax cannot be studied either, but an effective syntactic exploration is actually possible here, in the form of the hierarchy of possible lists of axioms (or of possible sets of provable formulas, deducible from axioms) which an arithmetic of any order can be given. Intuitively speaking, the strength of a theory consists in the lower bound it puts on the strength of the semantics that remains compatible with its axioms; for example, a second-order arithmetic with strong axioms can be stronger than a hundredth-order arithmetic with weak axioms.

(For simplicity we shall often designate a theory by just specifying its strongest axiom(s), i.e. those which newly come precisely at its strength level, keeping implicit the other axioms also needed but already present in weaker theories along the main line of foundational theories.)

Let us now explain why the concept of provability of formulas must be subject to an endless hierarchy of more and more encompassing versions, which actually forms the core measure of strength for theories.

First, observe that any provability concept must be expressible by a Σ01 formula (∃p∈ℕ, p encodes a proof of F).
Second, some basic quality requirements for a provability concept to fit its job (soundness, and the recognition of clearly available proofs) imply that its restriction to the class of Σ01 formulas also works as a truth predicate there. However, as previously explained, truth on the opposite class of Π01 formulas cannot have a Σ01 expression, and thus cannot be exhausted by any fixed provability concept. So, as long as a theory is sound (provability implies truth), it cannot be complete on any class of formulas beyond Σ01, such as Π01 : there must remain true but unprovable formulas. Moreover, unlike for higher classes, any proof of a false Π01 formula leads to contradiction with the straightforward proof of its negation.
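To fix intuitions, the shape shared by all such provability predicates may be sketched as follows (Python; the verification function check is an arbitrary decidable stand-in):

    # Sketch: a Σ01 predicate is an unbounded search whose inner test is
    # decidable; it halts (returning True) exactly on the provable cases.
    def provable(F, check):
        p = 0
        while True:                      # the unbounded "∃p" quantifier
            if check(p, F):              # decidable check that p encodes a proof of F
                return True
            p += 1

    # toy stand-in: "proofs" of F are the positive multiples of F's code
    assert provable(7, lambda p, F: p > 0 and p % F == 0)

On unprovable inputs, the search simply never halts: this one-sided behavior is the practical face of the Σ01 class.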

Hence the concept of strength hierarchy of theories, which may have several "most often equivalent" definitions, but the main ones we shall consider are

  1. Inclusion of the sets of Π01 theorems: T' is stronger than T if all Π01 formulas provable by T are also provable by T';
  2. Inclusion of the whole sets of theorems : same without restriction to a particular class of formulas.

The first one may be considered the clearest, as it forms a total order between all "natural" theories, i.e. which belong to the main thread of foundational theories (while of course other theories can stay out of that total order), while the second one somehow mixes different criteria, with more risk for "natural" theories to appear incomparable in its sense.

Practically any theorem of bounded arithmetic is actually of class Π01 (no less) because, to apply beyond explicitly written bounds, all the involved bounded quantifiers need bounds given by free variables, which implicitly have to be themselves under a ∀ ranging over ℕ.

The Löwenheim–Skolem theorem

We shall now give another tool to clarify the following fact : despite Cantor's theorem making the infinite sets involved in some higher-order arithmetic formally appear as sets of different cardinalities, there is another philosophical perspective by which they are all countable sets, only assigned different roles, so that the different strengths they seem to get in a given theory actually come from the different structures by which they are used.

To describe an infinite system, we just have a little problem: we cannot really access it to interpret our formulas there, because our operational tools are finite. Never mind, we just need to call a god who could observe one, and undertake a conversation with him about it.
The conversation proceeds as follows (here are the general rules, inside which more specific assumptions or strategies can be followed depending on purpose).

At any time of the conversation, you have a partial map of the system, made of:

The first version of the list of formulas can be thought of as an axioms list of some theory; depending on context, this may either be an explicit list of formulas known true for theoretical reasons, or a theoretically conceived list of revealed truths from the god.

Along the conversation, your map expands step by step. The kinds of possible steps are the following:
For this to work the whole way to its infinite limit, there is a condition on the framework : it requires the axiom DC of Dependent Choice (a weak version of AC):

(∀n∈ℕ, ∀x∈E, ∃y∈E, R(n,x,y)) ⇒ (∀a∈E, ∃Z∈E^ℕ, Z(0)=a ∧ ∀n∈ℕ, R(n,Z(n),Z(n+1)))

This is provably equivalent to the simpler formula where the use of the variable a, and the presence of n as a first argument of R, are removed; we keep them here for convenience.
The case when E is countable is provable without AC, by choosing each time the smallest example according to a fixed bijection with ℕ.
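For instance, a minimal sketch (Python) of this least-witness strategy over a countable E, with an arbitrary stand-in relation R, truncated to finitely many steps to remain executable:

    # Sketch of DC without AC on a countable domain: always pick the witness
    # of least index in a fixed enumeration of E.
    E = range(10**6)                     # stand-in enumeration of a countable domain
    R = lambda n, x, y: y > x + n        # stand-in relation with no dead end

    def dependent_choice(a, steps):
        Z = [a]
        for n in range(steps):
            # least-witness choice: no appeal to AC, just the enumeration of E
            Z.append(next(y for y in E if R(n, Z[-1], y)))
        return Z

    print(dependent_choice(0, 5))        # [0, 1, 3, 6, 10, 15]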

Let us be very precise on the ontological conditions this construction relies on.

It looks like an infinite two players game with choices at each step. Let us put aside the choice whether more or less truths are revealed beyond those mentioned above, as this degree of freedom will be completely closed by a specification in each context of use.

The only remaining choice you seem to have is on the order of the cases you will pick for adding new names by the rules DEF and EX, where the list of available formulas for EX is especially likely to get endlessly expanded by applying SUB to a formula of class Πn where n>1, with the growing list of names.
Yet this freedom of yours can be ignored, since it in no way affects the final result. Presuming any arbitrary strategy on your part, the replies by the god can be used as replies to any other strategy : all your possible strategies can only ask the same questions in a different order, allowing the god to give the same values to the names introduced by the same rules across your possible strategies, and thus also to give the same answers on truths. (Actually, this conformity of answers is required to keep you on the same track of possible questions.) At their respective limits, all such maps, being given the same interpretation, are of course isomorphic, as they just differently label by numbers the same countable subset of the territory.

The only choices which really affect the result, then, are the god's choices, namely those of suitable values during the EX steps (and also during DIS, but this role of DIS can be eliminated by using formulas in prenex form only).
For these to proceed, the conditions are the following.

First, the god needs to have actually interpreted all given formulas in his target system sufficiently to verify their truth, in the sense of effectively knowing an example of object witnessing the truth of each existential quantifier contained there, whenever needed, for any given values of the variables. This involves seeing the domains of all variables as sets, at least somehow.
Then, the acts of choosing one example among the possibly many available ones need not have been ready in advance for every possible case : making a choice for every possible value of all variables (beyond the values actually picked) would be a stronger kind of use of AC if the domains of variables are not known to be countable.
What it takes, then, expressible as an instance of DC, is just a sequence of choices, where each choice concerns a case (a particular existential formula with particular values of names) which depends on the previous choices. And it takes the time to make these choices successively rather than simultaneously (while possibilities of countable amounts of independent choices are instances of the axiom of Countable Choice, which are weaker for given domains).

Actually DC is an equivalent condition, since it comes back as the particular case of a conversation starting with the formula (∀n∈ℕ, ∀x∈E, ∃y∈E, R(n,x,y)) and a little more set-up to proceed.

The magic of a conversation with these rules is that, if endlessly continued, the resulting potential content of your map is identifiable with a subset of the territory, with the same features as those expected of the whole : all the features and truths promised by the axioms for the territory, are getting progressively fulfilled by your map itself.

This can still theoretically proceed assuming that all truths about the territory, expressible by any given range of formulas, were revealed (provided we could find a god to interpret these formulas there), in which case none of such formulas can distinguish the map from the territory : both will have all the same true formulas beyond any explicitly definable list of axioms.

The relativity of infinite cardinals

Yet, the map is necessarily countable (the names can be labelled by natural numbers according to their introduction time), regardless of whether the territory is countable or not. So, for any theory which happens to be true somewhere, none of its axioms can ever block the existence of a valid countable interpretation of the same theory. In particular, any theory with the power to apply Löwenheim–Skolem to a given class of formulas involving the set 2^ℕ, will deduce the existence of a countable system fulfilling the same truths from that class. It is easy to see that the interpretation of ℕ is preserved, while 2^ℕ gets re-interpreted as a countable subset P ⊂ 2^ℕ.

This forms a paradox with Cantor's theorem for theories of second-order arithmetic, and any other theory supposedly admitting ℘(ℕ) as an available domain of quantification, which appear to present ℘(ℕ) as necessarily uncountable, so that P ≠ 2^ℕ.
Intuitively speaking, this is because the intended claim P = 2^ℕ would be second-order with respect to the sets P and ℕ (which are among those formally involved as ranges of variables of the given theory, and seen as first-order ranges when interpreted by first-order logic), and thus cannot be among the expressed formulas whose conformity of truth was ensured; the countability of P is also a second-order concept in this sense.

To see things more precisely, let us write down a formal expression of Cantor's theorem : for any set E,

∀f∈(2^E)^E, ∃A∈2^E, ∀x∈E, f(x)≠A

here applied to the case E=ℕ; remember that (2^ℕ)^ℕ has an expressible bijection with 2^ℕ, through which the quantifier ∀f∈(2^ℕ)^ℕ may be re-expressed as one with domain 2^ℕ.
Then, Löwenheim–Skolem also maps (2^ℕ)^ℕ to a countable subset P' ⊂ (2^ℕ)^ℕ, linked to P by the natural bijection between 2^ℕ and (2^ℕ)^ℕ (there are different natural bijections, but they all give the same determination of P' from P thanks to some basic axioms).

As P is countable, there exists f∈P^ℕ that is bijective (so ∀A∈P, ∃x∈ℕ, f(x)=A).
By Cantor's theorem in the countable interpretation, we get f∉P'.
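The diagonal argument behind this formula can be made concrete on finite truncations; a minimal sketch (Python), with an arbitrary stand-in table f:

    # Sketch of Cantor's diagonal: for any f : ℕ → 2^ℕ (truncated to an n×n
    # table of bits), the diagonal-flip set A differs from every f(x) at x.
    def diagonal(f, n):
        return [1 - f[x][x] for x in range(n)]   # A = { x | x ∉ f(x) }

    n = 8
    f = [[(x * y) % 2 for y in range(n)] for x in range(n)]   # stand-in table
    A = diagonal(f, n)
    assert all(f[x][x] != A[x] for x in range(n))

Applied to the bijection f above, the diagonal set A is a subset of ℕ missing from P: this is how the map and the territory disagree on the meaning of ℘(ℕ).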

Hence the possibility of a philosophical view over the same foundational theories of mathematics, defended by [Naive Infinitism: The Case for an Inconsistency Approach to Infinite Collections, Toby Meadows, Notre Dame Journal of Formal Logic, 56(1), 2015, https://sites.google.com/site/tobymeadows/papers-talks]

and made of the following thesis (here re-expressed, not Meadows's words):

For example, with a car you can explore a continent but not cross an ocean to reach another continent; with a plane you can reach another continent but not another planet.
Between a god G viewing math in that way, and another H claiming to have the absolutely complete power sets of infinite sets, the dispute may be endless, as long as they only have finite means of expression at any time to exchange their arguments, in the style of the omnipotence paradox:

H : "I got the complete, uncountable interpretation of ℘(ℕ), that transcends any possible countable sequence of subsets of ℕ !"
G : "I can beat any interpretation of ℘(ℕ) you may have, by finding a bijection of it with ℕ that will make it countable !"

(and similarly for the power sets of other infinite sets)
Yet there appears an asymmetry between them: the view of H is, for most purposes, much stronger than that of G. The Löwenheim–Skolem theorem was a rather easy way down from H to G, but the inverse way, from G to H, involves a bottom-up reconstruction in potentialist terms, which needs further explanations.
The idea is that, in lack of a real H, the view G can anyway be expanded up to a simulation of H. The main issue is the following.

Potential stability thesis. Any thread of sets exploration by given tools, will remain contained by some horizons of stability relative to these tools, in the sense that such a horizon gives to each considered "power set" an appearance of completeness that resists these tools. Precisely speaking, any reasonable version of set theory admits interpretations as universes of sets, satisfying all its axioms describing diverse ways for designated "power sets" to behave as complete.

We shall address later the crucial point of justifying important cases of this thesis.
Now let us just explain the last step, from this thesis to the simulation of H by G. The issue is to idealize this concept of stability horizon : from horizons with only specific strengths of stability, able to resist the strength of given sets exploration tools, to horizons with ideally absolute strength, able to resist any conceivable sets exploration tool.
Let us mention different ways to answer this problem.
One is to notice that there is no upper limit to the strength of stability of possible horizons, either metaphysically or from the axioms of any good set theory. So, pointing out that a given set theory admits a universe whose stability simply fits the strength of its axioms, does not hinder the possibility of interpreting the same theory by horizons with any higher stability that may be desired later. Now, once any work has been done in any given theory, it is always possible to switch to any stronger theory and reinterpret the previous work as in fact more precisely meant to refer to some of the newly requested kinds of universes, stronger than the conditions which were explicitly formalized before.

Another way is to consider that the successively stronger exploration tools which will be considered, whatever they may be, will form a sequence that can itself be taken as a tool stronger than them all, to which the potential stability thesis can be applied; a horizon of stability for that one can then play the role of H.

But, this raises some questions.
One question is whether such a simulated H, thus argued to exist, also enjoys a kind of essential uniqueness. The answer turns out to be negative, as this non-uniqueness causes the undecidability of the continuum hypothesis, which resists large cardinal axioms, and thus also of endlessly many further independent statements belonging to the same and higher classes. This comes in contrast with the fact that truth in lower classes of statements was found to be ultimately fixed in some sense.

Another question is whether this succession is supposed to be recursive. Indeed the point of the strength hierarchy as explained above was that each specific theory formalizes a specific recursive provability predicate, and we need a non-recursive succession of them to approach a non-recursive truth predicate. This difference may be dismissed as futile, as it only distinguishes between possible limits of infinite progressions of successively stronger theories to be considered, which is hardly anyone's concern.
A further question is whether the potential stability thesis is still applicable to some hypothetical, divinely revealed non-recursive sequence of stronger and stronger tools escaping any sound and recursive provability predicate, such as the full data of the truth predicate on a certain class of formulas. And actually, we will have some reasons to think it is.

The completeness theorem

Yet, the above also raises the general question whether sense can be made of a theory made of an infinity of axioms, of which we could only ensure the meaningfulness of finite sub-lists at a time. The answer is yes, but with a cop-out we shall explain later.

Actually, the same strategy to generate interpretations works to solve two problems at a time : the completeness theorem (any consistent theory admits an interpretation) and the compactness theorem (if every finite sub-list of the axioms of a theory admits an interpretation, then so does the whole theory).
The latter is somehow a particular case of the former: any contradiction in a theory can only use a finite list of its axioms, so that such a finite list is already inconsistent; therefore the interpretability of each finite list of axioms implies the consistency of the theory.

The construction of the completeness theorem follows the same rules of conversation we described for the Löwenheim–Skolem theorem, with only the following difference : the god is not witnessing a system he knew in advance, but only playfully pretending to do so. His answers do not care to conform to any witnessed truth, but only to preserve logical consistency. Considering the conversation as a two-player game, your questions would amount to a thorough search for contradictions out of the given formulas, and the end result after an infinity of steps is a matter of the god having followed a winning strategy for his goal of never being caught in contradiction.
The case of the ultraproduct follows the same description, replacing "logical consistency" by "compatibility with an infinity of the given interpretations"

Beyond some special cases of non-foundational theories, very weak foundational theories, or low-class formulas to start with, the god still needs a divine knowledge to succeed.
Indeed, a consistent theory will stay consistent either with a formula F, or with ¬F, or both, but there is no general computable method to find out which case we are in (if no contradiction from either F or its negation could be found after some given limited time of search, no matter how long, then the search needs to be given up inconclusively, as the risk of not having searched long enough to find one that exists can never be ruled out). Thus, there is no general computable method either to choose adding either F or ¬F while being sure of not introducing an inconsistency; that is, no computable method can escape the risk of making a wrong choice that would end up in contradiction after a possibly very long but inescapable time.
The existence of a god's winning strategy for any consistent theory is equivalent to the Weak König's Lemma.

The fundamental conclusion of this scenario is that, even though the god may not have been aware of any existing interpretation of the given theory at the beginning of the conversation, his strategic ability to preserve consistency leads him to end up building one, thus giving reality to what started as a lie or fancy.

Three ontological lessons may be drawn from this (and some aspects of these lessons come from the Lowenheim-Skolem theorem alone).

The local lesson is that the existence of a certain mathematical system, is inseparable from (so to speak: locally coincidental with) the activity of formally describing such a system, either way:
The global lesson is a fundamental connection between truth, ontology and provability : whatever is consistent, is true in some existing reality (and thus, for any unprovable formula, there exists a counter-example universe, where it is false).

The lesson somewhere in between is, in reply to those philosophers who might have fancied a matter of dispute there, the lack of genuine ontological difference behind the diversity of foundational theories for mathematics which do not recognize the same kinds of primitive objects, provided that they are mutually interpretable.

Namely, some foundational theories are set theories where everything is a set; other set theories also include urelements; it may be convenient to also admit functions as primitive objects; other theories admit natural numbers as primitive, in which case sets may either be assumed to contain natural numbers only (forming a version of second-order arithmetic) or allowed to contain other sets or something else.
But, as long as objects, formulas and proofs can be systematically translated from one of these theories to another, they are essentially synonymous from an ontological viewpoint: they are just different perspectives on the same mathematical reality, and coexist just "next to each other".

Now a special point to add to this picture is that consistency or provability statements may themselves be undecidable... which means that these, and thus also their equivalent ontological statements, may hold one way in some universes, and the other way in others.

Non-standard models of arithmetic

As earlier explained, any consistent theory leaves some true Π01 formulas unproven. By completeness, such a theory admits an interpretation where the negation of such a formula, a Σ01 formula (∃x∈ℕ, F(x)) where F is bounded arithmetical, is seen as true despite being actually false.
So, if you wonder whether (∃x∈ℕ, F(x)) holds, and try to investigate this by logical deductions from your favorite theory which actually leaves it undecidable, then no matter how long you search, you will not find the answer; but you still never know whether it is indeed undecidable, or you just did not search long enough.
The only other chance to get out, in case that first search was indeed vain, is to call a god for help.
If you find a god telling you that (∃x∈ℕ, F(x)) is false, then you can indeed trust him on that, because you know he could foresee and carefully escape the risk of being later caught in contradiction by the discovery of an x fitting F(x).
But if you only found gods claiming it true, then you still never know. You perhaps just did not stumble on the right gods.

Let us expand on the behavior of a god G claiming to see an x∈ℕ such that F(x), when no such x actually exists. For any number n which a limited being Y might explicitly write down, G will keep telling that x≠n, and more precisely n<x:
Y: How much bigger is your x than, say, a googolplex (10^(10^100)) ?
G: Haha. My x is still very much bigger than that.
Y: What about a power tower of a million tens ? Graham's number ?
...and so on, taking on the challenge of offering ever more elaborate definitions of very large numbers in the role of n, but the reaction will be the same. Y may keep trying further, in the form of more and more complex definitions for larger and larger numbers with still reasonable sizes of definition, but for doing so, would need stronger and stronger theories to remain able to prove their meaningfulness in reasonable time (i.e. to find proofs whose sizes do not grow like the meant numbers themselves; the strength involved in doing so is rather Π02 strength, not Π01 strength).
Since none of this suffices:
Y : Please answer this challenge yourself then : give me a description of an upper bound for x !
G: I could, but you may run out of the necessary patience to follow, as it takes so long to explain.
Y : Are you sure ? Please find any way possible to make it.
G: All right, here you are. The square of the smallest z such that F(z) is indeed an upper bound for x.
Y: Uh ? But I do not know that such a number exists !
G: You don't ? I do !

Now a god who both understands G's view and knows its falsity explains it as follows. G interprets the set ℕ as a set ℕG containing not only all the true natural numbers, called the standard numbers (elements of the true meaning of ℕ), but also other elements, called non-standard numbers, seen as greater than all standard numbers. The set of all standard numbers, as a subset of ℕG, violates the axiom of induction in ℕG. The mistake of G comes from his failure to fully test the induction axiom in his ℕG, due to his inability to discover all its existing subsets. The recognition of this error is a matter of knowing (or being revealed) some subsets A of ℕG, yet unknown to G, which are counter-examples to the induction axiom (and more precisely, some of them also being counter-examples to (∃x∈A, F(x)), to come back to our point of dispute).
Then the intersection of all such subsets is also one, forming the true ℕ. But, for lack of an ultimate interpretation of the range of all subsets of the infinite set ℕG, the means are lacking to so define the true ℕ. We may only refer to more or less strongly stable interpretations of this power set, whose respective resulting intersections are more or less strong (realistic) interpretations of ℕ.

Arithmetic and ontology

All this needs to start from some existing interpretation ℕG of arithmetic. This departure point has to be assumed out of nowhere, as a kind of fundamental relativity (subjectivity) of ontology. There is nothing to metaphysically favor a particular interpretation of arithmetic as the first existing one (the only special one is the standard one, which is included in all others, but it is also the strongest of all). We may rather think of ontology as initially undecided with respect to the range of possible versions (interpretations) of mathematical reality, before being progressively refined (focused) along time.

The completeness theorem showed a path from the consistency statement of a given theory (which is an arithmetical statement) to the existence of interpretations. In the light of the above (the relativity of truth about consistency), let us now comment on the inverse path, from ontology to arithmetical truth.

We may distinguish two links that make up this path: one between a model of a theory and a model of arithmetic implicitly needed to contain its formalism; the other between models of arithmetic and arithmetical truths.

For the first link, we need to distinguish, for any system M described by a theory T, between M in itself and the process of verifying that M is indeed a model of T. Indeed M is usually rather simple in its content, in the sense that it is just made of set(s) and structures that interpret the symbols of the theory. But to interpret T there and check the validity of the match, an infinite process may be needed. The first issue arises when T supposedly has an infinity of axioms : a model of arithmetic is needed to interpret the intended content of the axioms list. Then a model MA of arithmetic is anyway needed to interpret the range of all possible expressions of T, beyond axioms. This choice of MA is constrained (tested) by the need to interpret all expressions of T in M. Non-standard expressions (i.e. non-standard numbers, where numbers play the role of codes of expressions) are not generally ensured to be uniquely interpretable in any system; they only appear to be given a natural unique interpretation in M by a god (a reality system) who knows M while using MA as his interpretation of arithmetic, mistaking it for standard.

But, when an M from outside his reality comes to break into it (revealed by a greater god), it may happen that the connection fails, as the recursion process involved in interpreting in M successive "expressions" of larger sizes or higher classes, which might be non-standard, suddenly fails, revealing the non-standardness of MA.

In case the interpretation process still goes through along the whole MA, two other kinds of mismatch may occur:
So, successive revelations of existence of mathematical systems M, are linked to the revelations of subsets of ℕ which may uncover the non-standardness of previously acceptable-looking interpretations of ℕ; this narrowing of the range of compatible models of arithmetic is linked to the revelations of truth of previously undecided Π01 formulas, especially consistency statements for theories describing M.

The well-order of expressible ontology

We are now ready to introduce the core concept of ordinal analysis : the fact that the main thread of foundational theories for mathematics is well-ordered, thus identifiable with an ordinal, namely the first non-recursive ordinal, called the Church–Kleene ordinal and denoted ω1CK; each particular foundational theory in the main thread gets labelled by a recursive ordinal.
This is actually not very accurate, as there are a few cop-outs to this general fact. Let us gather a few hints towards it.

It can be shown that if a binary relation is altogether extensional, transitive and well-founded, then it is also total, and thus a strict well-order.
If a binary relation is just transitive and well-founded but not extensional, then it still has a quite natural morphism (quotient) onto a strict well-order, which is no longer injective.
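For a finite toy example of this quotient, the morphism sends each element to its rank, the least ordinal above the images of its predecessors; a minimal sketch (Python):

    # Sketch: the natural non-injective morphism from a well-founded relation
    # into ordinals (finite ranks here), computed by well-founded recursion.
    def rank(rel, x, memo=None):
        # rel maps each element to the set of its predecessors
        if memo is None:
            memo = {}
        if x not in memo:
            memo[x] = max((rank(rel, y, memo) + 1 for y in rel[x]), default=0)
        return memo[x]

    rel = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}   # toy relation
    print({x: rank(rel, x) for x in rel})    # {'a': 0, 'b': 1, 'c': 1, 'd': 2}

Non-extensionality shows in "b" and "c" (distinct but with the same predecessors) receiving the same rank.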

As we shall explain, the main time line of mathematical ontology is both transitive and well-founded, which leads to label each foundational theory by a unique ordinal, but the same ordinal may qualify multiple theories; depending on precise definitions (especially when restricting consideration to the class Π01 of formulas), one may distinguish such a thread which is also extensional, thus keeping just one theory for each ordinal.

Due to the recursive nature of the provability powers of theories, used to compare their strengths, the strength of theories (in the main thread of natural foundational theories) is more precisely measured by recursive ordinals. This measure is the topic of ordinal analysis. An overview of some core issues of ordinal analysis is given for example in [On the hierarchy of natural theories, James Walsh]. Some more details are developed here.

The time order of mathematical ontology is the order of relative existence: a world M precedes a world M' if M exists inside M', in the sense that it is possible to designate and fully interpret M inside M'; this relation is made irreflexive by the truth undefinability theorem.

So conceived, its transitivity is obvious: if a system with given properties appears to exist in some existing world then it simply exists with those properties (all axioms with standard size which describe it, got correctly interpreted and satisfied in that world).
Switching from semantics to syntax via the completeness theorem translates this time order (relative existence between worlds) into a relation of provable consistency : a theory T precedes a theory T' if T' can prove the consistency of T (so that every model of T' recognizes the existence of some model of T). Its transitivity takes the form of the fact that all Σ01 truths are provable, to be used contrapositively on consistency statements, which are Π01 : if a consistency statement is consistent then it is true.

The well-foundedness is a bit less clear, if only because it is a second-order property, whose formalizations cannot exclude non-standard interpretations, which leaves possibilities of some kinds of exceptions to that rule.
There are multiple versions of this well-foundedness, pertaining to either syntax or semantics.
A first version, which occurs somehow in between syntax and semantics, is expressed by the famous Second Incompleteness Theorem, as follows.

Take any theory T expressing arithmetic and presented by its (Σ01) provability predicate.
Using the above transitivity, the self-reflecting Π01 formula saying "this formula is unprovable from T", is found provably equivalent to the consistency of T. Thus, the consistency of T is provably equivalent to its own unprovability from T.
By the syntax/semantics equivalence, this can be re-expressed as : "If there are any worlds fitting T then among these there is one not itself containing any such world". This sentence is the very formula of well-foundedness, except only that its second-order quantifier is restricted, from the power set of the range of all worlds, to the only classes of worlds defined each as the class of those worlds fitting some given expressible theory.

Notes:
As for extensionality, it may have diverse exceptions depending on the precise ranges of considered theories and the comparison criteria.

Π01 ordinals of theories

Let us introduce an expression of ordinal analysis, which may be the clearest introductory approach to the strength hierarchy of theories, keeping track of the link with ontology via the completeness theorem.
We shall start with the concept of Π01-ordinal of a theory, which only depends on its provability predicate over Π01 formulas. It is the most fine-grained of all ordinal measures for the strength of theories, and thus the best for distinguishing the weakest ones.

It is the first in a sequence of ordinal strength measures which are more and more coarse-grained, though they measure different data: for each n>0 the Π0n-ordinal qualifies provability predicates over Π0n formulas.
As these only qualify provability predicates of theories over arithmetical formulas, we can proceed with their review in terms of the mere language of first-order arithmetic, and simplify the notation from Π0n to Πn.

This sequence of Πn-ordinals is most relevant to measure the strength of theories in between EA and the theory PA of "Peano Arithmetic", that is first-order arithmetic axiomatized by the scheme of induction (IΠn for every n). Indeed:
These ordinal measures have been studied in diverse works such as
[Proof-theoretic analysis by iterated reflection, Lev D. Beklemishev]
[Reflection calculus and conservativity spectra, Lev D. Beklemishev]
which are unfortunately not easy to follow. Below is an attempt to popularize some key ideas from these works.

Except where otherwise stated, all considered theories T will be recursive, i.e. given by a Σ1-definable set of axioms. The above reference more precisely requests definability by a bounded formula of EA; I wonder where this may matter. The general case of a recursive set of axioms is effectively needed to describe the arithmetical consequences of any well-described set theory, while ignoring the precise nature of the non-arithmetical axioms from which they are deduced. Anyway, recursive theories lead to recursively enumerable sets of theorems (provability predicates): for any formula F, the claim "T proves F" has class Σ1, as an abbreviation of a formula essentially saying "∃p, p encodes a proof of F from T".

A reflection principle for T is an expression of soundness of T (a claim that all T-provable formulas are true) over a class Γ of formulas (that we shall take to be either Πn or Σn for some n). There are two versions of this, which may differ:

The local Γ-reflection principle, denoted RfnΓ(T) takes the form of a schema of one axiom per formula F in Γ :
(T proves F) ⇒ F.

The uniform Γ-reflection principle, denoted RFNΓ(T), is expressible as a single axiom, using the truth predicate over Γ:
∀F∈Γ, (T proves F) ⇒ (F is true)
or equivalently, as the schema of one axiom per formula F of class Γ with one numerical variable x:
∀x, (T proves F(x)) ⇒ F(x).

Note that ∀n>0, RFNΠn(T) ∈ Πn.

For any theory T0 extending EA, and any theory T extending T0 by a set of Π1-axioms, the following theories are equivalent (where Σ0 is the class of bounded formulas):
  1. T + "T is consistent"
  2. T0 + "T is consistent"
  3. T0 + RFNΠ1(T)
  4. T0 + RfnΣ0(T)
The implications 1.⇒2. and 3.⇒4. are trivial.
2.⇒1. the truth of any Π1-axiom of T is deducible from its consistency, by the fact its falsity would make T inconsistent with the data of a counter-example.
2.⇒3. by formalizing the argument "A proof from T of a false Π1 formula would lead to an inconsistency of T in the face of a counter-example".
4.⇒2. is obvious (with F="false").

Since T cannot prove its own consistency, the theory T' given by the above presentations has a wider proving power, and is thus "the next theory" after T, expanding it by just another (or a more powerful) Π1 axiom. This step from a theory to the next stronger one can be iterated transfinitely along any ordinal which is just definable enough for this to proceed, namely keeping consistency statements expressible by Π1 formulas; these are the recursive ordinals (if I am not mistaken). This transfinite sequence of theories, starting from a theory T0, is said to be the result of iterated reflection from T0 along the given order.
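The finite stages of this iteration are easy to display symbolically; a minimal sketch (Python), where theories are mere lists of axiom labels and Con(...) is a symbolic stand-in for the consistency statement:

    # Sketch: iterated reflection along finitely many steps, each step
    # extending the previous theory by its own consistency statement.
    def iterate_reflection(T0, steps):
        T = list(T0)
        stages = [list(T)]
        for _ in range(steps):
            T = T + ["Con(" + " + ".join(T) + ")"]   # T' = T + "T is consistent"
            stages.append(list(T))
        return stages

    for stage in iterate_reflection(["EA"], 3):
        print(" + ".join(stage))
    # EA
    # EA + Con(EA)
    # EA + Con(EA) + Con(EA + Con(EA))
    # ...

The subtlety discussed next only appears at limit stages, where a whole infinite list of axioms must be re-expressed by a single finite definition.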

A funny detail is that, in the case of elementary ordinals (given by an elementary well-order, i.e. one defined by a bounded formula in the language of EA), this recursive sequence of theories also admits an elementary expression. In a sense this is trivial, at least for non-limit cases, since these are expressible in the form 2. or 3., made of T0 with a single additional axiom. But it has been shown to hold in all cases, with steps presented in the form 1., as there is a subtle way to express a solution to that recursive condition by an elementary formula, by means of self-quotation (as mentioned with truth undefinability); then the essential uniqueness of this solution is deduced, not from the well-foundedness of the iteration order, but from a corollary of the second incompleteness theorem. Indeed, well-foundedness cannot be used because it is not generally expressible in EA anyway; and this is not a usual case of transfinite recursion where limits are given extensionally by some infinite data, but a special case where these limits need to be translated into a finitely expressed definition, to allow for a formulation of their consistency.

A consequence of this funny detail is that such recursions can proceed, with their existence and essential uniqueness, along non-well-founded orders as well. But ill-founded iterated reflection is usually quite boring (with possible exceptions, such as when that ill-foundedness is somehow unprovable ?), as it results in the (essentially unique) inconsistent theory.

In [Proof-theoretic analysis by iterated reflection, Lev D. Beklemishev], the Π1-ordinal |T|Π1 of a theory T is defined as the ordinal α such that the α-th iterated reflection from T0 = EA gives the same Π1 theorems as T. So defined, |EA|Π1 = 0 and |EA + Con(EA)|Π1 = 1. However, this convention suffers some arbitrariness from its assumption of the choice T0 = EA; different choices for T0 result in different transfinite sequences of provability predicates over Π1, and thus possibly in different measures for the Π1-ordinal of a given theory.
The study of this dependence, obtained in his works, suggests what a more objective Π1-ordinal labelling of theories would look like. But to explain it, we need to also introduce the other Πn-ordinals and explain how they are related.

Πn ordinals

The above mentioned transfinite sequences of theories by iterated reflection also work with other classes of formulas.
Namely, for the same reason as above, for any n and any elementarily presented theory T containing EA, there is equivalence in EA between RFNΣn(T), RFNΠn+1(T), and the consistency of T extended by the whole set of Πn truths (as interpreted by a Πn truth formula).

These equivalent formulas are of class Πn+1; we shall write them in modal notation as ◇n(T).
They form the basic step whose transfinite iteration from an initial theory T defines a transfinite sequence of theories differing by their Πn+1 axioms, and thus to be labelled by different values of a certain ordinal measure of their Πn+1 strength.
Then the study of these sequences of theories and their dependence on T suggests the following kind of conventions as suitable to express their properties in simple form.

We shall attribute Πn+1 ordinal strengths not only to theories, but also to individual Πn+1 formulas (the two definitions should not be confused); and not to all theories, resp. all Πn+1 formulas, but only to the special case of those (theories or formulas) which are regular in some sense, i.e. which belong to the main thread of successive strengths (though there are different degrees of strictness by which this may be conceived). Let us denote Sn ⊂ Πn+1 the set of those Πn+1 formulas which have a definite ordinal strength.

Let us define notations as follows. For any regular theory T, any number n and any recursive ordinal α,

n(T) ∈ Sn ⊂ Πn+1
Tnα := T+{◇nβ(T) | β<α} (regular theory obtained by adding to T the schema of axioms: ◇nβ(T) for each β<α)
nα(T) :⇔ ◇n(Tnα) ∈ Sn
thus
Tn0 = T,
Tnα+1 ⇔ T+◇nα(T), and
nα(T) ⇔ (◇n(T) ∧ ∀β<α, ◇nβ+1(T)).
In particular, ◇n0(T) ⇔ ◇n(T).
When α is a limit ordinal, Tnα contains ◇nβ(T) for each β<α but does not prove ◇nα(T) that is (∀β<α, ◇nβ(T)). This means it admits interpretations where ◇nβ(T) fails for some non-standard value of β<α greater than all standard ones.

|T|n = ||◇n(T)||n is the Πn+1-ordinal strength of T

{||F||n | F∈Sn ∧ T proves F}⊂ |T|n (in the sense of von Neumann ordinals) with equality if n>0.
As further commented below, the question whether S0 and ||F||0 can be defined to provide non-limit values able to fill the set |T|0, is out of scope of this exposition.
These ordinal measures have the following two properties
Together they imply : |T|n+1≠0 ⇒ µ≠0 ⇒ |T|n ≥ ω|T|n+1.
Reflection steps affect strength measures in the minimal way allowed by this rule, namely
∀F ∈ Sn, |T+F|n = max(|T|n, ||F||n|T|n+1)
as ||F||n|T|n+1 is the next multiple of ω|T|n+1 after ||F||n.
Also, |Tn+11|n = |T|n|Tn+11|n+1.
In particular |Tnα+1|n = |Tnα|n + ω|T|n+1, which by induction gives

|Tnα|n = |T|n + ω|T|n+1.α

This is related to Schmerl’s formula, which in our notations is written:
If |T|n+2=0 then for all ordinals α,
nωα(T) ⇔ ◇n((Tn1)n+1α)

which simplifies if α>0 as

nωα(T) ⇔ ◇n(Tn+1α)

These facts result from the following basic cause of dependency between reflection formulas in successive classes:

Remember that the strongest Πn+1 axiom of Tnα+1 is ◇nα(T).
By the formulation of ◇n+1 in terms of consistency, ◇n+1 is insensitive (in terms of truth) to additional true Πn+1 axioms, so that for all Πn+1 formulas F, we have the provable equivalence

(◇n+1(T)∧F) ⇔ ◇n+1(T+F)

In particular,

(◇n+1(T)∧◇nα(T)) ⇔ ◇n+1(T+◇nα(T)) ⇒ ◇nα+1(T).

Re-applying this deduction with α replaced by α+1 and so on, leads the theory Tnα+1+◇n+1(T) to successively prove ◇nα+k(T) for all (standard) finite values of k, i.e. to imply (each axiom of) the theory Tnα+ω.

From this basic fact (and the fact that it is the only cause of dependency between values, which we shall simply admit here), the formula (copied here from above)

|Tnα|n = |T|n + ω|T|n+1.α

(and the naturalness of conventions under which this holds) will be deduced by transfinite induction on the value β of |T|n+1 as follows.

While a presence of Πn+2 axioms, giving a nonzero β, can cause gaps in the progression of |Tnα|n, the lack of these, expressed by β=0, gives |T+◇n(T)|n = |T|n + 1, and thus by induction the case β=0 of the above formula, namely |Tnα|n = |T|n + α.

To carry out the induction on β, we need to let β take any value (not only multiples of something), by the following consideration.
The link between |T|n and |T|n+1 remains unchanged when replacing T by a theory with the same values of |T|n and |T|n+1 but the possibly different value |T|n+2 = 0, obtained by taking the set of all Πn+2 theorems of T in guise of a set of axioms (this other theory still contains EA if T did, as further commented below). This way, denoting T' = T+◇n+1(T), we get |T'|n+1 = β+1.

By the above, ◇n(T'+◇nα(T)) implies ◇n(Tnα+ω); while (we admit here) the converse holds when |T|n+2=0. Thus (provably),

n(T'+◇nα(T)) ⇔ ◇nα+ω(T).

From there, by induction on α where the case α=0 is due to (◇n+1(T) ⇒ ◇n(T)),

 ◇nα(T') ⇔ ◇nω.(1+α)(T)

Hence |T'nα|n = |Tnω.(1+α)|n = |T|n + ωβ.ω.(1+α) = |T'|n + ωβ+1.α,
which completes the idea of the proof by induction on β.

These formulas leave the values of the strength measures |T|0 and |T|1 under-determined, fixed only by arbitrary conventional choices of values given to the weakest considered theory. Taking EA as the weakest theory (needed for things to work without burden), it naturally gets |EA|n = 0 for every n>1, and |EA|0 = ω|EA|1, but |EA|1 rather deserves a finite but nonzero conventional value (a choice which affects the convention as |EA1α|1 = |EA|1 + α, and thus only affects measures of theories T with finite values of |T|1). The question of the most natural value to assign to |EA|1 may be a matter of dispute or arbitrary convention; the choice appearing in the Wikipedia article "Ordinal analysis" is |EA|1 = 3.

This may look strange given that the most convenient formalization of EA has Π1 axioms only. The trick is that Π2 axioms are equivalently expressible in the form of Π1 axioms over additional symbols of operations or functions, and the strength of EA comes from the strongest operation symbol in its language, which is exponentiation. There is a complicated possibility to re-express it in the mere theory I∆0 (bounded arithmetic with addition and multiplication), but no less, by a bounded formula Exp(x,y,z) which is true precisely when z = x^y, so that the existence of exponentiation is expressed by the Π2 axiom (∀x,∀y,∃z, Exp(x,y,z)).

Then the formula ◇1(EA) is expressible as the existence of the tetration operation (or of some other function or operation with equivalent speed, so to speak, such as the superexponential (x,y) ↦ 2^(2^(...^(2^y))) with a tower of x twos), where tetration is the tower of exponents ^xa defined by ^0a = 1 and ∀x, ^(x+1)a = a^(^xa).
Indeed, RFNΠ2(EA) gives:
∀x, (EA proves "∀a, ^xa exists") ⇒ (∀a, ^xa exists)
while EA indeed proves "∀a, ^xa exists" for each x, by successively using x times the existence of an exponent.
The fast-growing hierarchy is an ordinal-indexed sequence of functions, whose respective totality formulas (counted from that of the tetration) match the formulas ◇1α(EA).
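A minimal sketch (Python) of tetration and of the finite levels of the fast-growing hierarchy (only safe for very small arguments, of course):

    # Sketch: tetration ^x a, and the fast-growing hierarchy f_k where
    # f_0(x) = x+1 and f_{k+1}(x) iterates f_k x times starting from x.
    def tetration(a, x):
        return 1 if x == 0 else a ** tetration(a, x - 1)

    def fgh(k, x):
        if k == 0:
            return x + 1
        r = x
        for _ in range(x):               # iterate f_{k-1} x times
            r = fgh(k - 1, r)
        return r

    print(tetration(2, 4))               # 65536
    print(fgh(2, 4))                     # 64, as f_1(x) = 2x and f_2(x) = x·2**x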

A more ordinary way to prove the existence of tetration is to deduce it from the existence of exponentiation, using an induction axiom. Precisely, it involves Σ1-induction (the use of induction on Σ1 formulas). But this axiom is actually equivalent to ◇2(EA).
Generally, for every n>0, IΣn ⇔ IΠn ⇔ ◇n+1(EA).

Thus, the Σn-induction axiom IΣn is of class Πn+2, irreducible to any formula of lower class. One may be tempted to see it otherwise, by writing it as : for any Σn-formula F,
∀(parameters), ∀n, (F(0)∧¬F(n)) ⇒ ∃k<n, (F(k)∧¬F(k+1))
but, trying to reduce it via a collection axiom, one way would use BΠn, which is equivalent to BΣn+1 and stronger than IΣn, while the other reduction attempt, using the weaker Σn-collection (implied by IΣn), actually gives another formula of class Πn+2, namely

∀n, (∀x, ¬F(0,x)) ∨ ∃x, (F(n,x) ∨ ∀b, ∃k<n, ∀y<b, (F(k,x)∧ ¬F(k+1, y)))
for every F of class Πn−1.

Around cut elimination

For most purposes, the concept of provability in first-order logic can be considered unique and unambiguous. Yet, up to EA-provable equivalence, it has two versions, which are only provably equivalent in the theory EA+ defined as EA11, i.e. EA+◇1(EA) (totality of the tetration operation).
The difference is on whether to accept, among the deduction rules, a kind of deduction called a free-cut, that is, a certain kind of use of the cut rule. While the cut elimination theorem says that the use of the cut rule can be eliminated from proofs of tautologies (theorems independent of any axioms) at the cost of an increase in size, a variant of it, called free-cut elimination, says that free-cuts can be eliminated from proofs of theorems using any given axioms. Actually, the use (vs. non-use) of free-cuts to prove a formula F assuming a finite list (or conjunction) X of axioms, is inter-translatable with the use of cuts to prove from scratch that X implies F.

To say it in short, the job of a free-cut is to make use of an intermediate formula not inherited from the given axioms, and thus possibly outside the class they belong to. Indeed one can notice that the sketch of a provability concept underlying the above account of the completeness theorem, thus sufficing to prove anything provable, does not use any formula of a higher class than those of the given axioms.

Namely, a proof from axioms at most Πn, using free-cuts with formulas at most Πn+1, can be reduced to a proof with formulas at most Πn, but its size may grow exponentially through this reduction process. As this exponential growth of size occurs k times in the cut elimination process reducing the highest class of used intermediate formulas from Πn+k to Πn, it works in EA for each explicit value of k, and down to a free-cut free proof (assuming n to take an explicitly bounded value as well), but ◇1(EA) is required to justify its extension to the general case of free-cut elimination on proofs without explicit bound for k.

The choice of definition for provability between both options (with or without free-cuts) may affect the definition of ◇nα. Precisely, ◇nα(EA) is unambiguous (both versions are EA-provably equivalent) whenever n>0 or α is a limit ordinal, but is affected by the choice in the remaining cases, as follows: the consistency with free-cuts of any T extending EA by finitely many Π1 axioms is EA-provably equivalent to the free-cut free version of ◇0(T01); i.e. when |T|1 = |EA|1, one ◇0 reflection step with free-cuts amounts to two steps without [A. Visser, Interpretability logic, in P.P. Petkov (ed.), Mathematical Logic, Plenum Press, New York, 1990, pp. 175–208].

This may be intuitively understood in the following rough terms:

This rough view is also manifested by the following combination of facts [S. R. Buss, Handbook of proof theory, chap. II, 4.3] [Wilkie and Paris 1987]:

The completeness theorem for recursive theories (or only for elementarily definable ones?), with the free-cut free version of provability, is found to work in the theory WKL0*, a version of second-order arithmetic accepting Weak König's Lemma as an axiom. The arithmetical consequences of WKL0* form precisely the theory EA+BΣ1: any countable model of EA+BΣ1 can be completed by a collection P of subsets to form a model of WKL0*, though the construction of such a P requires an external perspective.
[Unifying the model theory of first-order and second-order arithmetic via WKL0*, Ali Enayat, Tin Lok Wong]
[Interpreting Weak König's Lemma using the Arithmetized Completeness Theorem, Tin Lok Wong]
Of course WKL0* is a very poor foundation for mathematics, as many basic theorems of algebra are equivalent over it to Σ01 induction (IΣ1), i.e. they require the theory WKL0.
[Stephen G. Simpson and Rick L. Smith, Factorization of polynomials and Σ10 induction]

The axiom BΣ1 (unlike the stronger IΣ1 in the same class Π3) brings no new Π2 consequences, and in particular cannot help to prove ◇1(EA).
[A conservation result concerning bounded theories and the collection axiom, S.R. Buss]

As far as the completeness theorem is concerned, the small difference made by the choice of version of provability is the following. A theory T may be consistent without free-cuts but become inconsistent with them. This means it really is consistent, so that the existence of a model M of it is unproblematic, except only that, in a model of EA+BΣ1 not satisfying ◇1(EA), there may be a nonstandard number n such that some Πn formula cannot be interpreted in M.

This is also unsurprising, since only PA (much stronger than EA+BΣ1) can ensure, for every explicit value of n, the interpretability of all Πn formulas in any given countable system (the point being the ability to rule out any inconsistency involving such formulas as conflicting with an accepted induction axiom), and still higher strength is needed to extend this to unspecified n, as we shall hint later.

Transfinite induction schemes

The equivalence between induction and reflection, can be generalized as an equivalence between transfinite induction and iterated reflection as follows (extrapolating the information from [Transfinite induction within Peano arithmetic, Richard Sommer]).

Denote by TI(β, Γ) the schema of transfinite induction along a given ordinal β for formulas of a given class Γ (which we shall take to be Σn or Πn). It generalizes ordinary induction, as IΓ ⇔ TI(ω, Γ), and is also reducible to a single formula by means of a truth formula.

For all n∈ℕ and all ordinals α, β such that ω^ω^α ≤ β < ω^ω^(α+1) (more precisely, such that there is a standard k∈ℕ with β < ω^(ω^α·k)),

◇n+2α(EA) ⇔ TI(β, Πn+1) ⇔ TI(ω·β, Σn)

As ω·ω^ω^α = ω^(1+ω^α), while 1+ω^α = (2 if α=0, ω^α otherwise), TI(ω·β, Σn) can be simplified as TI(β, Σn), except that when α=0 the condition for TI(β, Σn) to keep its equivalence with the other above formulas becomes ω^2 ≤ β < ω^ω.
On the other hand,

ω ≤ β < ω^2 ⇒ (TI(β, Σn) ⇔ IΠn)

Also remember that the theory (EA+◇n+2α(EA)) = EAn+2α+1 satisfies
|EAn+2α+1|n+2 = α+1
|EAn+2α+1|n+1 = ω^(α+1)
|EAn+2α+1|n = ω^ω^(α+1)
thus can prove ◇nβ(EA) (that is, β-iterated Πn+1 or Σn reflection) for precisely all β < ω^ω^(α+1). These are the same β for which it can prove TI(β, Πn+1), and the same for TI(β, Σn). But this is somewhat coincidental, for lack of a direct implication either way (◇nβ(EA) is only deducible from TI(β, Πn+1) when β≥ω, and from TI(β, Σn) when β≥ω^2).

Yet underlying ideas are that

Hyperations and the Veblen hierarchy

Let us introduce concepts from [Hyperations, Veblen progressions and transfinite iteration of ordinal functions, David Fernández-Duque, Joost J. Joosten, 2013], following a progressive path of introduction I did not see elsewhere.

Consider the operation ω^β·α between ordinals α and β, as a β-indexed family of functions of the argument α. Let us also consider a slight variant (ϕ^β), defined from it by

1+ϕ^β(α) = ω^β·(1+α)

so that ϕ^1 enumerates the class of all limit ordinals.
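For readers who wish to experiment, here is a minimal Python sketch of Cantor normal form arithmetic below ε0 (my own representation choices and function names, not from the cited paper), sufficient to compute this family; iterating ϕ^1 from 0 displays ω, ω^2, ω^3, ..., climbing towards its least fixed point ω^ω, a preview of the Veblen constructions discussed below.

ZERO = ()              # the ordinal 0
ONE = ((ZERO, 1),)     # the ordinal 1 = ω^0 · 1

def cmp(a, b):
    # Compare two CNF ordinals, given as tuples of (exponent, coefficient)
    # pairs in strictly decreasing order of exponents: returns -1, 0 or 1.
    for (e1, c1), (e2, c2) in zip(a, b):
        c = cmp(e1, e2)
        if c:
            return c
        if c1 != c2:
            return 1 if c1 > c2 else -1
    return (len(a) > len(b)) - (len(a) < len(b))

def add(a, b):
    # Ordinal addition: the leading term of b absorbs all smaller terms of a.
    if not b:
        return a
    e = b[0][0]
    keep = tuple(t for t in a if cmp(t[0], e) > 0)
    same = [c for (f, c) in a if cmp(f, e) == 0]
    if same:
        return keep + ((e, same[0] + b[0][1]),) + b[1:]
    return keep + b

def omega_times(b, d):
    # omega^b · d, by left distributivity:
    # omega^b · (omega^e · c + r) = omega^(b+e) · c + omega^b · r.
    if not d:
        return ZERO
    (e, c) = d[0]
    return ((add(b, e), c),) + omega_times(b, d[1:])

def phi(b, a):
    # The family phi^b(a) of the text: 1 + phi^b(a) = omega^b · (1+a),
    # which for b > 0 simply equals omega^b · (1+a).
    return a if not b else omega_times(b, add(ONE, a))

def show(a):
    if not a:
        return "0"
    parts = []
    for (e, c) in a:
        base = "1" if not e else ("ω" if e == ONE else "ω^(%s)" % show(e))
        parts.append(base if c == 1 else (str(c) if not e else base + "·%d" % c))
    return " + ".join(parts)

x = ZERO
for _ in range(3):   # phi^1 iterated from 0: ω, ω^2, ω^3, ... -> ω^ω
    x = phi(ONE, x)
    print(show(x))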

Both families have the following similar properties.

Each family is the hyperation (hyper-iteration) of its member of index 1, ϕ = ϕ^1, being determined by it as follows.

First, it is a weak hyperation, i.e.

But it is not continuous: whenever β is a limit while α is a successor,

ϕ^β(α) > limγ→β ϕ^γ(α)

This should be expected, for lack of any other formula for the progression of ϕ^β(α) for fixed α>0 and variable β≥ω: right additivity (unlike left additivity) does not express ϕ^(β+1)(α) using ϕ^β(α).

Instead of this, the basic definition of ϕ^β(α) for limit β proceeds by first defining ω^β = limγ→β ω^γ, namely

ϕ^β(0) = limγ→β ϕ^γ(0)    (L0)

Then comes its multiplication by 1+α. This can be understood as a recursion which adds ω^β at each step of incrementing α (while the cases of limit α follow by continuity). These steps are themselves expressible as limits:

ϕ^β(α+1) = ϕ^β(α)+ω^β = limγ→β (ϕ^β(α)+ω^γ)

Defining ẟ by β=γ+ẟ, also written ẟ = −γ+β,

ϕ^β(α+1) = limγ→β (ϕ^γ(ϕ^ẟ(α))+ω^γ) = limγ→β ϕ^γ(ϕ^ẟ(α)+1).

Notice that strict increase and right additivity suffice to ensure

ϕ^β(α) < ϕ^γ(ϕ^ẟ(α)+1) ≤ ϕ^γ(ϕ^ẟ(α+1)) = ϕ^β(α+1)

Now, the hyperation (ϕ^β) of ϕ = ϕ^1 is the minimal weak hyperation of ϕ.

Equivalently (as justified below), it is the weak hyperation where ϕ^β is defined, for limit β, as the continuous function fitting (L0) and

ϕ^β(α+1) = limγ→β ϕ^γ(ϕ^(−γ+β)(α)+1)  (L)

Looking back at ϕ^(α+β) = ϕ^α⚬ϕ^β: when β and (equivalently) α+β are limits, the instance of (L) for ϕ^(α+β) is deducible from the one for ϕ^β. Thus, for the sake of defining hyperations by transfinite recursion from ϕ, (L) only adds information when stated for limit values which are additively irreducible, i.e. not expressible as a sum of two strictly lower ordinals. Such ordinals are those of the form ω^β for some β>0.

For these, right additivity gives
∀γ<β, ϕ^(ω^γ)⚬ϕ^(ω^β) = ϕ^(ω^β)
which means all values of ϕ^(ω^β) are fixed points of all the ϕ^(ω^γ) for γ<β (and thus are also fixed by all composites of these functions, which are all the ϕ^ξ for ξ<ω^β written in Cantor normal form).
So, the minimality condition for (ϕ^β) among weak hyperations of ϕ is that each ϕ^(ω^β) is surjective onto the class of such fixed points (it does not skip any of them). Thus the family (ϕ^(ω^β)) coincides with the Veblen progression based on ϕ, written (ϕ_β) and defined by

We can see its equivalence with (L0)∧(L): for ϕ_β = ϕ^(ω^β), (L) simplifies as

ϕ_β(α+1) = limγ→ω^β ϕ^γ(ϕ_β(α)+1)

In particular, the identity ω^(β+1) = limn→ω ω^β·n gives

ϕ_(β+1)(α+1) = limn→ω ϕ^(ω^β·n)(ϕ_(β+1)(α)+1)

where by right additivity,

ϕ^(ω^β·n) = ϕ_β⚬...⚬ϕ_β  (n times)

which means ϕ_(β+1)(α+1) is the next fixed point of ϕ_β after ϕ_(β+1)(α).
Likewise, ϕ_(β+1)(0) = limn→ω ϕ^(ω^β·n)(0).

While we introduced the concepts of hyperation and Veblen progression on cases which admit an explicit expression, they become more useful when no such expression is available. There is one main such use case, once again with two slightly different versions, respectively based on the slightly different functions e and φ:

φ(ξ) = ω^ξ = 1+e(ξ)
e(ξ) = ϕ^ξ(0)
Their respective hyperations are related by
e^α(0) = 0
e^α(1+ξ) = (φ^α(ξ) if α is a limit; φ^α(1+ξ) otherwise).
In particular, φ^α(ξ) = e^α(1+ξ) if α>0.

The epsilon numbers (a sequence of ordinals) enumerate the fixed points of φ:
ε_α = φ_1(α) = φ^ω(α)

More or less regular theories

Among subsystems of first-order arithmetic, we saw a diversity of possible theories, whose classification can be approached by different ordinal strength measures; and even these measures together cannot perceive some differences, such as the option to add the collection axiom BΣn+1 when IΣn is postulated while IΣn+1 isn't.

For many purposes one may prefer to focus on theories whose proving power can be characterized by a minimal amount of data, especially their ordinal strength measures as defined above, giving them clear positions in reasonable pictures of the strength hierarchy. Such theories will be called regular. But diverse candidate criteria of regularity may be considered, and these may disagree (not be totally ordered by provable implication) about relatively less regular theories.

For theories of first-order arithmetic (seen as provability predicates on the class of first-order arithmetical formulas), a first notable candidate is the condition of being equivalent to a theory T axiomatized by iterated reflection, i.e. T = EA+{◇nβ(EA) | n∈ℕ ∧ β<αn} for a given sequence of ordinals (αn)n∈ℕ. Let us call this basic regularity. As previously explained, this range of theories is fully classified by the sequence of ordinal measures |T|n, which is the smallest such sequence of ordinals: |T|n = αn if αn is a multiple of ω^|T|n+1, and otherwise the next such multiple (αn + ω^|T|n+1).

Regularity conditions on T will be expressed by a sequence of ordinals presented in either of the forms (µn) or (νn), as follows:

|T|n = ω^|T|n+1 · µn = ϕ^|T|n+1(νn)
µn = (νn if |T|n+1=0; 1+νn otherwise).

They allow to re-express the growth of strengths by iterated reflection (|Tnα|n = |T|n + ω^|T|n+1·α) as

|Tnα|n = ω^|T|n+1·(µn+α) = ϕ^|T|n+1(νn+α)

A basically regular theory fails to contain PA when for some m, |T|m+1 = 0, equivalently (∀n>m, αn=0). Picking the smallest such m, so that |T|m = µm = νm = αm > 0, T can be expressed as: if m>1,
T = ((...(EAmνm)...)1ν1)0ν0
(or in some other way if m=1).
In particular, for m>1, the sub-theory EA+IΣm−1 = EAm1 of PA has strengths
∀n≤m, |EAm1|n = φ^(m−n)(1)

Yet one may still like to call "regular" a theory obtained from such a theory by adding the next arithmetical collection axiom (BΣm), since the minimal foundation we mentioned for the completeness theorem (WKL0*) has such a theory as its arithmetical content.

A more selective regularity condition for such theories consists in requiring all multiplicity coefficients µn to be finite.
The motivation for this is to consider that any acceptance of an infinitely iterated reflection, giving a new theory Tnω out of an old one T, deserves to come with the acceptance of the formula ◇n+1(T) expressing its justification principle: Tnω ⊂ T+◇n+1(T) = Tn+11.
Similarly for its soundness formula: ◇nω(T) better comes as a consequence of ◇n(Tn+11).
This regularity has the following consequences:
µn<ω ⇒ |T|n < ω^|T|n+1·ω = φ(|T|n+1+1)
∀p>0, (∀k<p, µn+k<ω) ⇒ |T|n < φ^p(|T|n+p+1)

A more standard and still more selective regularity concept is defined relatively to a given degree: a theory T is said to be Πn+1-regular if it has the same Πn+1 theorems as EAnα for a certain ordinal α; equivalently, ◇n(T) ⇔ ◇nα(EA).

Basic regularity, introduced above, implies Π1-regularity.

A possible defect of this definition of Πn+1-regularity for n>0 is its dependence on EA as a choice of initial theory. This defect is avoided by the following alternative formulations:

∀k<n, µk=1
∀k<n, νk=0

Assuming basic regularity, all 3 definitions are equivalent, denoting α=|T|n, in either case:

But, (|T|n=0 ∧ ∀k<n, νk=0) would imply |T|0=0 which is impossible for a theory expressing any arithmetic at all.
The formula µk=1 simplifies |Tkα|k as |T|k·(1+α). Moreover,

(∀k<p, µn+k=1) ⇒ |T|n = φ^p(|T|n+p)
(∀k<p, νn+k=0) ⇒ |T|n = e^p(|T|n+p)

In particular, among basically regular theories T with |T|n>0 where n>0, the Πn+1-regularity condition is equivalent to |T|0 = e^n(|T|n).
If n>1, then EA+IΠn−1 = EAn1 is the weakest Πn+1-regular theory. In the exploration of the strength hierarchy of theories, it often happens that a theory T is somehow canonical, in being the weakest theory satisfying a certain regularity condition. In this case, it is generally natural, for the study of theories stronger than T, to focus on those also fitting the same regularity condition for which T is weakest.

The theory PA, essentially axiomatized by the schema of induction

PA = EA+{◇n(EA)| n∈ℕ}

has all its strengths equal: ∀n∈ℕ, |PA|n = limp→ω φ^p(1) = ε0

It is the weakest theory satisfying all Πn+1-regularity conditions (∀n, νn=0 ∧ µn=1).

Any theory with (∀n, µn<ω) and not containing PA, namely not containing IΣm for some m, is contained in it:
|T|m+1 = 0 ⇒ |T|0 < φ^(m+1)(1) < φ^ω(0) = ε0

The whole topic of ordinal analysis started with a proof by Gentzen that

 TI(ε0, Σ0) ⇒ ◇0(PA)

and that ε0 is minimal for this property because for each given α<ε0, TI(α, Σ0) is a theorem of PA.
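This threshold can be felt concretely through Goodstein sequences, a standard illustration (not from this article's sources): their termination expresses the well-foundedness of ε0 in arithmetical clothing, provable by transfinite induction up to ε0 but not in PA (Kirby–Paris). A minimal Python sketch:

def bump(n, b):
    # Rewrite n in hereditary base b, then replace every b by b+1.
    result, e = 0, 0
    while n:
        n, c = divmod(n, b)
        if c:
            result += c * (b + 1) ** bump(e, b)
        e += 1
    return result

def goodstein(n, steps=6):
    # First few terms: bump the base, subtract 1, repeat until 0.
    b, seq = 2, [n]
    while n and len(seq) <= steps:
        n = bump(n, b) - 1
        b += 1
        seq.append(n)
    return seq

print(goodstein(3))  # [3, 3, 3, 2, 1, 0]: reaches 0 quickly
print(goodstein(4))  # [4, 26, 41, 60, 83, 109, 139]: reaches 0 only after
                     # an unimaginably long time, beyond PA's reach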

Worm calculus

A finite composite of reflections ◇n of various degrees n is called a worm. For example
◇i◇j◇k(T) :⇔ ◇i(T+◇j(T+◇k(T))).
Yet successive reflective expansions of theories relate to worms in complicated ways: ◇i((Tk1)j1) develops as ◇i(T+◇k(T)+◇j(T+◇k(T))), so it is equivalent to ◇i◇j◇k(T) except if j<k, in which case it is equivalent to ◇i◇k◇j◇k(T) instead. Further successions of reflective expansions also develop as worms with similar splits into multiple cases, as sketched below.
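This case split is mechanical enough to be coded; here is a tiny Python rendering of it (a toy of mine, with worms represented as mere lists of degrees):

def develop(i, j, k):
    # Worm equivalent of ◇i((Tk1)j1): the conjunct ◇k(T) is absorbed
    # by ◇j(T+◇k(T)) unless j < k, in which case it must be kept.
    return [i, k, j, k] if j < k else [i, j, k]

print(develop(2, 1, 0))  # [2, 1, 0]   : ◇2◇1◇0(T)
print(develop(2, 0, 1))  # [2, 1, 0, 1]: ◇2◇1◇0◇1(T)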

When j<k, ◇i◇j◇k is not equivalent to any finite composite of reflective expansions, but to a transfinite one instead: denoting α = φ^(k−j)(1),
◇j(T+◇k(T)) ⇔ ◇jα(T)
◇i◇j◇k(T) ⇔ ◇i(Tjα+1)

Some worms are even only equivalent to combinations of transfinite reflective expansions of different degrees. For example, denoting δ = φ^3(1),
◇1◇2◇0◇3(T) ⇔ ◇1((T0δ)1ω) ⇔ (◇1ω(T) ∧ ◇0^(δ+ω^ω)(T))

Conversely, let us explain how any transfinitely iterated reflection ◇nα(EA), along some α < ε0, is equivalent to a worm all of whose entries (degrees) are ≥ n.

Reflections iterated along ordinal sums are reducible to composites as follows:

Tnα+β = (Tnα)nβ

For all theories T, T' and all ordinals β,
(◇n(Tnβ) ⇔ ◇n(T')) ⇒ Tnβ+1 = T+◇n(T')
Using Schmerl's formula, if |T|n+2 = 0 then for any ordinals α, β,

◇n(Tnβ+ω^α) ⇔ ◇n((Tnβ)nω^α) ⇔ ◇n((Tnβ+1)n+1α)

where ⇔ denotes provable equivalence.

The following reasoning was based on a mistake and I am not sure how to fix it:
The implicit use of the concept of axiom schema in formulas of the form ◇n(Tnα) when α is a limit, can be reduced by writing the limit ordinal in the form β+ω^α, then applying the above reduction to it.
The same goes for ◇n(Tn+1α) (and so on), since it goes for ◇n+1(Tn+1α) and
(◇n+1(T) ⇔ ◇n+1(T')) ⇒ (◇n(T) ⇔ ◇n(T')).

If the new α and/or β is still a limit, further reduction proceeds in the same way, and this must end after a finite number of such reductions under the hypothesis α<ε0.

Slow reflection

There are possible variants of the concept of regularity, where (∀k<n, µk=1) is "morally" kept while basic regularity is dropped, in the sense that for some k, |T|k = φ(|T|k+1) "morally" holds, but |T|k may still take intermediate values between φ(α) and φ(α+1) for some ordinal α. This means |T|k+1 need not always be an ordinal, but may take intermediate values between consecutive ordinals.

For (k=0, n=1), this means ◇1(T) expresses the existence (i.e. totality) of functions with intermediate speed between those expressed by ◇1(U) and ◇1(V), where U, V are theories such that |U|1 < |V|1 are consecutive genuine ordinals; such a T is obtained by adding to U an axiom of existence of a function of intermediate speed between the fastest ones expressed inside U, and one expressed by ◇1(U). Examples of functions with intermediate growth rates are given by slow-growing hierarchies (which may give widely different ordinal numbers of intermediates between given speeds, depending on conventions for fundamental sequences), but I am not sure whether they fit the present purpose.

Anyway, such a function was notably used by [Samuel R. Buss, Handbook of proof theory, ch. II, 1.3] to define his theory S2, in between I∆0 and EA, though it also involved modified versions of the induction axioms.
The idea is to simulate a distinction between a range of "mere numbers" x, only allowed to be multiplied, and the wider range of their exponentials 2^x serving as ranges of sets p<2^x, by introducing an operation # between the latter, where 2^(x·y) = 2^x # 2^y. This operation #, having intermediate strength between multiplication and exponentiation, is thus used to let "mere numbers" be freely multiplied, but exponentiated only once.
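As a quick numeric illustration (a toy of mine restricted to powers of two, following the presentation above rather than Buss's official definition of # via binary lengths), one can check that # indeed sits strictly between multiplication and exponentiation:

def smash(p, q):
    # On powers of two: 2^x # 2^y = 2^(x·y).
    x, y = p.bit_length() - 1, q.bit_length() - 1
    return 1 << (x * y)

a, b = 2 ** 5, 2 ** 6
print(a * b)        # 2^11: multiplication adds the exponents
print(smash(a, b))  # 2^30: # multiplies the exponents
print(2 ** a)       # 2^32: exponentiation towers up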

In similar ways, functions can be defined with speeds in between other consecutive functions of the fast-growing hierarchy, to provide intermediates between other theories, such as between EA and EA+.

In a similar vein (or not?), a concept of "slow consistency" has been found, so that "Transfinite iterations of slow consistency generate a hierarchy of precisely ε0 stages between PA and PA + Con(PA)" [Slow Reflection, Anton Freund].
(This number matches the gap from |PA|0 = ε0 to |PA+◇0(PA)|0 = ε0·2.)
We shall not use such concepts in the rest of this exposition.

ω-strength

For any basically regular arithmetical theory T, let us analyze the possible limit behavior of its sequence of strengths |T|n for large n.
Remember that

∀n∈ℕ, |T|n+1 ≤ |T|n = ϕ^|T|n+1(νn)

with equality when νn = 0 ∧ |T|n = e(|T|n).
Indeed, νn = 0 ⇒ |T|n = e(|T|n+1), and by injectivity of e,
∀ξ, ξ=e(ξ) ⇔ e(ξ)=e(e(ξ))

As a non-increasing sequence of ordinals, it must reach a minimum for some n∈ℕ:

∀k>n, |T|k = |T|n

As a fixed point of e, this minimum must be of the form e_1(ξ) = e^ω(ξ) for some ordinal ξ, called the ω-strength of T and denoted |T|ω:

∀k≥n, |T|k = e^ω(|T|ω)

This extends the validity of the previous formula
(∀k<p, νn+k=0) ⇒ |T|n = e^p(|T|n+p)
from finite values of p to the infinite value p=ω.

Let us check the condition for a kind of converse: can there be |T|n = φ(|T|n) but not (∀k>n, |T|k=|T|n)?
This would mean, for some n and some ξ < |T|n,
|T|n = ω^|T|n = ω^ξ·µn
where µn>1 is additively indecomposable because |T|n is. Thus µn = ω^α for some α>0, and
ω^(ξ+α) = ω^|T|n, so that |T|n = ξ+α = α = ω^α = µn (using the additive indecomposability of |T|n with ξ < |T|n).
This can be excluded by adopting the regularity condition that all µk be finite (∀k, µk < ω).
Like the previously introduced strength measures of finite degrees, the ω-strength measure of arithmetical theories matches the count of steps of expansion by some kind of iterated reflection, now by full arithmetical soundness:

Tωα := T+{◇n(Tωβ) | n<ω, β<α}
|Tωα|ω = |T|ω + α

So, the expression of the soundness of T with respect to all arithmetical formulas, which we might abusively denote as a formula ◇ω(T), is only an axiom schema {◇n(T) | n<ω}, irreducible to any single formula in the language of PA. The logical system formed by all the modalities ◇n (n<ω) together with ◇ω has been described in [Positive provability logic for uniform reflection principles, Lev Beklemishev, arXiv:1304.4396].

The study of arithmetical theories T with at least the strength of PA, i.e. with |T|ω > 0, may be naturally restricted to those fitting all Πn+1-regularity conditions, namely the T = EAωα for any α>0, with ω-strength |T|ω = α.
The strength measures of such theories at all finite degrees coincide, making their distinction superfluous:

∀n<ω, |EAωα|n = e^ω(α)

From arithmetic to hyperarithmetic

To continue our exploration of theories whose strength goes more than a little beyond PA, the next step involves expanding the language beyond that of first-order arithmetic. Namely, somewhere in between first-order and second-order arithmetic comes the language of hyperarithmetic, though it is currently not officially recognized as a logical language in its own right.

Compared to first-order arithmetic, the point of hyperarithmetic is to generalize the concepts of Πn and Σn classes of formulas, expansions Tnα of theories T by iterated reflection and their strength measures |T|n, from finite values of the degree n to recursive ordinal values.

The difficulty is that the language of first-order arithmetic, by its reliance on the framework of first-order logic, only natively admits formulas from classes Πn and Σn for finite n, using n successive unbounded quantifier symbols followed by some bounded formula, but has no direct concept of Πα or Σα formulas when α is an infinite recursive ordinal.

This may lead one to express hyperarithmetic by leaving behind the concept of classes of formulas, in favor of the concepts of Turing degrees and the Turing jump, which may be considered more appropriate.


Now the concern of measuring the strength of theories, thus their proving power, induces a need to formalize hyperarithmetic in first-order logic, which is the natural reference framework with a clear and well-behaved provability concept (one where the completeness theorem works). Such a formalization is offered in
[Reflection algebras and conservation results for theories of iterated truth, Lev D. Beklemishev, Fedor N. Pakhomov]
[Conservativity spectra and generalized Ignatiev model, Lev D. Beklemishev]


(to be continued)

ω+1(T) ⇔ (CT+I∆0(Tr))
[Model Theory and Proof Theory of the Global Reflection Principle, Mateusz Łełyk]

Transfinite worm calculus:
([The Worm Calculus, Ana de Almeida Borges & Joost J. Joosten])
[Well-orders in the transfinite Japaridze algebra, David Fernández-Duque, arXiv:1212.3468]

Sorted references

For mention in introduction - potentialism

Toby Meadows, Sets and supersets, Synthese 193(6):1-33, July 2015, DOI:10.1007/s11229-015-0818-x
https://aura.abdn.ac.uk/handle/2164/8726
https://www.academia.edu/81998125/Sets_and_supersets?f_ri=821

Acceptable ideas from the list mentioned there: literalism, generality relativism, modal approaches, property approaches, Zermelo’s “meta-set theory”, Grothendieck universes.


Joel David Hamkins & Øystein Linnebo, The modal logic of set-theoretic potentialism and the potentialist maximality principles
https://arxiv.org/abs/1708.01644
http://jdh.hamkins.org/set-theoretic-potentialism/

Why Is the Universe of Sets Not a Set? Zeynep Soysal

Set-theoretic potentialism, Joel David Hamkins (slides)

Set-theoretic foundations, Penelope Maddy

Potentiality and indeterminacy in mathematics Øystein Linnebo University of Oslo oystein.linnebo@ifikk.uio.no Stewart Shapiro The Ohio State University shapiro.4@osu.edu


The Potential Hierarchy of Sets Øystein Linnebo

Context relativism


Everything, more or less, James Studd
https://jamesstudd.net/category/research/
"One way to address this challenge is to deploy a sophisticated relativist-friendly account of quantifier domain restriction due to Michael Glanzberg (2006). In his view, quantifiers exhibit two sorts of context sensitivity: the context supplies both a background domain associated with the determiner and a local contextual restriction attaching to the nominal. On this view, we are free to operate in contexts where no non-vacuous local restriction comes into play. Such locally unrestricted quantifiers may then range over the entire background domain. Nonetheless, Glanzberg does not accept that such locally unrestricted quantification achieves absolute generality. Suppose we begin with a background domain D⋆ . Glanzberg develops a pragmatic account of how we may exploit Russell’s paradox to shift to a wider background domain containing D⋆ ’s Russell set. This provides one way to make sense of the performative aspect of the Russell Reductio noted in Section 1.4."


Context and Unrestricted Quantification, Michael Glanzberg, in Absolute Generality, ed. A. Rayo and G. Uzquiano, Oxford University Press, 2006, pp. 45-74.
https://michaelglanzberg.org/home/research/


Hybrid-relativism and revenge, James Studd

Recursion theory

Foundations of Mathematics Stephen G. Simpson October 1, 2009

Without power set


What is the theory ZFC without power set?
Victoria Gitman, Joel David Hamkins, Thomas A. Johnstone
https://arxiv.org/abs/1110.2430

CH


Is the Dream Solution of the Continuum Hypothesis Attainable? Joel David Hamkins, Notre Dame Journal of Formal Logic Volume 56, Number 1, 2015


Can we resolve the Continuum Hypothesis? Shivaram Lingamneni December 7, 2017
https://philpapers.org/rec/LINCWR-3 https://cs.stanford.edu/people/slingamn/

RECENT PROGRESS ON THE CONTINUUM HYPOTHESIS (AFTER WOODIN) PATRICK DEHORNOY

ZF



Separating Hierarchy and Replacement, Randall Holmes


Multiverse

A multiverse perspective on the axiom of constructibility, Joel David Hamkins

Gödel's program, John R. Steel, 2012

A reconstruction of Steel's multiverse project, Penelope Maddy & Toby Meadows


THE SET-THEORETIC MULTIVERSE JOEL DAVID HAMKINS
Association for Symbolic Logic, 2012 doi:10.1017/S1755020311000359

The Set-theoretic Multiverse : A Natural Context for Set Theory
https://philpapers.org/rec/DAVTSM-8
Annals of the Japan Association for Philosophy of Science Vol.19(2011)

Generically invariant set theory John R. Steel July 21, 2022

Large cardinals


https://en.wikipedia.org/wiki/Jensen%27s_covering_theorem
== Ordinal definability and the structure of large cardinals Gabriel Goldberg

https://en.wikipedia.org/wiki/Second-order_arithmetic#Projective_determinacy
ZFC + {there are n Woodin cardinals: n is a natural number} is conservative over Z2 with projective determinacy, that is a statement in the language of second-order arithmetic is provable in Z2 with projective determinacy iff its translation into the language of set theory is provable in ZFC + {there are n Woodin cardinals: n∈N}.
https://en.wikipedia.org/wiki/Axiom_of_projective_determinacy
PD follows from certain large cardinal axioms, such as the existence of infinitely many Woodin cardinals.

https://plato.stanford.edu/entries/large-cardinals-determinacy/


FOUNDATIONAL IMPLICATIONS OF THE INNER MODEL HYPOTHESIS, TATIANA ARRIGONI∗ AND SY-DAVID FRIEDMAN

Large Cardinals and the Iterative Conception of Set Neil Barton∗ 10 April 2019†
(a) Projective Determinacy (schematically rendered). (b) For every n < ω, there is a fine-structural, countably iterable inner model M such that M |= “There are n Woodin cardinals”.
= Are Large Cardinal Axioms Restrictive? Neil Barton∗ 24 June 2020†