Structures of mathematical ontology from relativism to growing block time

Abstract :
Some of the most philosophically relevant facts of mathematical logic are gathered and popularized, sometimes humorously, into a coherent philosophical vision of an open-ended mathematical reality: from the relativity of semantics, through the Löwenheim-Skolem and Completeness theorems, to the systems of second order arithmetic and set theory with their ontological justifications. Some seemingly different theses, including relativist and potentialist concepts, can be read as different aspects, approximations or presentations of a common general understanding. Ordinal analysis describes the growing block time structure of mathematical ontology, while power sets admit a relativist interpretation.

This article is still incomplete and under development.

Introduction

The literature on mathematical ontology (I mean, the ontology of mathematical entities) appears filled with a diversity of competing views. Leaving aside non-realistic views, which essentially deny the very existence of this debate topic, diverse options of mathematical realism have appeared on the market.

A first approach to mathematical ontology is given by the famous iterative conception of set. It can indeed be praised as a good introduction and approximation, but has shortcomings : roughly speaking, it works well for hereditarily finite sets, but becomes unsatisfactory in its details to account for the existence of infinite sets, whose delimitation (from any range of objects whose existence is admitted at some "time") begs for definitions; this default view amounts to admitting the power set axiom, which is questionable.

Hence the need for more specific or elaborate views. Some views, sticking to this conception regardless, are versions of set theoretical realism, also called universism, meaning there would be a single correct ideal universe (i.e. model of set theory) ; this leaves open the question of its exact properties in the endless range of statements left undecidable by our officially accepted axioms.

Other views recognize a range of multiple legitimate universes, forming a multiverse, across which the facts may or may not be expressed in modal formalism ; but multiple versions of such a multiverse have been proposed, where universes may vary by height, by width or both, and may be strongly or weakly linked together.

The main account of height potentialism I could find was [Linnebo, Ø. (2013). The potential hierarchy of sets.], which has the merit of proposing a potentialist justification of the axiom schema of replacement; but this exposition remains unsatisfactory, as pointed out by ["Why Is the Universe of Sets Not a Set?", Zeynep Soysal] (which unfortunately ends up throwing the baby out with the bathwater). Without precisely analyzing their presentation and arguments, we shall offer a justification of this schema along roughly the same idea, but with different details. Other accounts include (...)

Such a state of profusion of competing theses, with no clear tools or reasonable hope to decide which one is "true", may be usual in philosophy, but is much less common in hard sciences. At least, most scientific controversies only last for a while before getting resolved by clear evidence (while mathematical conjectures, which may long remain unsolved, still have clear and widely agreed criteria of evidence, so that any resolution remains a matter of fact, namely of providing such evidence, rather than a matter of taste). Persisting controversies of a philosophical kind remain an anomaly in science. As explained in my previous article, the other famous case of this kind, namely the measurement problem in quantum mechanics, should not have occurred in this way, since a clearly best solution is actually there and is only kept ignored by a kind of sociological accident.

The present article will argue for a similar conclusion in the case of mathematical ontology, but for a very different reason : that behind visible discrepancies, the main candidates should rather be seen as genuine aspects of a single necessary, unified view (as happened in physics, most famously on whether light is a wave or a particle). This follows the famous allegory of the blind men and an elephant, yet without requiring the blindness of all participants, as it may be a kind of joke or light writing style for some authors to offer seemingly different views and present them as diverging from each other, especially when it is seen as normal for several such views to be developed by the same authors. Anyway, we shall take care to clarify the means of unifying diverse perspectives into a coherent general understanding.

The precise range of the main valid views (aspects of the big picture) may not yet be perfectly reflected by the list of most often mentioned views in the current literature on this topic; but, beyond the lack of any strict division to be made between right and wrong foundational theories or philosophies of mathematical ontology, the further a theory or view departs from the unified understanding, the more objectively defective it is likely to be found as a candidate foundation for mathematics. Of course, if anyone expected the ultimate picture of mathematical ontology to be a static one expressed by a fixed formal system, then the question of specifying such a system would admit no right ultimate answer, because the correct view is rather a dynamical one escaping any fixed formalization.

We shall explain the features of its dynamism by reviewing the mathematical facts, with the deep articulations they provide between superficially divergent but valid perspectives. Beyond the iterative conception of set, a more robust version of the role of ordinals in mapping the ontological order of mathematical infinities is given by ordinal analysis (the measure of the strength of theories), which takes some work to introduce (familiarity with ordinals, including the Cantor normal form, will be presumed in this exposition). We shall see that "width potentialism" and "height potentialism", describing the expansion of the set theoretical universe, naturally go together, with no need to formally postulate this as if it were an arbitrary assumption; the proper view of the width dynamism transcends the one operated by forcing, which is a relatively limited move. Comparing formal theories, an important focus will be the unity of ontological meaning between set theory and second-order arithmetic, as both can be strengthened by similar large cardinal axioms behind the surface of their huge literal divergence.

Let us first zoom out by splitting the relevant general topics into the following list, ordered from the least to the most mathematical ones:

(a) General ontology;
(b) The precise status of mathematical ontology relatively to general ontology;
(c) The internal structures of mathematical ontology;
(d) Purely mathematical works and results from relevant formal theories of mathematical foundations.

(a) was the focus of my previous article with detailed description of the nature of conscious time, taking inspiration from the time structure of mathematical ontology.

(b) would be rigorously unnecessary for addressing (c), except that some philosophers may be hindered in their approach to (c) by unfortunate preconceptions or unresolved issues about (b). A short reply to this possible obstacle could be to refer to the view of William Tait, which may fit as a first approximation, despite its shortcomings. But to be much more precise in some ways, the first section here will offer, without argument, some possibly original answers to this topic (b) which seems to have kept some philosophers busy, in complement to my previous work on (a). These may actually be of some use as a preliminary for (c), by altogether justifying the relevance of some kinds of metaphors which will be used in the expression of (c), and including implicit disclaimers against over-interpretations of these.

Then, the remaining sections will focus on (c), largely based on an extensive review of diverse facts of mathematical logic (d) which are essentially already well known to specialists, though many of them are often neglected by the spotlight of usual courses and philosophical accounts. The general method will be to let answers to (c) naturally emerge by intuition when "reading between the lines" of relevant facts of (d) and their many connections. It will thus mainly be a work of popularization of mathematical logic, gathering and describing the necessary facts in a coherent philosophical order, forming a clear, dynamical and somewhat encompassing picture of mathematical ontology. Being intended as a work of popularization of mathematics rather than one of strictly mathematical research or lecture notes (and as I may lack technical expertise in some reviewed concepts), it will skip many of the technical details and proofs which usually fill other works on these topics, but which are not crucial to the philosophical understanding ; and it unfortunately cannot claim full accuracy or completeness on all possibly relevant details at this stage. Thus, the point is rather to serve as a pioneering sketch of the big picture of this vast topic; and I welcome any feedback for correction in case of any inaccuracy.

1. The status of mathematical ontology

The thesis of general ontology expressed and defended in the previous article can be summed up by these claims:

Let us now address the following question which was left unanswered there : how can the possibility of such links between both substances be theoretically reconciled with the claim that they are indeed two distinct substances, with effectively separate ontologies (concepts and flows of existence) ?

This question, and the usual lack of explicit answer to it, is often presented by philosophers as an objection against mathematical Platonism. While it may be dismissed as a pseudo-problem, let us be somewhat explicit in answering it, accounting for how both substances effectively differ, and still remain connected in these familiar ways. The perception of this question as a problem, and even as a possible objection against mathematical Platonism as well as against substance dualism, may have taken strength from physicalist preconceptions, namely the temptation of conceiving any two realities or substances on the model of two physical objects coexisting in a common physical space, and interacting inside it by a kind of physical interaction. One big defect of this physical interaction model is its failure to account for the asymmetry of the link: the transcendence of consciousness over mathematics, by which the mind can understand mathematics but mathematics cannot describe the mind.

Dismissing the physical interaction model dispenses us from searching for an interaction process by way of an account of the link, but opens the question of offering a better conception instead. It is likely impossible to express this link in perfectly accurate terms, for precise reasons coming from the nature of the topic, as will be explained below; but we shall offer candidate approximations of how it works.

Let us start with a topological metaphor. The idea is to figure the mind as an open set in a topological space, and the mathematical stuff as the boundary of this set. This metaphor succeeds in combining the following features, which might have been felt as incompatible, therefore illustrating that such a feeling of incompatibility was unmotivated:

It also interestingly reflects further features of the link :

One may even refer to a property of measure theory as a metaphor of how these two last features happen to be combined: that a countable union of sets with measure zero also has measure zero. So, since the existence of any finite mathematical set has no cost for consciousness, an "action" of creating an infinite set, as a limit case of creations of finite mathematical sets, is also costless, at least for a concept of "cost" which pertains to consciousness.
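In symbols, the property invoked is countable subadditivity (stated here in standard measure-theoretic notation, as a mere illustration of the metaphor):

```latex
% Countable subadditivity: a countable union of null sets is null.
\mu(A_n) = 0 \ \text{for all}\ n
\;\Longrightarrow\;
\mu\Big(\bigcup_{n\in\mathbb{N}} A_n\Big)
\;\le\; \sum_{n\in\mathbb{N}} \mu(A_n) \;=\; 0
```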

For more accuracy we need to correct the above claim of empty intersection, which has an important exception in a sense: the mind can also be said to actually contain finite mathematical systems (some of them in actuality, the rest in potentiality); it is just not reducible to them (the mathematical concept of cardinality is inapplicable to conscious stuff considered in its depth, rather than in some specific mathematical limit aspect). Moreover, to say that the mind contains finite mathematical systems is just a way of speaking, since these mathematical systems are only aspects, limits of what is in the mind, and not rigorously of the same nature as any objects actually in the mind (the intersection of their natures must indeed be empty, since mathematical objects already have an empty nature, by definition of what a mathematical object is).

This exception is actually crucial, since among finite mathematical systems there is the central case of formal systems, that is, the syntactic side of the dipole (syntax/semantics) of mathematical ontology: formal systems constitute theories which can describe any other mathematical systems (or at least those we can study by our somehow finite means, namely excluding the use of any "syntax" with infinitary operations or formulas of infinite size). Now, mathematicians' abilities to explore mathematics are largely reflected by the mathematical features of finitary formal systems, whose rules and dynamics (of formal definitions and proofs) are a reduced form or subsystem of these abilities, playing similar descriptive roles.

In such cases of similarity between features of conscious and mathematical ontologies, the mathematical version is a limit case of the conscious version, namely where the ineffable depth of qualia is removed or ignored; let us call this the zombification of a concept or feature of ontology. This zombification may be a trap or a tool, depending on how it is handled.

Authors trying to build physical or mathematical theories of the mind under some currently more widespread metaphysical preconception will probably try to offer some sketch of descriptions aiming to become as formally precise as possible; but regardless of the degree to which they themselves happen to make their descriptions precise, the attitude of raising some kind of formal accuracy as the target horizon of their research actually implies the ultimate zombification of its object, i.e. what they will effectively describe cannot be minds anymore, whether or not they like to admit it. The trap, then, would be to ignore this substantial difference, with the illusion that they are describing minds when they actually aren't.

On the other hand, the proper use of zombification as a positive tool of ontological research, as I see it, proceeds as follows. It consists in developing mathematical concepts, or concepts about mathematics, somewhat similar to some aspects of the mind, in a way open to intuitions from aspects of consciousness beyond the resources of given formal means, but keeping the focus on trying in this way to develop understandings of mathematics with its own ontology, instead of attempting to specifically represent and analyze consciousness. The reason is the coexistence of both similarities and dissimilarities between substances, which needs to be accepted; the discovery of hidden similarities can eventually only come after the acceptance of dissimilarities, rather than against them. As an unavoidable effect of any detailed rational analysis, zombification needs to be accepted and recognized rather than denied or forcefully avoided: attempts to avoid it, by forcing similarity through some mathematical postulation of features seemingly similar to aspects of the mind, would be vain. Pushed somehow to its endpoint, zombification produces results whose mathematical nature permits a rational, rigorous, objective and successful in-depth analysis, which may not have been possible on the conscious side, due to the possibly ineffable character of the introspection which examinations of the mind for itself would need to involve. But, for it to reach such a worthy destination, any artificial constraint of trying to keep superficial similarities should be replaced by the more natural requirement of confidently and liberally letting the math speak for itself: listening to the most intrinsic and universal necessities of mathematical ontology beyond the limits of any prior familiarity or any arbitrary postulates about how ontology should be working. This is required for the consistency of this picture of parallel substances : the idea that, behind the possibly contingent aspects of its currently familiar form, mind is a primitive substance as much as mathematics is, so that its deeper foundations to be discovered remain mind-like, with just purer features of fundamentality, universality and necessity, and can only be compared with similarly deeper, universal and necessary aspects of the foundations of mathematics. So, the deep similarities between mathematical and conscious ontology can only be found by digging into what can be the most fundamental, universal and necessary features of mathematical ontology, without other prejudice or agenda, such as the contrived pursuit of superficially similar but non-fundamental arbitrary mathematical constructs.

Some works, such as Edward Ferrier's "Against the Iterative Conception of Set", object to the picture of time flow in set theory by projecting preconceptions about how the flow of time should work, based on our specific conscious experience of it, and then complaining that set theory does not match them well. The proper solution, of course, is to abstain from assuming such a fully pre-packaged conception of time, and to pay more attention to how set theory and other foundational theories really and naturally work, so as to draw from them the correct picture of the mathematical time flow (which may not always coincide with the conscious one).

As we shall see, the scope of possible zombification is wider than often assumed, as some key features from conscious ontology, often presumed to be specific to the mind, actually persist in some form in the mathematical limit, such as subjectivity, (partial) self-reference, and time passage.

In this picture we can notice the rather central position of common language, which happens to be at a crossroad relating all the components at stake: it is a partially zombified expression of conscious understanding, somewhere along the way to formalization since it admits numerical encodings, yet not completely formal, but flexible enough in its rules of expression and intended semantics to express rather well all the thoughts we seemingly need to express. We shall of course keep using it as the main and only adequate tool to communicate much of the concepts at stake in the topic of the present work, namely the ontological issues of pure mathematics. This topic may be the main case of our possible conceptual abilities combining the qualities of being entirely focused on purely mathematical objects, and of still not being themselves completely zombifiable concepts. As such, unlike with any fixed axiomatic theory, it would be absurd to expect such expressions to always be properly readable in a completely literal manner.

We shall thus sometimes benefit from the crucial input of metaphorical modes of expression, whose actually intended meanings can only be brought to life by each reader for oneself, due to the impossibility of complete formalization. This will include referring to our ordinary intuitions of time and other ontological concepts, and yet, in doing so, departing from some of the usual but specific features of conscious or physical ontology, which will be completely ignored in our discussion, so as to directly focus descriptions in conformity with the features of mathematical ontology, as they emerge from expert understanding of mathematical logic. Thus, such expressions may sometimes surprise non-expert readers of this article at first sight, but will hopefully look more acceptable after some serious effort to follow this work of popularization of mathematical logic.

Introduction to the strength hierarchy

Mathematics is split into an endless profusion of theories, each representing a particular context, i.e. a perspective or range of perspectives, which one considers to work in. A choice of theory is a way of choosing the objects being studied (or more precisely, which objects are seen as primitive, while others may then be considered indirectly by construction), and their status (the accepted ways to operate on them). In a sense, a choice of theory is a way of trying to stop time from flowing. Of course it would be impossible to stop time, but on the other hand we can never absolutely escape time: any perspective must somehow be that of some specific time, so that whatever is undertaken at a time may be captured and formalized as it is happening; in particular, any attempt to describe the flow of time itself can anyway be seen as restricted to one or another specific limited time interval, while another reflective move over this description of time results in reaching still further. Several philosophical works on mathematical ontology ran into imaginary difficulties by considering some specific theory as a candidate ultimate one, representing the end of time, and complaining that it cannot account for the meaning of all its own expressions or axioms, i.e. it cannot reflect over itself. Of course no theory can ever fully account for itself, as time cannot really stop.

So, there is no ultimate theory able to describe everything, but the coexistence and succession of multiple perspectives can be tracked by the articulations between diverse theories, which may give different statuses to the same objects or symbols. On the one hand, one may consider arbitrary theories, i.e. ones specialized in the description of specific systems; on the other hand, there is a main thread of foundational theories, which form the main time line of mathematical ontology: the "weak" ones describe the shorter realities (which come first), while the "strong" ones describe the later, more expanded realities.

The diversity of foundational theories may take different formal aspects: in principle, these theories may differ by their language (lists of primitive types and symbols), their axioms, and their logic (the deduction rules used to derive consequences from axioms). In practice, hardly any role is played by the choice of logic, usually reduced to only one possibility (first-order logic); the possibly suitable languages are quite few (roughly, the weakest theories use the language of first-order arithmetic ; the stronger ones rather use those of second-order arithmetic, higher-order arithmetic or set theory, but some overlap is possible in their range of use). So, the bulk of variations in strength consists in the diverse ranges of accepted axioms, where strength is added by adding more axioms, among formulas which were left undecidable by weaker theories. Metaphorically, strength is a measure of knowledge, which expands in time by the virtue of each successive act of reflection.

We shall review all this in detail.

Objects and symbols

Before entering any complex mathematical theory to painfully chase there the shadow of the time flow of mathematical ontology, let us start by staring directly into its eyes in its bare form: its way of already ordering the simplest actions, those which precede any choice of a specific theory (so to speak, actions still weaker than the weakest theory), and thus any use of logic (whose operation is relative to a choice of theory). These actions are usually overlooked, as their apparent triviality fails to give any job to researchers; yet they can crucially serve to give birth to the core intuition which then provides the thread of understanding for the whole story.

All starts by assuming that there are some objects, and undertaking to introduce symbols to name them. It matters little whether this existence assumption is "really" right or wrong: what matters instead is the action of doing that, which is largely self-sufficient. Indeed, symbols are themselves objects in their own right, so that

Moreover, any formal description work, once given, can then be reflected upon and formally described by introducing additional symbols to name any previously unnamed object. For example, given an object X, the name "X" we gave it is another object, which we can name "S". Still, we also implicitly meant to link both, namely we undertook to use S to name X, and this link is itself a third object, which may be formalized as an ordered pair (S,X), or (to let this be generalized to having multiple symbols naming multiple objects) as the function from {S} to {X}. Now, X and S need to exist first, before this link (S,X) of interpretation between them can be built.

This posteriority of the link between objects and symbols, with respect to these objects and symbols themselves, plays a more effective role in the construction of interpretations of mathematical expressions : this proceeds step by step, from the constant and variable symbols occurring there, to the operations over them, to the more external sub-expressions of the given expression.

Variables and sets

As symbols do not a priori come with their semantics attached, this leaves open the question of the possible existence and uniqueness of that semantics. Some given symbols may admit multiple possible interpretations, as a matter of perspective. Namely, a symbol once seen as a constant naming one object may also be seen as naming another object (if its value is not fully determined by the given axiomatic or other context); therefore, still another perspective consists in recognizing the plurality of its possible values, which means seeing it as a variable rather than a constant. Each such perspective is called a set : this is a particular range of values which are considered available (possible) for a given symbol in a given context, and it essentially sums up such a context of possibilities for the concerns of some given variable symbols.

But, this raises the question whether this more encompassing perspective is itself unique, or how open to variations it may be. Namely, just as any view of a given symbol as a constant expresses a specific perspective which may often be overstepped by a wider perspective seeing it as a variable, any specific view of a given symbol as a variable ranging over a fixed set may often be overstepped by a wider perspective seeing this set itself as able to vary.

A choice of a theory involves first a choice of the kinds of variable symbols which are admitted in its language, and a concept of the sets of objects which each kind of variables is intended to range in. Two main kinds of theories need to be distinguished:

This distinction may seem to disappear from a strictly formal viewpoint.
Indeed, the range of the second kind of variables, that is "the range of all sets which a given set theory may happen to name", can be formalized in the same way as those of the first kind (sets), except only that it would be another sort (say, using another alphabet to name those variables). Then, a logical grammar with variable sets (where some variables may have variable ranges) can be reduced to one without, by letting the role of variables with variable ranges be formally played by variables basically ranging in the union of these ranges, then restricted from there to specific ones by conditions inserted in formulas. By these means, set theories are made to fit the framework of first-order logic.

This reduction to first-order logic works (almost) perfectly in the case of finite set theories, which can also be expressed in the form of theories of bounded arithmetic. Indeed, any finite set, made of some finite number n of objects, may be represented, through a given choice of enumeration, as the set An of all numbers x<n (or some other set of numbers all lower than some other number, defined as the cardinal of the union of some finite list of finite sets which are involved by a theory at a given time). So, in any theory considering just finite sets as its sets of basic objects, if the role of a set of n elements is played by the number n (so, if such sets can be suitably represented by some An), then the role of the range of all finite sets possibly considered by that theory (or anyway a range containing them all) is played by the fixed set ℕ of all natural numbers (to be used separately from the finite sets, i.e. naming numbers by a different kind of variable than those naming the basic objects, because ℕ is infinite).

(Otherwise, we can still let ℕ represent the range of finite sets, by encoding any finite subset of ℕ as a natural number using the BIT predicate.)
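To make this encoding concrete, here is a minimal Python sketch (the helper names are mine): a finite subset of ℕ is encoded as the natural number whose binary digits mark membership, the BIT predicate being just digit extraction.

```python
# Encoding finite subsets of N as natural numbers via the BIT predicate.
# Helper names are illustrative, not standard.

def encode_set(s):
    """Encode a finite set of natural numbers as a sum of powers of 2."""
    return sum(1 << k for k in s)

def bit(n, k):
    """BIT predicate: does k belong to the set encoded by n?"""
    return (n >> k) & 1 == 1

n = encode_set({0, 2, 5})           # 1 + 4 + 32 = 37
assert bit(n, 2) and not bit(n, 3)
```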

But this works somehow imperfectly with theories of infinite sets, as we shall now explain.

Classes of infinite sets

For most set theories meant to describe infinite sets, the above described reduction to first-order logic still works on the syntactic side, but loses track of the relevant semantics.

Indeed, according to the normal semantics of first-order logic, each given theory admits particular interpretations; such an interpretation may not be the absolutely unique possible one, but it is still treated as fixed for the sake of the internal works of the whole given theory; and each such interpretation regards each variable symbol of the theory as ranging over a fixed set through the basic use of quantifiers (before being practically restricted to its subsets according to the formulas).

But many set theories are actually meant to be interpreted in the following quite different way. They are meant to admit neither a single nor multiple acceptable interpretations of their "class" of sets as a fixed set, but to admit a kind of interpretation in which that class of sets is not thought of as a fixed set.

Indeed, whenever a range of a variable is a set, such set theories mean to formally admit it as such; and their class of sets will always exceed it, since they recognize that no particular set can ever be the ultimate one containing every possible set.

So, the semantical point of the concept of class, by contrast to that of set, is to name a kind of range of objects of a theory (more precisely: the syntactic feature of a theory having a kind of variables formally referring to a range of possible objects), without giving it any supposedly fixed semantics as a set. It means treating variable symbols as able to take some values and even some ranges of possible values but with no claim to ever reach a point of exhaustion of these possibilities.

Now, classes may admit candidate interpretations as sets, which may form valid interpretations of set theories in the sense of first-order logic. These interpretations can serve as practical views encompassing those values which may be found along, and relevant for, a given range of works. Yet they remain somehow semantically unsatisfactory, as each of them may be later overstepped by some new, "better" ones as broader sets, including further elements yet to be created.

Potential vs actual infinity

As noted by [Linnebo & Shapiro : Potentiality and indeterminacy in mathematics], a parallel can be drawn between Aristotle's finitism, according to which the infinity of ℕ only exists in a potential sense (of the possibility to consider ever larger finite sets), and the potential nature of the universe of sets in modern set theories, which admit the infinity of ℕ as actual.

Now, an aspect which may have been overlooked is the questionable definiteness of the presumed opposition of characters: potential vs actual. This is essentially the opposition of characters, for some given event, of being future vs being past. It is really not an intrinsic property of that event, but a property of the perspective taken over it. It is just a question of what time it is, which cannot admit a fixed answer.

Something appears in potential (i.e. as future) by some successive reflective moves which will end up creating it ; then a reflection over such moves results in conceiving them in their generality, and turns the whole sequence of such moves into a present actuality ; then another reflection over this generality results in formalizing it as a theory, whose semantics effectively amounts to viewing that sequence as past.

Of course, a kind of divergence occurs here between syntax and semantics : the syntax which describes the generality remains finite, while its semantics is infinite. Making this infinity actual requires effectively carrying out the infinite sequence of actions following the described rules, and such endless processes may eventually give some results which could not be predicted from any fixed theory in smaller amounts of operations.

But while the potential infinity of the range of all finite sets can theoretically become actual once and for all, by actualizing ℕ as a set to mark the end of all possible finite ages, it is not possible to complete the actualization of the potential class of all sets, because there can be no absolute end of all possible times.

The kinds of sets

As a first approach to the content of the strength hierarchy, i.e. the main time line of mathematical ontology, one may roughly sum it up as a succession of 4 main stages, according to the kind of variation ranges (from the "smaller" to the "larger" ones) which are recognized, or better, qualified as "sets", i.e. on which quantifiers (∀ and ∃) are considered meaningful (a condition which may be somehow seen as optional and thus as splitting the latter 3 stages); it is just a matter of reflection (or abstraction) to move from each kind to the next:

Hence a succession of theories aimed to express the recognition of these successive kinds of ranges as sets:

Concepts of truth and open quantifiers

This rough measure of the strength of theories, by the kind of ranges which are recognized as sets on which quantifiers can be applied, can be refined as a measure of the class (successively larger kinds) of formulas which are accepted by a theory as meaningful (true or false). Hence the need to clarify the concept of truth. Philosophers may have been very confused on the topic of truth, with a lot of competing theses. Anyway, outside mathematics things are really a mess, making such debates quite hopeless there. Now in mathematics, the right conception of truth is somewhat non-trivial, yet rather necessary, leaving no room either for mystery or hopeless controversy.

Basically, the truth or falsity of a formula is the result of its interpretation in some supposedly fixed system of objects, and thus relative to the choice of that system. This choice may be explicit (expressed by variables in a given theory) and/or implicit (a matter of interpreting a theory as a whole), as a matter of perspective. Implicit choices may be made explicit by reflection (i.e. time passage), yet such work of explicitation may never be fully complete.

Yet, some other kinds of formulas, especially those using open quantifiers, i.e. quantifiers over a class instead of a set, are meant to follow a different concept, despite the lack of syntactic difference. The difference may be explained as a matter of tense : the basic concept above (interpretation effectively done) may be qualified as assuming a past tense for its objects, while the alternative concept admits a future tense. It would be dubious to try formalizing this difference, both because it is a mere difference of semantics, which rigorously cannot be contained by syntax anyway, and because, again, the question whether a given object is past or future is a matter of what the current time is with respect to this object, which cannot have a fixed answer.

Unlike past propositions, future propositions are not actually meant to take a definite semantics in the regular way, because the computation process which would produce their value is not assumed to be available. Rather, such propositions can be understood as either imperative (when chosen as axioms) or predictive (when obtained as theorems). Thus, in most contexts, only some of the possibly expressible future propositions are taken as true or false, while other propositions of the same or even lower complexity may still be seen as undetermined. The progress of strength of theories, that is, the time passage of mathematical ontology, may be measured by the increasing range of expressible propositions whose semantics is held as past.

One may expect further details on the concept of truth, more precisely in the past tense case, and ask whether the deflationist conception of truth works. Yet, whatever candidate formal definition of truth (semantics) may be given, it can only provide a semantics insofar as it is itself interpreted, which means that it essentially leaves unchanged the issue of the need to externally provide a semantics in a chosen world of objects outside the syntax; the only question, then, is whether some definitions can provide non-essential but still useful contributions, and indeed some can, as follows.

The "deflationist conception" of truth (also called disquotation), which effectively means standing still (trivially defining truth by itself with no further explanation), is indeed obviously valid in the precise case when the formula to be interpreted is effectively given, fixed and explicitly written down before our eyes.

But in the remaining cases, namely when a formula is presented in some other way, either by its name (a constant or variable symbol) or by some expression which begs to be interpreted as a finite mathematical object encoding the formula to be interpreted, deflationism is obviously inapplicable in a strict sense (though it can nevertheless be abusively admitted for practical reasons: giving the name or another not fully spelled out designation of a formula to mean its truth). What is needed instead is a predicate T depending on a variable F ranging over a certain class of formulas, to interpret truth in that class. The point is to effectively sum up the interpretation process for a whole range of possible formulas F into that of a single formula T only depending on a parameter F. So, a truth predicate plays the same role with respect to some general range of formulas as the concept of universal Turing machine plays with respect to the general concept of Turing machines : the role of emulating hardware by software.
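To make this analogy concrete, here is a toy Python sketch (the encoding of formulas as nested tuples is an ad hoc choice of mine, not a standard one): a single function T interprets every formula of a small class, the way a universal machine runs encoded machines.

```python
# Toy truth predicate for a small class of bounded formulas, encoded
# as nested tuples (hypothetical encoding, for illustration only):
#   ('<', t1, t2)         atomic: t1 < t2
#   ('all', bound, body)  for all x < bound, body
#   ('ex', bound, body)   there exists x < bound with body
# Terms are integers or ('var', i), the i-th quantified variable.

def val(term, env):
    """Value of a term in an environment of quantified variables."""
    return term if isinstance(term, int) else env[term[1]]

def T(formula, env=()):
    """One formula T interpreting a whole class: 'hardware by software'."""
    op = formula[0]
    if op == '<':
        return val(formula[1], env) < val(formula[2], env)
    if op == 'all':
        return all(T(formula[2], env + (x,))
                   for x in range(val(formula[1], env)))
    if op == 'ex':
        return any(T(formula[2], env + (x,))
                   for x in range(val(formula[1], env)))
    raise ValueError(op)

# "For all x < 5 there is y < 6 with x < y": one call of T decides it.
f = ('all', 5, ('ex', 6, ('<', ('var', 0), ('var', 1))))
assert T(f)
```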

Now a possible view of time (strength) consists in the successively larger and larger classes of formulas which are considered definite, and for which the required definition of the truth predicate is more and more "complex" in some sense.

Truth undefinability

A natural question, then, is to compare the complexity of a truth formula Tr with that of the formulas it aims to interpret, namely, to specify whether, for any (specific, spelled out) F in the class of formulas interpretable by Tr, the formula Tr(F) also belongs to that class.

An obvious requirement for the formula Tr is that it must be able to handle a parameter designating a formula from a certain range, and there is a minimum strength needed for this. But already at this strength (or so?), the liar paradox becomes applicable and tells us more.
Indeed, from the simplest ability of a theory to handle formulas, one can derive the ability for a formula to essentially quote itself (this quotation included): for any formula F with one variable ranging over formulas, one can write a term Q provably equal (in value) to the quote of F(Q).
Or almost: more honestly, what can be done is, rather simply, to write a formula G such that, provably in the weakest theory, G(X) is equivalent to X being the quote of (∃Y, G(Y)∧F(Y)).
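This self-quotation trick is the same one at work in programming quines; as a computational shadow of the construction (not the formal diagonal lemma itself), a minimal Python example:

```python
# A quine: the diagonal trick of applying a template to its own quotation.
# The template s plays the role of F, and s % s the role of F(Q).
s = 's = %r\nprint(s %% s)'
print(s % s)   # prints exactly this two-line program
```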

Then, if we could take as F the negation of Tr (denoted ¬Tr), the value of Q would be the formula ¬Tr(Q), that is, a formula claiming its own falsity.
Therefore, either Tr fails to describe truth in its intended range, or there is a category mistake, i.e. the liar paradox was trying to offer to the variable X of Tr(X) a value outside the class of formulas which Tr can handle. Conclusion : Tr cannot interpret formulas of the form "¬Tr(something)".

A generalized version of the truth undefinability theorem, proven in the same way, says this:

For any class C of formulas F(x) with one free variable x in some range X,
For any formula Φ(F,x,y) with 3 free variables F,x,y, where F ranges in C, and both x, y range in X,
If there exists y in X such that (∀x∈X, F(x) ⇔ Φ(F,x,y)) for every explicit F in C,
then the class C cannot contain the formula ¬Φ(F,x,x) with one free variable x, where F is replaced in Φ by any arbitrary explicit choice (which presumably does not affect the class of Φ).

(precisely, if C contains ¬Φ(F,x,x) then one can build from Φ a formula F in C for which one can prove (∀y∈X, F(y) ⇔ ¬Φ(F,y,y)) )

As we shall see, most classes of formulas in the strength hierarchy turn out to be each interpretable by a truth predicate which does belong to the same class (the main exceptions are the weakest classes and the Delta classes). Necessarily therefore, these classes are unstable by negation : the negations of formulas from one such class cannot all also belong to that class.

Hierarchy of first-order formulas

Let us now describe and classify the formulas which can be written in any first-order theory with unspecified kinds of variables, thus applicable to the description of any specific system, before applying this to the foundational theories in the main line of strengths, with the particular kinds of sets listed above. For convenience in handling the general case of an unspecified theory, we shall name the domains of variables (which are fixed and specific to given theories) using set theoretical notations, and abusively use more set theoretical tools meant to abbreviate some possible works with first-order theories. To avoid having to cope with exceptions, all considered sets are assumed to be non-empty.

Let us describe the structure of arbitrary formulas by listing the successive kinds of subexpressions they are typically made of, in the computation order of their interpretation: from the innermost subexpressions (which take values first), to the outermost (ending the computation):

Indeed, the rest of the formulas which may be written are provably equivalent to formulas of class Σn or Πn as just described, by simple reduction rules described below. We may extend the definitions of the classes Σn and Πn to include in each class all formulas provably equivalent to some formula which fits the said format, though the resulting added complexity of such definitions may raise classification issues for the weakest theories trying to describe them.
A formula is said to be of class ∆n if it is both equivalent to a formula of class Σn and to another of class Πn. Depending on context, one may either need this equivalence to be provable, or simply true for all values of a given variable in the current interpretation.
These classes are related as follows: for all n,

∆n = Σn ∩ Πn
F ∈ Σn ⇔ (¬F) ∈ Πn
Σn ∪ Πn ⊂ ∆n+1

Indeed each class Σn includes all classes Σk and Πk for k<n, and the same for Πn, because one can always re-write a formula with useless additional quantifiers.

Let us explain the reduction of other formulas into the said classes:

While it implicitly follows from the above, we can explicitly see that if F, G are ∆n+1 then
(∀x∈X, F(x)) ∧ (∃y∈Y, G(y)) is ∆n+2 as it is equivalent to both

∀x∈X, ∃y∈Y, F(x) ∧ G(y)
∃y∈Y, ∀x∈X, F(x) ∧ G(y)

Second-order formulas 

In addition to quantifiers over some sets called base sets, we may also need to use quantifiers whose range is thought of as that of "all possible structures" of a given kind over some base sets. Possible kinds of structures are the n-ary relations and the n-ary operations, for some small explicit n:

and so on.
Let us consider formulas admitting two kinds of quantifiers, according to their range:

A formula is said to be first-order if it only involves first-order quantifiers; classes of first-order formulas are denoted Σ0n or Π0n or ∆0n following the above definition, with an upper index 0 to distinguish them from the second-order formulas which can also be considered, and while free second-order variables may be used.
A second-order formula is a formula with both first-order and second-order quantifiers; once written in the canonical form described above (starting with a sequence of quantifiers), it is said to be of class Σ1n or Π1n according to its second-order quantifiers only: the kind of the first second-order quantifier in the sequence, and the number of alternations of these, ignoring all first-order quantifiers.

A fundamental fact about second-order formulas is the following.
If the axiom of choice (AC) over base sets is accepted, then any Σ11 formula is reducible to a formula of the form ∃A,∀x, F(A,x) where A is second-order, x is first-order, and F is a formula without quantifier.

Indeed this reduction proceeds by successive uses of the equivalence (expression of AC over Y):

(∀y∈Y,∃u∈U, F(y,u)) ⇔ (∃B∈U^Y,∀y∈Y, F(y,B(y)))

thus simplifying any consecutive first-order ∀ on the right:
(∀y∈Y,∃u∈U,∀x∈X, F(y,u,x)) ⇔ (∃B∈U^Y,∀(y,x)∈Y×X, F(y,B(y),x))
and similarly any consecutive second-order ∃ on the left.
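As a finite sanity check of this equivalence, here is a brute-force Python sketch over small ad hoc sets (illustrating its logical shape only, since the interesting cases are infinite):

```python
# (forall y in Y, exists u in U, F(y,u))
#   <=>  (exists B in U^Y, forall y in Y, F(y, B(y)))
from itertools import product

Y, U = range(3), range(4)
F = lambda y, u: (y + u) % 2 == 0       # an arbitrary test relation

lhs = all(any(F(y, u) for u in U) for y in Y)
# Each B in U^Y is a tuple of choices: one value of U per element of Y.
rhs = any(all(F(y, B[y]) for y in Y)
          for B in product(U, repeat=len(Y)))
assert lhs == rhs
```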

Note that if Y is first-order and U is second-order then U^Y is still second-order.
Another possible reduced form is ∃A,∀x,∃y, F(A,x,y) where A ranges in a set of relations; without AC, the previous reduced form can be reduced into this one, but not vice versa.

However, formulas of the form ∃A,∀x, F(A,x) where A ranges in a set of relations, x ranges in a base set (not a product of them) and F has no quantifier and no function or operation symbol, are actually of class Π01. Intermediate cases between those just mentioned may be hard to classify, and will be ignored here.

Formulas of bounded arithmetic

When describing finite systems (and similarly later also infinite ones), quantifiers usually range over finite sets, and are naturally translated to quantifiers of bounded arithmetic.
For example to describe a system based on a set A with a elements and a set B with b elements, the quantifier (∀x∈A, F(x)) is translated as (∀x<a, F(x)), and the same for B.
(This is itself part of first-order arithmetic, where (∀x<a, F(x)) is seen as an abbreviation of (∀x, x<a ⇒ F(x))).
More sets constructed from these can be used through relevant expressions:
This can be formalized in bounded arithmetic with the only basic operations 0,1,+ and ⋅, and a few more uses of bounded quantifiers, which we shall not detail here.
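In programming terms the translation is immediate, as in this small Python sketch (F is an arbitrary stand-in relation):

```python
# (forall x < a, exists y < b, F(x,y)) rendered directly as bounded loops.
a, b = 5, 7
F = lambda x, y: x < y              # arbitrary stand-in relation

assert all(any(F(x, y) for y in range(b)) for x in range(a))
```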

If the sizes of base sets are much larger than those of the formulas which are used, and yet somewhat comparable to each other (in the style: any used first-order bounds a,b satisfy a<b^n and b<a^n for some n much smaller than a and b), then all second-order ranges are larger than all first-order ones; this justifies the relevance of the distinction between first-order and second-order quantifiers in bounded arithmetic.

Then, the lessons from previous sections can be phrased as follows.

The strength of formulas of bounded arithmetic depends on the values of bounds, the number of quantifiers, and the number of alternations between them.
The use of consecutive quantifiers of the same kind amounts to multiplying the values of the bounds involved, which is a "small" increase of strength.
The use of alternated quantifiers (i.e. consecutive quantifiers of opposite kinds) provides, roughly speaking, a bigger increase of strength (even if the former may not be effectively reducible to it).
Yet, a still bigger strengthening comes by exponential increases of the values of bounds (such as replacing a by b^a); namely, the effect of using an additional alternated first-order quantifier with bound a in a formula, namely ∀x<a, ∃y<c, F(x,y) where F starts with a universal quantifier, is covered by the exponential jump of the value of the bound of the next quantifier from c to c^a, plus a change of F marked by the mere multiplication by a of the bound of its first quantifier.

The hierarchy of formulas of bounded arithmetic induces a hierarchy of very weak systems of bounded arithmetic, according to the class of formulas accepted as input to the induction axiom.
The lower and upper extremes of the hierarchy of such theories, when the respectively accepted classes of formulas range from the class of first-order formulas to the class of n-th order ones for any explicit n (where the bound of an n-th order quantifier is expressed by an exponential tower of height n where all n entries are first-order), are :

Such subsystems of arithmetic have been studied in particular in [Samuel R. Buss, Handbook of proof theory, ch. II].

Truth predicates of bounded arithmetic

Let us now sketch the expression, in bounded arithmetic, of a core truth predicate for formulas of bounded arithmetic, assumed to be in prenex form (the described reductions of some other formulas into this form may also be formalized with added complications).

Namely, the core difficulty is to handle the diversity of quantifier-free formulas, by addressing its two aspects:

  1. The diversity of atomic formulas, which is reducible to that of terms by distinction of cases for a theory with a small list of basic relation symbols;
  2. The diverse uses of logical connectives over a finite but not explicitly limited list of atomic formulas.

In answer to 1., the truth of an atomic formula F (here meant as abbreviating the data of its syntactic structure, formalizable by means of function symbols...) with size |F| can be written

∃i<m^|F|, (∀x<|F|, C(F,x,m,i)) ∧ R(F,i)

where

By symmetry of the problem (the same method is applicable to the negation of that atomic formula), this truth formula more precisely belongs to a Delta class; the provably equivalent symmetric expression is

∀i<m^|F|, (∀x<|F|, C(F,x,m,i)) ⇒ R(F,i)

The exact qualification of that Delta class (either ∆01 or ∆11) depends on whether |F| is supposedly "much smaller" than m, or comparable to it.

In answer to 2., we first need to reduce the use of connectives to just ∧ and ∨, by progressively moving all ¬ inwards, down to the level of atomic sub-formulas, which can thus either use a primitively admitted relation symbol or its negation.
Then, such a propositional formula, composed of just ∧ and ∨, can be seen as a finite game of going through the formula from its root (outermost connective) to one leaf (an atomic formula), where

The same truth formula, given by the concept of winning strategy for B, can be explained in other words : in terms of repeatedly using distributivity to move all ∨ outwards (to the root) and all ∧ inwards (to the leaves). Roughly speaking, the truth of a propositional formula F of size |F| can thus be written ∃s<2^|F|, ∀y<|F|, ...
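A Python sketch of this game reading (with an ad hoc tuple encoding of formulas): a ∧/∨ tree is true exactly when the verifying player (B above) has a winning strategy, choosing one branch at each ∨ node and having to survive every branch at each ∧ node.

```python
# Truth of an and/or tree as the existence of a winning strategy.

def wins(f):
    op = f[0]
    if op == 'atom':
        return f[1]                          # a leaf carries its truth value
    if op == 'or':
        return any(wins(g) for g in f[1:])   # verifier picks one good branch
    if op == 'and':
        return all(wins(g) for g in f[1:])   # must survive every branch
    raise ValueError(op)

f = ('and', ('or', ('atom', False), ('atom', True)), ('atom', True))
assert wins(f)
```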

Putting both stages of the definition of truth one into the other, both "dominating" ∃ may be treated as consecutive (as the ∀ in between is of lower order), leaving the overall class the same as it was for atomic formulas (if atomic formulas are indeed allowed to have sizes comparable to that of the propositional formula relating them).

Depending on hypotheses about the bounds (in terms of magnitudes, not exact values) on the sizes of considered formulas compared to the values of free variables, the truth formula for all such formulas from any given class Σ0n may either belong to another interpretation of Σ0n (with somewhat higher precise bounds than those first fixed), or to ∆11.

The remaining aspect of the problem of expressing truth for bounded arithmetical formulas, aside from the consideration of formulas not expressed in prenex form, is to allow variations of the number of quantifiers, namely covering, for example, all formulas of class Σ0n where n may vary beyond explicit limits. The solution to this is again expressible as a winning strategy for a finite game, with the difference that the number of possible choices at each step is some designated number to be interpreted, instead of only the number 2 or another explicit enumeration of elements. Again also, the symmetry of the problem leads to a provably equivalent expression of the opposite kind, which leads to having ∆11 in the role of the class gathering all formulas (and the truth formula for these) of bounded arithmetic of class any Σ0n, Π0n or ∆0n, whose size (and thus also n) is somehow "much smaller" than the values of the bounds involved.

Formulas of arithmetic of any order

The above classification of formulas of bounded arithmetic, can be naturally extended to formulas with quantifiers over higher kinds of sets as introduced earlier.

A first (most coarse-grained) classification of formulas is given by the data of the highest kind of set used by quantifiers. Beyond finite sets, the usually considered sets are the iterated power sets ℘^i(ℕ), where ℘^0(ℕ) = ℕ and ℘^(i+1)(ℕ) = ℘(℘^i(ℕ)). A quantifier with range ℘^i(ℕ) is said to be of order i+1. The order of a formula is defined as the highest order of its quantifiers.

A theory is respectively called a first-order arithmetic, a second-order arithmetic,... an (i+1)-th order arithmetic (collectively named higher-order arithmetic), according to the maximal order of the formulas admitted in its syntax.

Then, a formula of higher-order arithmetic in prenex form is said to be precisely of class Σin (resp. Πin) if it is of order i+1, and its chain of quantifiers contains precisely n alternated quantifiers of order i+1, starting with ∃ (resp. ∀). For example, a formula of the form ∃x∈ℕ, ∀y<x, ∃z∈ℕ, F(x,y,z) is of class Σ01 only, because both unbounded first-order quantifiers are of the same sign (not separated by an opposite one of the same order) and are thus counted as only one.

A more fully reduced form for formulas of the precise classes Σin or Πin, which constitutes the stricter definition of these classes, is to have their chain of quantifiers start with n alternated quantifiers of order i+1, followed by quantifiers of lower order only (which is not the case in the example above); then, by also successively reducing the sub-formula of lower order quantifiers, we get a formula whose chain of quantifiers is not only alternating in signs, but also decreasing in order.

Hence the question whether such a reduction is possible. Answer : this depends on the precise choice of axioms given to the considered theory. As we shall review later, the above mentioned reductions of formulas, simplifying their classification, will more often work than not in the context of the respectively relevant theories, according to whether the equivalence between a formula and its reduced version is deducible from the given axioms.
At least, whenever formulas of a given class in first-order or second-order arithmetic are considered, their use usually presumes a context strong enough to allow reducing formulas of lower class. In other words, the sub-formulas of an arithmetical formula after the first unbounded numerical quantifier, or those of a second-order arithmetical formula after the first set quantifier, can be assumed to be reduced.

In first-order arithmetic, the involved classes of formulas will be more simply denoted Σn, Πn and ∆n according to the number n of alternated unbounded numerical quantifiers, while ignoring the presence of bounded quantifiers. 

List of reduction rules and their working conditions.

First, each ℘^i(ℕ)×℘^i(ℕ) is directly reducible to ℘^i(ℕ) by an explicit bijection (only using multiplication for i=0, and even weaker means for other i; this can be contrasted with the need of AC to prove the existence of a bijection between X×X and X for unspecified infinite sets X). Moreover, there is a natural injection from ℘^i(ℕ) to ℘^j(ℕ) for i<j, and also from finite ranges to ℕ, so that any consecutive quantifiers of the same sign can be summed up into just one, of the highest order among them.
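For i=0, one classical choice of explicit bijection is the Cantor pairing, sketched here in Python (one possibility among others, built from addition and multiplication):

```python
# Cantor pairing: an explicit bijection N x N -> N, and its inverse.

def pair(x, y):
    return (x + y) * (x + y + 1) // 2 + y

def unpair(z):
    d = 0                                  # find the diagonal containing z
    while (d + 1) * (d + 2) // 2 <= z:
        d += 1
    y = z - d * (d + 1) // 2
    return d - y, y

assert all(unpair(pair(x, y)) == (x, y)
           for x in range(20) for y in range(20))
```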

This gives a formula whose chain of quantifiers has alternated signs only. The remaining step is to essentially re-order quantifiers in the chain, to let them appear by decreasing strength. This re-ordering proceeds by a succession of steps, each of which consists in essentially switching the places of two consecutive quantifiers, moving a quantifier right in exchange for one of higher order (while of course changing the content of the sub-formula), and therefore also reducing the length of the chain on each side by packing 2 consecutive quantifiers of the same sign into one.

Starting with a given formula with a disordered chain of quantifiers, in order to move all quantifiers of the highest order (in that formula) to the left of all those of lower order, a preferable step by step strategy is to first focus on the right : take the innermost non-reduced sub-formula of the same order, reduce it, then repeat the process by adding into consideration one by one the next quantifiers just left of it. This way, each reduction step consists in moving a lower order quantifier right through an already reduced higher order sub-formula.

The diversity of possible moves, according to the orders of both quantifiers which are switched and the class of the sub-formula right of these, means also a diversity of minimal strengths needed for a theory to justify each of them (this concept of strength of theories will be explained later).

The weakest of all moves belongs to bounded arithmetic: the switch of two bounded quantifiers, possibly reducing the number of bounded quantifiers at the end. It is a form of finite choice: from any formula F we can build a formula G such that, provably,

(∀x<a, ∃y<b, F(x,y)) ⇔ (∃z∈ℕ, ∀x<a, G(x,z))

where G(x,z) says (z < b^a ∧ F(x, the x-th digit of z in base b)).
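Since F here is arbitrary, the equivalence can at least be tested mechanically on decidable instances; a minimal sketch in Python, where the predicate F below is an arbitrary stand-in:

    # Sketch: checking (∀x<a, ∃y<b, F(x,y)) ⇔ (∃z<b**a, ∀x<a, G(x,z)),
    # where G(x,z) reads the choice of y for x off the x-th base-b digit of z.
    def digit(z, x, b):
        return (z // b**x) % b

    def lhs(a, b, F):
        return all(any(F(x, y) for y in range(b)) for x in range(a))

    def rhs(a, b, F):
        return any(all(F(x, digit(z, x, b)) for x in range(a)) for z in range(b**a))

    F = lambda x, y: (x + y) % 3 == 0   # arbitrary stand-in predicate
    assert all(lhs(a, b, F) == rhs(a, b, F) for a in range(1, 5) for b in range(1, 5))

The number z, whose x-th digit in base b is the chosen y for x, is exactly the finite choice function packed into one number.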

There are only two limits to these:

As we shall consider theories no weaker than EA (except otherwise stated), the number of bounded quantifiers at the end of chains of quantifiers is reducible to a fixed small number.

Once a truth predicate has been written for a certain class of bounded formulas by a method sketched above (while each such formula escapes the exact class it is meant to cover), we get a Π1 truth formula for the class of Π1 arithmetical formulas in reduced form, and then similarly (by adding quantifiers to the left of it) a truth formula for each class Σin or Πin of formulas in reduced form (for any i and any n>0), where each such truth formula itself belongs to the same class (with only an additional free variable designating the formula to be interpreted).
This makes it possible to reduce any axiom schema over a Σin or Πin class to a single axiom, obtained by applying the schema to the corresponding truth formula : this axiom implies every other instance by giving the relevant value to its parameter.
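To make this idea concrete, here is a minimal sketch (Python) of a truth evaluator for a toy class of codes of bounded formulas; the tuple encoding is an illustrative stand-in for the arithmetical coding, not the real one:

    # Sketch: one program deciding truth for a whole class of (codes of)
    # bounded formulas, given as pure data together with an environment.
    def ev(t, env):                      # a term: a constant or a variable index
        kind, v = t
        return v if kind == "const" else env[v]

    def true_(phi, env=()):
        if phi[0] == "less":             # atomic formula: term < term
            return ev(phi[1], env) < ev(phi[2], env)
        tag, bound, body = phi           # bounded quantifier: ("all"/"ex", bound, body)
        rng = range(ev(bound, env))
        if tag == "all":
            return all(true_(body, env + (x,)) for x in rng)
        return any(true_(body, env + (x,)) for x in rng)

    # code of : ∀x<5, ∃y<6, x<y
    phi = ("all", ("const", 5), ("ex", ("const", 6), ("less", ("var", 0), ("var", 1))))
    assert true_(phi)

The free variable phi plays the role of the parameter designating the formula to be interpreted.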

The next kind of moves belongs to first-order arithmetic: the reduction of

∀x<a, ∃y∈ℕ, F(x,y)

where F is an arithmetical Πn formula for some n. It depends on the arithmetical collection axioms

(∀x<a, ∃y∈ℕ, F(x,y)) ⇔ (∃b∈ℕ, ∀x<a, ∃y<b, F(x,y))

denoted BΓ for each considered class Γ which F ranges over. [Samuel R. Buss, Handbook of proof theory, ch. II] [A proof-theoretic analysis of collection, Lev D. Beklemishev]. They can also be expressed as versions of finite choice.
Collection axioms may be extended to classes Γ beyond first-order arithmetic, but then the reduction problem faces other needs of moves which require stronger axioms.

According to [Σn-Bounding and ∆n-Induction, Theodore A. Slaman], BΣn is equivalent under EA to ∆n-induction (I∆n), i.e. induction on any Σn formula F equivalent to some Πn formula G, with no provability requirement:
For any Σn formula F and any Πn formula G,
∀parameters, (∀x, F(x) ⇔ G(x)) ⇒ ((F(0) ∧ ∀x, (F(x) ⇒ F(x+1))) ⇒ ∀x, F(x))
also equivalent to the scheme:
∀parameters, ∀y, (∀x<y, F(x) ⇔ G(x)) ⇒ ((F(0) ∧ ∀x<y, (F(x) ⇒ F(x+1))) ⇒ F(y))

while BΠn is equivalent to BΣn+1 and thus to ∆n+1-Induction. Summing things up, collection axioms are related to induction axioms as follows.
∀n∈ℕ, IΣn+1 ⇒ BΣn+1 ⇔ BΠn ⇔ I∆n+1 ⇒ IΣn ⇔ IΠn

Reduction moves in second-order arithmetic start with uses of the axioms of finite choice (which are provable by induction):
(∀x<a, ∃Y∈℘(ℕ), F(x,Y)) ⇔ (∃Z∈℘(ℕ×ℕ), ∀x<a, F(x,Z|x))
where by definition, y ∈ Z|x ⇔ (x,y)∈ Z.
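As a concrete toy illustration of this slicing, a minimal Python sketch, with finite sets standing in for subsets of ℕ:

    # Sketch: merging a finite family Y_0,...,Y_{a-1} of sets into one set Z of
    # pairs, and recovering each Y_x as the slice Z|x = { y | (x,y) ∈ Z }.
    def merge(family):
        return {(x, y) for x, Y in enumerate(family) for y in Y}

    def slice_(Z, x):                    # the slice Z|x
        return {y for (i, y) in Z if i == x}

    family = [{1, 2}, {0}, {2, 5}]       # arbitrary stand-in values of Y_x
    Z = merge(family)
    assert all(slice_(Z, x) == family[x] for x in range(len(family)))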

Then come some versions of the axiom of countable choice, of the form
(∀x∈ℕ, ∃Y∈℘(ℕ), F(x,Y)) ⇔ (∃Z∈℘(ℕ×ℕ), ∀x∈ℕ, F(x,Z|x))

(so named to mean the use of AC where the domain of ∀ is countable, that is, has a bijection with ℕ)
One may think of a candidate weaker solution for reduction, which would be another kind of collection axiom :
(∀x∈ℕ, ∃Y∈℘(ℕ), F(x,Y)) ⇔ (∃Z∈℘(ℕ×ℕ), ∀x∈ℕ, ∃z∈ℕ, F(x,Z|z))
but it can be simplified into the countable choice version, by composition with the function which gives, for each x, the smallest z that fits. This method can only form a genuine weakening in stronger cases where z becomes a variable of second order or more (thus Y must be at least of third order for this to have any interest), so that the idea of picking the smallest z no longer works.

Actually the distinction between ℘(ℕ) and ℕ^ℕ, which we neglected here, may matter in some ways. The issue is that the simulation of ℕ by the set of singletons in ℘(ℕ), and thus the simulation of ℕ^ℕ by a subset of ℘(ℕ×ℕ) and thus of ℘(ℕ), requires some use of numerical quantifiers. For this reason, two optimally reduced forms for Σ11 formulas need to be distinguished:

∃X∈℘(ℕ), ∀y∈ℕ, ∃z∈ℕ, F(X,y,z)
∃X∈ℕ^ℕ, ∀y∈ℕ, F'(X,y)

where F,F' have bounded quantifiers only. Some syntactic adaptations are needed, replacing the use of ℘(ℕ) by that of ℕ^ℕ, in particular in the expression of countable choice (turning it into something like our first mention of AC), in order to fit this second form of reduction.

On the other hand, the Weak König's Lemma allows reducing formulas of the form ∃X∈℘(ℕ), ∀y∈ℕ, F(X,y) (where F has bounded quantifiers only, and only uses X through the relation (...∈X)) into Π01 formulas.

Hierarchies of sets

The above described hierarchy of formulas induces a hierarchy of definable sets in each ℘k(ℕ) for k>0.

Namely, for a given k>0, a set X∈℘k(ℕ) is said to be of class Σin (resp. Πin) if it is definable as X={x∈℘k-1(ℕ)|F(x)} for some formula F of class Σin (resp. Πin).
Since truth in a class of formulas is expressible by a formula of the same class, the set of all true formulas from a class also belongs to the class so named, as a subset of the set of all formulas of that class (identifiable with ℕ by some encoding).

The sets of class Σ0n or Π0n for some n∈ℕ are called arithmetical; those of class Σ1n or Π1n for some n∈ℕ are called analytical.

Numerical parameters (i.e. with values in ℕ) in F bring nothing, as they can be eliminated (being themselves definable by terms).
A set parameter with value in ℘(ℕ) already has deep impact on the classification of sets definable from it.

For k=1, a set parameter would trivialize the classification of definable sets (except for the classification relative to a fixed value of the set parameter used in the definition), so it needs to be banned; this gives in ℘(ℕ) a hierarchy of classes of definable sets, where each class is countable (as there are countably many formulas).
 
For k=2 we need to distinguish between the lightface hierarchy (Σ1n and Π1n), whose defining formulas have no parameter, so that each such class is also countable, and the boldface hierarchy (𝚺1n and 𝚷1n) of sets defined by a formula of the respective class but using a set parameter in ℘(ℕ), so that each such class of sets has the cardinality of the continuum (i.e. it is in bijection with ℘(ℕ)).

A set is said to be of class ∆in if it is equivalently definable by some formula of class Σin and some other formula of class Πin. This equivalence is not required to be provable; it just needs to hold for all x∈℘k-1(ℕ). The provability of such equivalences depends not only on the formulas, but also on the strength (proving power) of the foundational theory that is used, as we shall discuss later.

The class ∆11 of sets for k=1, as well as the lightface one for k=2, is the union of the classes Σ0α for all recursive ordinals α (which are all the ordinals for which Σ0α makes sense); sets in this class are called the hyperarithmetical sets. A recursive ordinal can be conceived as the isomorphism class of a recursively defined well-order on a subset of ℕ; it is therefore countable.
The boldface 𝚫11, on the other hand, is the union of the classes 𝚺0α for all countable ordinals α; it is called the class of Borel sets.

Interdependence of set complexities

This hierarchy of formulas, where each named class may have different precise definitions depending on syntax details, and whose equivalence is not always ensured, is only an introductory step towards the actual description of the strength hierarchy, which is even more complex.

Indeed the succession of named infinite power sets ℘i(ℕ) for each i>0 is only a syntactic succession representing successively stronger sets, each of which seems fixed from the viewpoint of a given theory with a supposedly fixed semantics; but this cannot, in itself, fix the successive strengths of these sets in the absolute, for lack of means to formally require them to contain sets beyond those definable by the accepted means of the given theory.

In detail, the axioms specifying that some ℘k(ℕ) contains some given classes (Σin or Πin, where k≤i+1) of definable subsets of ℘k-1(ℕ), can induce the existence of more sets than those so specified, because starting with empty content and completing it by the additionally defined sets can disturb the interpretation of some definitions. This will lead the same defining formulas to generate more sets, and thus leave orphan their previous offspring, still involved in driving other defining formulas to their respective values. Hence the difficulty of assessing the precise strength of such axioms, to which we shall come back later.

(Yet there exist universes of set theory where every set is definable, according to [Pointwise definable models of set theory, Joel David Hamkins, David Linetsky, Jonas Reitz]; but that remarkable circumstance cannot itself be expressed in the same language of set theory, as would be needed for any attempt to axiomatically claim or deny it.)

Provability and the strength hierarchy

Of course, whatever fundamentally escapes expression by syntax cannot be studied either, but an effective syntactic exploration is actually possible here, in the form of the hierarchy of possible lists of axioms (or of possible sets of provable formulas, deducible from axioms) which an arithmetic of any order can be given. Intuitively speaking, the strength of a theory consists in the lower bound it puts on the strength of the semantics that remains compatible with its axioms; for example, a second-order arithmetic with strong axioms can be stronger than a hundredth-order arithmetic with weak axioms.

(For simplicity we shall often designate a theory by just specifying its strongest axiom(s), i.e. those which newly come precisely at its strength level, keeping implicit the other axioms also needed but already present in weaker theories along the main line of foundational theories.)

Let us now explain why the concept of provability of formulas must be subject to an endless hierarchy of more and more encompassing versions, which actually forms the core measure of strength for theories.

First, observe that any provability concept must be expressible by a Σ01 formula (∃p∈ℕ, p encodes a proof of F).
Second, some basic quality requirements for a provability concept to fit its job (soundness, and the recognition of clearly available proofs) imply that its restriction to the class of Σ01 formulas also works as a truth predicate there. However, as previously explained, truth on the opposite class of Π01 formulas cannot have a Σ01 expression, and thus cannot be exhausted by any fixed provability concept. So, as long as a theory is sound (provability implies truth), it cannot be complete on any class of formulas beyond Σ01, such as Π01 : there must remain true but unprovable formulas. Moreover, unlike for higher classes, any proof of a false Π01 formula leads to contradiction with the straightforward proof of its negation.
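To fix intuitions, the shape shared by all such provability predicates may be sketched as follows (Python; the verification function check is an arbitrary decidable stand-in):

    # Sketch: a Σ01 predicate is an unbounded search whose inner test is
    # decidable; it halts (returning True) exactly on the provable cases.
    def provable(F, check):
        p = 0
        while True:                      # the unbounded "∃p" quantifier
            if check(p, F):              # decidable check that p encodes a proof of F
                return True
            p += 1

    # toy stand-in: "proofs" of F are the positive multiples of F's code
    assert provable(7, lambda p, F: p > 0 and p % F == 0)

On unprovable inputs, the search simply never halts: this one-sided behavior is the practical face of the Σ01 class.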

Hence the concept of strength hierarchy of theories, which may have several "most often equivalent" definitions, but the main ones we shall consider are

  1. Inclusion of the sets of Π01 theorems: T' is stronger than T if all Π01 formulas provable by T are also provable by T';
  2. Inclusion of the whole sets of theorems : same without restriction to a particular class of formulas.

The first one may be considered the clearest, as it forms a total order between all "natural" theories, i.e. which belong to the main thread of foundational theories (while of course other theories can stay out of that total order), while the second one somehow mixes different criteria, with more risk for "natural" theories to appear incomparable in its sense.

Practically any theorem of bounded arithmetic is actually of class Π01 (no less) because, to apply beyond explicitly written bounds, all the involved bounded quantifiers need bounds given by free variables, which implicitly have to be themselves under a ∀ ranging over ℕ.

The Löwenheim–Skolem theorem

We shall now give another tool to clarify the following fact : despite Cantor's theorem making the infinite sets involved in some higher-order arithmetic formally appear as sets of different cardinalities, there is another philosophical perspective by which they are all countable sets, only assigned different roles, so that the different strengths they seem to get in a given theory actually come from the different structures by which they are used.

To describe an infinite system, we just have a little problem: we cannot really access it to interpret our formulas there, because our operational tools are finite. Never mind, we just need to call a god who could observe one, and undertake a conversation with him about it.
The conversation proceeds as follows (here are the general rules, inside which more specific assumptions or strategies can be followed depending on purpose).

At any time of the conversation, you have a partial map of the system, made of:

The first version of the list of formulas can be thought of as an axioms list of some theory; depending on context, this may either be an explicit list of formulas known true for theoretical reasons, or a theoretically conceived list of revealed truths from the god.

Along the conversation, your map expands step by step. The kinds of possible steps are the following:
For this to work the whole way to its infinite limit, there is a condition on the framework : it requires the axiom DC of Dependent Choice (a weak version of AC):

(∀n∈ℕ, ∀x∈E, ∃y∈E, R(n,x,y)) ⇒ (∀a∈E, ∃Z∈E^ℕ, Z(0)=a ∧ ∀n∈ℕ, R(n,Z(n),Z(n+1)))

This is provably equivalent to the simpler formula where the use of the variable a, and the presence of n as a first argument of R, are removed; we keep them here for convenience.
The case when E is countable is provable without AC, by choosing each time the smallest example according to a fixed bijection with ℕ.
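For instance, a minimal sketch (Python) of this least-witness strategy over a countable E, with an arbitrary stand-in relation R, truncated to finitely many steps to remain executable:

    # Sketch of DC without AC on a countable domain: always pick the witness
    # of least index in a fixed enumeration of E.
    E = range(10**6)                     # stand-in enumeration of a countable domain
    R = lambda n, x, y: y > x + n        # stand-in relation with no dead end

    def dependent_choice(a, steps):
        Z = [a]
        for n in range(steps):
            # least-witness choice: no appeal to AC, just the enumeration of E
            Z.append(next(y for y in E if R(n, Z[-1], y)))
        return Z

    print(dependent_choice(0, 5))        # [0, 1, 3, 6, 10, 15]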

Let us be very precise on the ontological conditions this construction relies on.

It looks like an infinite two players game with choices at each step. Let us put aside the choice whether more or less truths are revealed beyond those mentioned above, as this degree of freedom will be completely closed by a specification in each context of use.

The only remaining choice you seem to have is on the order of the cases you will pick for adding new names by the rules DEF and EX, where the list of available formulas for EX is especially likely to get endlessly expanded by applying SUB to a formula of class Πn where n>1, with the growing list of names.
Yet this freedom of yours can be ignored, since it in no way affects the final result. Presuming any arbitrary strategy on your part, the replies by the god can be used as replies to any other strategy : all your possible strategies can only ask the same questions in a different order, allowing the god to give the same values to the names introduced by the same rules across your possible strategies, and thus also to give the same answers on truths. (Actually, this conformity of answers is required to keep you on the same track of possible questions.) At their respective limits, all such maps, being given the same interpretation, are of course isomorphic, as they just differently label by numbers the same countable subset of the territory.

The only choices which really affect the result, then, are the god's choices, namely those of suitable values during the EX steps (and also during DIS, but this role of DIS can be eliminated by using formulas in prenex form only).
For these to proceed, the conditions are the following.

First, the god needs to have actually interpreted all given formulas in his target system sufficiently to verify their truth, in the sense of effectively knowing an example of object witnessing the truth of each existential quantifier contained there, whenever needed, for any given values of the variables. This involves seeing the domains of all variables as sets, at least somehow.
Then, the acts of choosing one example among the possibly many available ones need not have been ready in advance for every possible case : making a choice for every possible value of all variables (beyond the values actually picked) would be a stronger kind of use of AC if the domains of variables are not known to be countable.
What it takes, then, expressible as an instance of DC, is just a sequence of choices, where each choice concerns a case (a particular existential formula with particular values of names) which depends on the previous choices. And it takes the time to make these choices successively rather than simultaneously (while possibilities of countable amounts of independent choices are instances of the axiom of Countable Choice, which are weaker for given domains).

Actually DC is an equivalent condition, since it comes back as the particular case of a conversation starting with the formula (∀n∈ℕ, ∀x∈E, ∃y∈E, R(n,x,y)) and a little more set-up to proceed.

The magic of a conversation with these rules is that, if endlessly continued, the resulting potential content of your map is identifiable with a subset of the territory, with the same features as those expected of the whole : all the features and truths promised by the axioms for the territory, are getting progressively fulfilled by your map itself.

This can still theoretically proceed assuming that all truths about the territory, expressible by any given range of formulas, were revealed (provided we could find a god to interpret these formulas there), in which case none of such formulas can distinguish the map from the territory : both will have all the same true formulas beyond any explicitly definable list of axioms.

The relativity of infinite cardinals

Yet, the map is necessarily countable (the names can be labelled by natural numbers according to their introduction time), regardless of whether the territory is countable or not. So, for any theory which happens to be true somewhere, none of its axioms can ever block the existence of a valid countable interpretation of the same theory. In particular, any theory with the power to apply Löwenheim–Skolem to a given class of formulas involving the set 2^ℕ, will deduce the existence of a countable system fulfilling the same truths from that class. It is easy to see that the interpretation of ℕ is preserved, while 2^ℕ gets re-interpreted as a countable subset P ⊂ 2^ℕ.

This forms a paradox with Cantor's theorem for theories of second-order arithmetic, and any other theory supposedly admitting ℘(ℕ) as an available domain of quantification, which appear to present ℘(ℕ) as necessarily uncountable, so that P ≠ 2^ℕ.
Intuitively speaking, this is because the intended claim P = 2^ℕ would be second-order with respect to the sets P and ℕ (which are among those formally involved as ranges of variables of the given theory, and seen as first-order ranges when interpreted by first-order logic), and thus cannot be among the expressed formulas whose conformity of truth was ensured; the countability of P is also a second-order concept in this sense.

To see things more precisely, let us write down a formal expression of Cantor's theorem : for any set E,

∀f∈(2^E)^E, ∃A∈2^E, ∀x∈E, f(x)≠A

here applied to the case E=ℕ; remember that (2^ℕ)^ℕ has an expressible bijection with 2^ℕ, through which the quantifier ∀f∈(2^ℕ)^ℕ may be re-expressed as one with domain 2^ℕ.
Then, Löwenheim–Skolem also maps (2^ℕ)^ℕ to a countable subset P' ⊂ (2^ℕ)^ℕ, linked to P by the natural bijection between 2^ℕ and (2^ℕ)^ℕ (there are different natural bijections, but they all give the same determination of P' from P thanks to some basic axioms).

As P is countable, there exists f∈P^ℕ that is bijective (so ∀A∈P, ∃x∈ℕ, f(x)=A).
By Cantor's theorem in the countable interpretation, we get f∉P'.
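The diagonal argument behind this formula can be made concrete on finite truncations; a minimal sketch (Python), with an arbitrary stand-in table f:

    # Sketch of Cantor's diagonal: for any f : ℕ → 2^ℕ (truncated to an n×n
    # table of bits), the diagonal-flip set A differs from every f(x) at x.
    def diagonal(f, n):
        return [1 - f[x][x] for x in range(n)]   # A = { x | x ∉ f(x) }

    n = 8
    f = [[(x * y) % 2 for y in range(n)] for x in range(n)]   # stand-in table
    A = diagonal(f, n)
    assert all(f[x][x] != A[x] for x in range(n))

Applied to the bijection f above, the diagonal set A is a subset of ℕ missing from P: this is how the map and the territory disagree on the meaning of ℘(ℕ).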

Hence the possibility of a philosophical view over the same foundational theories of mathematics, defended by [Naive Infinitism: The Case for an Inconsistency Approach to Infinite Collections, Toby Meadows, Notre Dame Journal of Formal Logic, 56(1), 2015, https://sites.google.com/site/tobymeadows/papers-talks]

and made of the following thesis (here re-expressed, not Meadows's words):

For example, with a car you can explore a continent but not cross an ocean to reach another continent; with a plane you can reach another continent but not another planet.
Between a god G viewing math in that way, and another H claiming to have the absolutely complete power sets of infinite sets, the dispute may be endless, as long as they only have finite means of expression at any time to exchange their arguments, in the style of the omnipotence paradox:

H : "I got the complete, uncountable interpretation of ℘(ℕ), that transcends any possible countable sequence of subsets of ℕ !"
G : "I can beat any interpretation of ℘(ℕ) you may have, by finding a bijection of it with ℕ that will make it countable !"

(and similarly for the power sets of other infinite sets)
Yet there appears an asymmetry between them: the view of H is, for most purposes, much stronger than that of G. The Löwenheim–Skolem theorem was a rather easy way down from H to G, but the inverse way, from G to H, involves a bottom-up reconstruction in potentialist terms, which needs further explanations.
The idea is that, in lack of a real H, the view G can anyway be expanded up to a simulation of H. The main issue is the following.

Potential stability thesis. Any thread of sets exploration by given tools, will remain contained by some horizons of stability relative to these tools, in the sense that such a horizon gives to each considered "power set" an appearance of completeness that resists these tools. Precisely speaking, any reasonable version of set theory admits interpretations as universes of sets, satisfying all its axioms describing diverse ways for designated "power sets" to behave as complete.

We shall address later the crucial point of justifying important cases of this thesis.
Now let us just explain the last step, from this thesis to the simulation of H by G. The issue is to idealize this concept of stability horizon : from horizons with only specific strengths of stability, able to resist the strength of given sets exploration tools, to horizons with ideally absolute strength, able to resist any conceivable sets exploration tool.
Let us mention different ways to answer this problem.
One is to notice that there is no upper limit to the strength of stability of possible horizons, either metaphysically or from the axioms of any good set theory. So, pointing out that a given set theory admits a universe whose stability simply fits the strength of its axioms, does not hinder the possibility of interpreting the same theory by horizons with any higher stability that may be desired later. Now, once any work has been done in any given theory, it is always possible to switch to any stronger theory and reinterpret the previous work as in fact more precisely meant to refer to some of the newly requested kinds of universes, stronger than the conditions which were explicitly formalized before.

Another way is to consider that the successively stronger exploration tools which will be considered, whatever they may be, will form a sequence that can itself be taken as a tool stronger than them all, to which the potential stability thesis can be applied; a horizon of stability for that one can then play the role of H.

But, this raises some questions.
One question is whether such a simulated H, thus argued to exist, also enjoys a kind of essential uniqueness. The answer turns out to be negative, as this non-uniqueness causes the undecidability of the continuum hypothesis, which resists large cardinal axioms, and thus also of endlessly many further independent statements belonging to the same and higher classes. This comes in contrast with the fact that truth in lower classes of statements was found to be ultimately fixed in some sense.

Another question is whether this succession is supposed to be recursive. Indeed the point of the strength hierarchy as explained above was that each specific theory formalizes a specific recursive provability predicate, and we need a non-recursive succession of them to approach a non-recursive truth predicate. This difference may be dismissed as futile, as it only distinguishes between possible limits of infinite progressions of successively stronger theories to be considered, which is hardly anyone's concern.
A further question is whether the potential stability thesis is still applicable to some hypothetical, divinely revealed non-recursive sequence of stronger and stronger tools escaping any sound and recursive provability predicate, such as the full data of the truth predicate on a certain class of formulas. And actually, we will have some reasons to think it is.

The completeness theorem

Yet, the above also raises the general question whether sense can be made of a theory made of an infinity of axioms, of which we could only ensure the meaningfulness of finite sub-lists at a time. The answer is yes, but with a cop-out we shall explain later.

Actually, the same strategy to generate interpretations works to solve two problems at a time : the completeness theorem (any consistent theory admits an interpretation) and the compactness theorem (if every finite sub-list of the axioms of a theory admits an interpretation, then so does the whole theory).
The latter is somehow a particular case of the former: any contradiction in a theory can only use a finite list of its axioms, so that such a finite list is already inconsistent; therefore the interpretability of each finite list of axioms implies the consistency of the theory.

The construction of the completeness theorem follows the same rules of conversation we described for the Löwenheim–Skolem theorem, with only the following difference : the god is not witnessing a system he knew in advance, but only playfully pretending to do so. His answers do not care to conform to any witnessed truth, but only to preserve logical consistency. Considering the conversation as a two-player game, your questions would amount to a thorough search for contradictions out of the given formulas, and the end result after an infinity of steps is a matter of the god having followed a winning strategy for his goal of never being caught in contradiction.
The case of the ultraproduct follows the same description, replacing "logical consistency" by "compatibility with an infinity of the given interpretations"

Beyond some special cases of non-foundational theories, very weak foundational theories, or low-class formulas to start with, the god still needs a divine knowledge to succeed.
Indeed, a consistent theory will stay consistent either with a formula F, or with ¬F, or both, but there is no general computable method to find out which case we are in (if no contradiction from either F or its negation could be found after some given limited time of search, no matter how long, then the search needs to be given up inconclusively, as the risk of not having searched long enough to find one that exists can never be ruled out). Thus, there is no general computable method either to choose adding either F or ¬F while being sure of not introducing an inconsistency; that is, no computable method can escape the risk of making a wrong choice that would end up in contradiction after a possibly very long but inescapable time.
The existence of a god's winning strategy for any consistent theory is equivalent to the Weak König's Lemma.

The fundamental conclusion of this scenario is that, even though the god may not have been aware of any existing interpretation of the given theory at the beginning of the conversation, his strategic ability to preserve consistency leads him to end up building one, thus giving reality to what started as a lie or fancy.

Three ontological lessons may be drawn from this (and some aspects of these lessons come from the Lowenheim-Skolem theorem alone).

The local lesson is that the existence of a certain mathematical system, is inseparable from (so to speak: locally coincidental with) the activity of formally describing such a system, either way:
The global lesson is a fundamental connection between truth, ontology and provability : whatever is consistent, is true in some existing reality (and thus, for any unprovable formula, there exists a counter-example universe, where it is false).

The lesson somewhere in between is, in reply to those philosophers who might have fancied a matter of dispute there, the lack of genuine ontological difference behind the diversity of foundational theories for mathematics which do not recognize the same kinds of primitive objects, provided that they are mutually interpretable.

Namely, some foundational theories are set theories where everything is a set; other set theories also include urelements; it may be convenient to also admit functions as primitive objects; other theories admit natural numbers as primitive, in which case sets may either be assumed to contain natural numbers only (forming a version of second-order arithmetic) or allowed to contain other sets or something else.
But, as long as objects, formulas and proofs can be systematically translated from one of these theories to another, they are essentially synonymous from an ontological viewpoint: they are just different perspectives on the same mathematical reality, and coexist just "next to each other".

Now a special point to add to this picture is that consistency or provability statements may themselves be undecidable... which means that these, and thus also their equivalent ontological statements, may hold one way in some universes, and the other way in others.

Non-standard models of arithmetic

As earlier explained, any consistent theory leaves some true Π01 formulas unproven. By completeness, such a theory admits an interpretation where the negation of such a formula, a Σ01 formula (∃x∈ℕ, F(x)) where F is bounded arithmetical, is seen as true despite being actually false.
So, if you wonder whether (∃x∈ℕ, F(x)) holds, and try to investigate this by logical deductions from your favorite theory which actually leaves it undecidable, then no matter how long you search, you will not find the answer; but you still never know whether it is indeed undecidable, or you just did not search long enough.
The only other chance to get out, in case that first search was indeed vain, is to call a god for help.
If you find a god telling you that (∃x∈ℕ, F(x)) is false, then you can indeed trust him on that, because you know he could foresee and carefully escape the risk of being later caught in contradiction by the discovery of an x fitting F(x).
But if you only found gods claiming it true, then you still never know. You perhaps just did not stumble on the right gods.

Let us expand on the behavior of a god G claiming to see an x∈ℕ such that F(x), when no such x actually exists. For any number n which a limited being Y might explicitly write down, G will keep telling that x≠n, and more precisely n<x:
Y: How much bigger is your x than, say, a googolplex (10^(10^100)) ?
G: Haha. My x is still very much bigger than that.
Y: What about a power tower of a million tens ? Graham's number ?
...and so on, taking on the challenge of offering ever more elaborate definitions of very large numbers in the role of n, but the reaction will be the same. Y may keep trying further, in the form of more and more complex definitions for larger and larger numbers with still reasonable sizes of definition, but for doing so, would need stronger and stronger theories to remain able to prove their meaningfulness in reasonable time (i.e. to find proofs whose sizes do not grow like the meant numbers themselves; the strength involved in doing so is rather Π02 strength, not Π01 strength).
Since none of this suffices:
Y : Please answer this challenge yourself then : give me a description of an upper bound for x !
G: I could, but you may run out of the necessary patience to follow, as it takes so long to explain.
Y : Are you sure ? Please find any way possible to make it.
G: All right, here you are. The square of the smallest z such that F(z) is indeed an upper bound for x.
Y: Uh ? But I do not know that such a number exists !
G: You don't ? I do !

Now a god who both understands G's view and knows its falsity explains it as follows. G interprets the set ℕ as a set ℕG containing not only all the true natural numbers, called the standard numbers (elements of the true meaning of ℕ), but also other elements, called non-standard numbers, seen as greater than all standard numbers. The set of all standard numbers, as a subset of ℕG, violates the axiom of induction in ℕG. The mistake of G comes from his failure to fully test the induction axiom in his ℕG, due to his inability to discover all its existing subsets. The recognition of this error is a matter of knowing (or being revealed) some subsets A of ℕG, yet unknown to G, which are counter-examples to the induction axiom (and more precisely, some of them also being counter-examples to (∃x∈A, F(x)), to come back to our point of dispute).
Then the intersection of all such subsets is also one, forming the true ℕ. But, for lack of an ultimate interpretation of the range of all subsets of the infinite set ℕG, the means are lacking to so define the true ℕ. We may only refer to more or less strongly stable interpretations of this power set, whose respective resulting intersections are more or less strong (realistic) interpretations of ℕ.

Arithmetic and ontology

All this needs to start from some existing interpretation ℕG of arithmetic. This departure point has to be assumed out of nowhere, as a kind of fundamental relativity (subjectivity) of ontology. There is nothing to metaphysically favor a particular interpretation of arithmetic as the first existing one (the only special one is the standard one, which is included in all others, but it is also the strongest of all). We may rather think of ontology as initially undecided with respect to the range of possible versions (interpretations) of mathematical reality, before being progressively refined (focused) along time.

The completeness theorem showed a path from the consistency statement of a given theory (which is an arithmetical statement) to the existence of interpretations. In the light of the above (the relativity of truth about consistency), let us now comment on the inverse path, from ontology to arithmetical truth.

We may distinguish two links that make up this path: one between a model of a theory and a model of arithmetic implicitly needed to contain its formalism; the other between models of arithmetic and arithmetical truths.

For the first link, we need to distinguish, for any system M described by a theory T, between M in itself and the process of verifying that M is indeed a model of T. Indeed M is usually rather simple in its content, in the sense that it is just made of set(s) and structures that interpret the symbols of the theory. But to interpret T there and check the validity of the match, an infinite process may be needed. The first issue arises when T supposedly has an infinity of axioms : a model of arithmetic is needed to interpret the intended content of the axioms list. Then a model MA of arithmetic is anyway needed to interpret the range of all possible expressions of T, beyond axioms. This choice of MA is constrained (tested) by the need to interpret all expressions of T in M. Non-standard expressions (i.e. non-standard numbers, where numbers play the role of codes of expressions) are not generally ensured to be uniquely interpretable in any system; they only appear to be given a natural unique interpretation in M by a god (a reality system) who knows M while using MA as his interpretation of arithmetic, mistaking it for standard.

But, when an M from outside his reality comes to break into it (revealed by a greater god), it may happen that the connection fails, as the recursion process involved in interpreting in M successive "expressions" of larger sizes or higher classes, which might be non-standard, suddenly fails, revealing the non-standardness of MA.

In case the interpretation process still goes through along the whole MA, two other kinds of mismatch may occur:
So, successive revelations of existence of mathematical systems M, are linked to the revelations of subsets of ℕ which may uncover the non-standardness of previously acceptable-looking interpretations of ℕ; this narrowing of the range of compatible models of arithmetic is linked to the revelations of truth of previously undecided Π01 formulas, especially consistency statements for theories describing M.

The well-order of expressible ontology

We are now ready to introduce the core concept of ordinal analysis : the fact that the main thread of foundational theories for mathematics is well-ordered, thus identifiable with an ordinal, namely the first non-recursive ordinal, called the Church–Kleene ordinal and denoted ω1CK; each particular foundational theory in the main thread gets labelled by a recursive ordinal.
This is actually not very accurate, as there are a few cop-outs to this general fact. Let us gather a few hints towards it.

It can be shown that if a binary relation is altogether extensional, transitive and well-founded, then it is also total, and thus a strict well-order.
If a binary relation is just transitive and well-founded but not extensional, then it still has a quite natural morphism (quotient) onto a strict well-order, which is no longer injective.
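For a finite toy example of this quotient, the morphism sends each element to its rank, the least ordinal above the images of its predecessors; a minimal sketch (Python):

    # Sketch: the natural non-injective morphism from a well-founded relation
    # into ordinals (finite ranks here), computed by well-founded recursion.
    def rank(rel, x, memo=None):
        # rel maps each element to the set of its predecessors
        if memo is None:
            memo = {}
        if x not in memo:
            memo[x] = max((rank(rel, y, memo) + 1 for y in rel[x]), default=0)
        return memo[x]

    rel = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}   # toy relation
    print({x: rank(rel, x) for x in rel})    # {'a': 0, 'b': 1, 'c': 1, 'd': 2}

Non-extensionality shows in "b" and "c" (distinct but with the same predecessors) receiving the same rank.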

As we shall explain, the main time line of mathematical ontology is both transitive and well-founded, which leads to label each foundational theory by a unique ordinal, but the same ordinal may qualify multiple theories; depending on precise definitions (especially when restricting consideration to the class Π01 of formulas), one may distinguish such a thread which is also extensional, thus keeping just one theory for each ordinal.

Due to the recursive nature of the provability powers of theories, used to compare their strengths, the strength of theories (in the main thread of natural foundational theories) is more precisely measured by recursive ordinals. This measure is the topic of ordinal analysis. An overview of some core issues of ordinal analysis is given for example in [On the hierarchy of natural theories, James Walsh]. Some more details are developed here.

The time order of mathematical ontology is the order of relative existence: a world M precedes a world M' if M exists inside M', in the sense that it is possible to designate and fully interpret M inside M'; this relation is made irreflexive by the truth undefinability theorem.

So conceived, its transitivity is obvious: if a system with given properties appears to exist in some existing world then it simply exists with those properties (all axioms with standard size which describe it, got correctly interpreted and satisfied in that world).
Switching from semantics to syntax via the completeness theorem translates this time order (relative existence between worlds) into a relation of provable consistency : a theory T precedes a theory T' if T' can prove the consistency of T (so that every model of T' recognizes the existence of some model of T). Its transitivity takes the form of the fact that all Σ01 truths are provable, to be used contrapositively on consistency statements, which are Π01 : if a consistency statement is consistent then it is true.

The well-foundedness is a bit less clear, if only because it is a second-order property, whose formalizations cannot exclude non-standard interpretations, which leaves possibilities of some kinds of exceptions to that rule.
There are multiple versions of this well-foundedness, pertaining to either syntax or semantics.
A first version, which occurs somehow in between syntax and semantics, is expressed by the famous Second Incompleteness Theorem, as follows.

Take any theory T expressing arithmetic and presented by its (Σ01) provability predicate.
Using the above transitivity, the self-reflecting Π01 formula saying "this formula is unprovable from T", is found provably equivalent to the consistency of T. Thus, the consistency of T is provably equivalent to its own unprovability from T.
By the syntax/semantics equivalence, this can be re-expressed as : "If there are any worlds fitting T then among these there is one not itself containing any such world". This sentence is the very formula of well-foundedness, except only that its second-order quantifier is restricted, from the power set of the range of all worlds, to the only classes of worlds defined each as the class of those worlds fitting some given expressible theory.

Notes:
As for extensionality, it may have diverse exceptions depending on the precise ranges of considered theories and the comparison criteria.

Π01 ordinals of theories

Let us introduce an expression of ordinal analysis, which may be the clearest introductory approach to the strength hierarchy of theories, keeping track of the link with ontology via the completeness theorem.
We shall start with the concept of Π01-ordinal of a theory, which only depends on its provability predicate over Π01 formulas. It is the most fine-grained of all ordinal measures for the strength of theories, and thus the best for distinguishing the weakest ones.

It is the first in a sequence of ordinal strength measures which are more and more coarse-grained, though they measure different data: for each n>0 the Π0n-ordinal qualifies provability predicates over Π0n formulas.
As these only qualify provability predicates of theories over arithmetical formulas, we can proceed with their review in terms of the mere language of first-order arithmetic, and simplify the notation from Π0n to Πn.

This sequence of Πn-ordinals is most relevant to measure the strength of theories in between EA and the theory PA of "Peano Arithmetic", that is first-order arithmetic axiomatized by the scheme of induction (IΠn for every n). Indeed:
These ordinal measures have been studied in diverse works such as
[Proof-theoretic analysis by iterated reflection, Lev D. Beklemishev]
[Reflection calculus and conservativity spectra, Lev D. Beklemishev]
which are unfortunately not easy to follow. Below is an attempt to popularize some key ideas from these works.

Except where otherwise stated, all considered theories T will be recursive, i.e. given by a Σ1-definable set of axioms. The above reference more precisely requests definability by a bounded formula of EA; I wonder where this may matter. The general case of a recursive set of axioms is effectively needed to describe the arithmetical consequences of any well-described set theory, while ignoring the precise nature of the non-arithmetical axioms from which they are deduced. Anyway, recursive theories lead to recursively enumerable sets of theorems (provability predicates): for any formula F, the claim "T proves F" has class Σ1, as an abbreviation of a formula essentially saying "∃p, p encodes a proof of F from T".

A reflection principle for T is an expression of soundness of T (a claim that all T-provable formulas are true) over a class Γ of formulas (that we shall take to be either Πn or Σn for some n). There are two versions of this, which may differ:

The local Γ-reflection principle, denoted RfnΓ(T) takes the form of a schema of one axiom per formula F in Γ :
(T proves F) ⇒ F.

The uniform Γ-reflection principle, denoted RFNΓ(T), is expressible as a single axiom, using the truth predicate over Γ:
∀F∈Γ, (T proves F) ⇒ (F is true)
or equivalently, as the schema of one axiom per formula F of class Γ with one numerical variable x:
∀x, (T proves F(x)) ⇒ F(x).

Note that ∀n>0, RFNΠn(T) ∈ Πn.

For any theory T0 extending EA, and any theory T extending T0 by a set of Π1-axioms, the following theories are equivalent (where Σ0 is the class of bounded formulas):
  1. T + "T is consistent"
  2. T0 + "T is consistent"
  3. T0 + RFNΠ1(T)
  4. T0 + RfnΣ0(T)
The implications 1.⇒2. and 3.⇒4. are trivial.
2.⇒1. the truth of any Π1-axiom of T is deducible from its consistency, by the fact its falsity would make T inconsistent with the data of a counter-example.
2.⇒3. by formalizing the argument "A proof from T of a false Π1 formula would lead to an inconsistency of T in the face of a counter-example".
4.⇒2. is obvious (with F="false").

Since T cannot prove its own consistency, the theory T' given by the above presentations has a wider proving power, and is thus "the next theory" after T, expanding it by just another (or a more powerful) Π1 axiom. This step from a theory to the next stronger one can be iterated transfinitely along any ordinal which is just definable enough for this to proceed, namely keeping consistency statements expressible by Π1 formulas; these are the recursive ordinals (if I am not mistaken). This transfinite sequence of theories, starting from a theory T0, is said to be the result of iterated reflection from T0 along the given order.
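The finite stages of this iteration are easy to display symbolically; a minimal sketch (Python), where theories are mere lists of axiom labels and Con(...) is a symbolic stand-in for the consistency statement:

    # Sketch: iterated reflection along finitely many steps, each step
    # extending the previous theory by its own consistency statement.
    def iterate_reflection(T0, steps):
        T = list(T0)
        stages = [list(T)]
        for _ in range(steps):
            T = T + ["Con(" + " + ".join(T) + ")"]   # T' = T + "T is consistent"
            stages.append(list(T))
        return stages

    for stage in iterate_reflection(["EA"], 3):
        print(" + ".join(stage))
    # EA
    # EA + Con(EA)
    # EA + Con(EA) + Con(EA + Con(EA))
    # ...

The subtlety discussed next only appears at limit stages, where a whole infinite list of axioms must be re-expressed by a single finite definition.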

A funny detail is that, in the case of elementary ordinals (given by an elementary well-order, i.e. one defined by a bounded formula in the language of EA), this recursive sequence of theories also admits an elementary expression. In a sense this is trivial, at least for non-limit cases, since these are expressible in the form 2. or 3., made of T0 with a single additional axiom. But it has been shown to hold in all cases, with steps presented in the form 1., as there is a subtle way to express a solution to that recursive condition by an elementary formula, by means of self-quotation (as mentioned with truth undefinability); then the essential uniqueness of this solution is deduced, not from the well-foundedness of the iteration order, but from a corollary of the second incompleteness theorem. Indeed, well-foundedness cannot be used because it is not generally expressible in EA anyway; and this is not a usual case of transfinite recursion where limits are given extensionally by some infinite data, but a special case where these limits need to be translated into a finitely expressed definition, to allow for a formulation of their consistency.

A consequence of this funny detail is that such recursions can proceed, with their existence and essential uniqueness, along non-well-founded orders as well. But ill-founded iterated reflection is usually quite boring (with possible exceptions, such as when that ill-foundedness is somehow unprovable ?), as it results in the (essentially unique) inconsistent theory.

In [Proof-theoretic analysis by iterated reflection, Lev D. Beklemishev], the Π1-ordinal |T|Π1 of a theory T is defined as the ordinal α such that the α-th iterated reflection from T0 = EA gives the same Π1 theorems as T. So defined, |EA|Π1 = 0 and |EA + Con(EA)|Π1 = 1. However, this convention suffers some arbitrariness from its assumption of the choice T0 = EA; different choices for T0 result in different transfinite sequences of provability predicates over Π1, and thus possibly in different measures for the Π1-ordinal of a given theory.
The study of this dependence, obtained in his works, suggests what a more objective Π1-ordinal labelling of theories would look like. But to explain it, we need to also introduce the other Πn-ordinals and explain how they are related.

Πn ordinals

The above mentioned transfinite sequences of theories by iterated reflection also work with other classes of formulas.
Namely, for the same reason as above, for any n and any elementarily presented theory T containing EA, there is equivalence in EA between RFNΣn(T), RFNΠn+1(T), and the consistency of T extended by the whole set of Πn truths (as interpreted by a Πn truth formula).

These equivalent formulas are of class Πn+1; we shall write them in modal notation as ◇n(T).
They form the basic step whose transfinite iteration from an initial theory T defines a transfinite sequence of theories differing by their Πn+1 axioms, and thus to be labelled by different values of a certain ordinal measure of their Πn+1 strength.
Then the study of these sequences of theories and their dependence on T suggests the following kind of conventions as suitable to express their properties in simple form.

We shall attribute Πn+1 ordinal strengths not only to theories, but also to individual Πn+1 formulas (the two definitions should not be confused); and not to all theories, resp. all Πn+1 formulas, but only to the special case of those (theories or formulas) which are regular in some sense, i.e. which belong to the main thread of successive strengths (though there are different degrees of strictness by which this may be conceived). Let us denote Sn ⊂ Πn+1 the set of those Πn+1 formulas which have a definite ordinal strength.

Let us define notations as follows. For any regular theory T, any number n and any recursive ordinal α,

n(T) ∈ Sn ⊂ Πn+1
Tnα := T+{◇nβ(T) | β<α} (regular theory obtained by adding to T the schema of axioms: ◇nβ(T) for each β<α)
nα(T) :⇔ ◇n(Tnα) ∈ Sn
thus
Tn0 = T,
Tnα+1 ⇔ T+◇nα(T), and
nα(T) ⇔ (◇n(T) ∧ ∀β<α, ◇nβ+1(T)).
In particular, ◇n0(T) ⇔ ◇n(T).
When α is a limit ordinal, Tnα contains ◇nβ(T) for each β<α but does not prove ◇nα(T) that is (∀β<α, ◇nβ(T)). This means it admits interpretations where ◇nβ(T) fails for some non-standard value of β<α greater than all standard ones.

|T|n = ||◇n(T)||n is the Πn+1-ordinal strength of T

{||F||n | F∈Sn ∧ T proves F}⊂ |T|n (in the sense of von Neumann ordinals) with equality if n>0.
As further commented below, the question whether S0 and ||F||0 can be defined to provide non-limit values able to fill the set |T|0, is out of scope of this exposition.
These ordinal measures have the following two properties
Together they imply : |T|n+1≠0 ⇒ µ≠0 ⇒ |T|n ≥ ω|T|n+1.
Reflection steps affect strength measures in the minimal way allowed by this rule, namely
∀F ∈ Sn, |T+F|n = max(|T|n, ||F||n|T|n+1)
as ||F||n|T|n+1 is the next multiple of ω|T|n+1 after ||F||n.
Also, |Tn+11|n = |T|n|Tn+11|n+1.
In particular |Tnα+1|n = |Tnα|n + ω|T|n+1, which by induction gives

|Tnα|n = |T|n + ω|T|n+1.α

This is related to Schmerl’s formula, which in our notations is written:
If |T|n+2=0 then for all ordinals α,
nωα(T) ⇔ ◇n((Tn1)n+1α)

which simplifies if α>0 as

nωα(T) ⇔ ◇n(Tn+1α)

These facts result from the following basic cause of dependency between reflection formulas in successive classes:

Remember that the strongest Πn+1 axiom of Tnα+1 is ◇nα(T).
By the formulation of ◇n+1 in terms of consistency, ◇n+1 is insensitive (in terms of truth) to additional true Πn+1 axioms, so that for all Πn+1 formulas F, we have the provable equivalence

(◇n+1(T)∧F) ⇔ ◇n+1(T+F)

In particular,

(◇n+1(T)∧◇nα(T)) ⇔ ◇n+1(T+◇nα(T)) ⇒ ◇nα+1(T).

Re-applying this deduction with α replaced by α+1 and so on, leads the theory Tnα+1+◇n+1(T) to successively prove ◇nα+k(T) for all (standard) finite values of k, i.e. to imply (each axiom of) the theory Tnα+ω.

From this basic fact (and the fact that it is the only cause of dependency between values, which we shall simply admit here), the formula (copied here from above)

|Tnα|n = |T|n + ω|T|n+1.α

(and the naturalness of conventions under which this holds) will be deduced by transfinite induction on the value β of |T|n+1 as follows.

While a presence of Πn+2 axioms, giving a nonzero β, can cause gaps in the progression of |Tnα|n, the lack of these, expressed by β=0, gives |T+◇n(T)|n = |T|n + 1, and thus by induction the case β=0 of the above formula, namely |Tnα|n = |T|n + α.

To carry out the induction on β, we need to let β take any value (not only multiples of something), by the following consideration.
The link between |T|n and |T|n+1 remains unchanged when replacing T by a theory with the same values of |T|n and |T|n+1 but the possibly different value |T|n+2 = 0, obtained by taking the set of all Πn+2 theorems of T in guise of a set of axioms (this other theory still contains EA if T did, as further commented below). This way, denoting T' = T+◇n+1(T), we get |T'|n+1 = β+1.

By the above, ◇n(T'+◇nα(T)) implies ◇n(Tnα+ω); while (we admit here) the converse holds when |T|n+2=0. Thus (provably),

n(T'+◇nα(T)) ⇔ ◇nα+ω(T).

From there, by induction on α where the case α=0 is due to (◇n+1(T) ⇒ ◇n(T)),

 ◇nα(T') ⇔ ◇nω.(1+α)(T)

Hence |T'nα|n = |Tnω.(1+α)|n = |T|n + ωβ.ω.(1+α) = |T'|n + ωβ+1.α,
which completes the idea of the proof by induction on β.

These formulas leave the values of the strength measures |T|0 and |T|1 under-determined, fixed only by arbitrary conventional choices of values given to the weakest considered theory. Taking EA as the weakest theory (needed for things to work without burden), it naturally gets |EA|n = 0 for every n>1, and |EA|0 = ω|EA|1, but |EA|1 rather deserves a finite but nonzero conventional value (a choice which affects the convention as |EA1α|1 = |EA|1 + α, and thus only affects measures of theories T with finite values of |T|1). The question of the most natural value to assign to |EA|1 may be a matter of dispute or arbitrary convention; the choice appearing in the Wikipedia article "Ordinal analysis" is |EA|1 = 3.

This may look strange given that the most convenient formalization of EA has Π1 axioms only. The trick is that Π2 axioms are equivalently expressible in the form of Π1 axioms over additional symbols of operations or functions, and the strength of EA comes from the strongest operation symbol in its language, which is exponentiation. There is a complicated possibility to re-express it in the mere theory I∆0 (bounded arithmetic with addition and multiplication), but no less, by a bounded formula Exp(x,y,z) which is true precisely when z = x^y, so that the existence of exponentiation is expressed by the Π2 axiom (∀x,∀y,∃z, Exp(x,y,z)).

Then the formula ◇1(EA) is expressible as the existence of the tetration operation (or of some other function or operation with equivalent speed, so to speak, such as the superexponential (x,y) ↦ 2^(2^(...^(2^y))) with a tower of x twos), where tetration is the tower of exponents ^xa defined by ^0a = 1 and ∀x, ^(x+1)a = a^(^xa).
Indeed, RFNΠ2(EA) gives:
∀x, (EA proves "∀a, ^xa exists") ⇒ (∀a, ^xa exists)
while EA indeed proves "∀a, ^xa exists" for each x, by successively using x times the existence of an exponent.
The fast-growing hierarchy is an ordinal-indexed sequence of functions, whose respective totality formulas (counted from that of the tetration) match the formulas ◇1α(EA).
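A minimal sketch (Python) of tetration and of the finite levels of the fast-growing hierarchy (only safe for very small arguments, of course):

    # Sketch: tetration ^x a, and the fast-growing hierarchy f_k where
    # f_0(x) = x+1 and f_{k+1}(x) iterates f_k x times starting from x.
    def tetration(a, x):
        return 1 if x == 0 else a ** tetration(a, x - 1)

    def fgh(k, x):
        if k == 0:
            return x + 1
        r = x
        for _ in range(x):               # iterate f_{k-1} x times
            r = fgh(k - 1, r)
        return r

    print(tetration(2, 4))               # 65536
    print(fgh(2, 4))                     # 64, as f_1(x) = 2x and f_2(x) = x·2**x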

A more ordinary way to prove the existence of tetration is to deduce it from the existence of exponentiation, using an induction axiom. Precisely, it involves Σ1-induction (the use of induction on Σ1 formulas). But this axiom is actually equivalent to ◇2(EA).
Generally, for every n>0, IΣn ⇔ IΠn ⇔ ◇n+1(EA).

Thus, the Σn-induction axiom IΣn is of class Πn+2, irreducible to any formula of lower class. One may be tempted to see it otherwise, by writing it as : for any Σn-formula F,
∀(parameters), ∀n, (F(0)∧¬F(n)) ⇒ ∃k<n, (F(k)∧¬F(k+1))
but, trying to reduce it via a collection axiom, one way would use BΠn, which is equivalent to BΣn+1 and stronger than IΣn, while the other reduction attempt, using the weaker Σn-collection (implied by IΣn), actually gives another formula of class Πn+2, namely

∀n, (∀x, ¬F(0,x)) ∨ ∃x, (F(n,x) ∨ ∀b, ∃k<n, ∀y<b, (F(k,x)∧ ¬F(k+1, y)))
for every F of class Πn−1.

Around cut elimination

For most purposes, the concept of provability in first-order logic can be considered unique and unambiguous. Yet, up to EA-provable equivalence, it has two versions, which are only provably equivalent in the theory EA+ defined as EA11, i.e. EA+◇1(EA) (totality of the tetration operation).
The difference is on whether to accept, among the deduction rules, a kind of deduction called a free-cut, that is, a certain kind of use of the cut rule. While the cut elimination theorem says that the use of the cut rule can be eliminated from proofs of tautologies (theorems independent of any axioms) at the cost of an increase in size, a variant of it, called free-cut elimination, says that free-cuts can be eliminated from proofs of theorems using any given axioms. Actually, the use (vs. non-use) of free-cuts to prove a formula F assuming a finite list (or conjunction) X of axioms, is inter-translatable with the use of cuts to prove from scratch that X implies F.

To say it in short, the job of a free-cut is to make use of an intermediate formula not inherited from the given axioms, and thus possibly outside the class they belong to. Indeed one can notice that the sketch of a provability concept underlying the above account of the completeness theorem, thus sufficing to prove anything provable, does not use any formula of a higher class than those of the given axioms.

Namely, a proof from axioms at most Πn, using free-cuts with formulas at most Πn+1, can be reduced to a proof with formulas at most Πn, but its size may grow exponentially through this reduction process. As this exponential growth of size occurs k times in the cut elimination process reducing the highest class of used intermediate formulas from Πn+k to Πn, it works in EA for each explicit value of k, and down to a free-cut free proof (assuming n to take an explicitly bounded value as well), but ◇1(EA) is required to justify its extension to the general case of free-cut elimination on proofs without explicit bound for k.

The choice of definition for provability between both options (with or without free-cuts) may affect the definition of ◇nα. Precisely, ◇nα(EA) is unambiguous (both versions are EA-provably equivalent) whenever n>0 or α is a limit ordinal, but is affected by the choice in the remaining cases, as follows: the consistency with free-cuts of any T extending EA by finitely many Π1 axioms is EA-provably equivalent to the free-cut free version of ◇0(T01); i.e. when |T|1 = |EA|1, one ◇0 reflection step with free-cuts amounts to two steps without [A. Visser, Interpretability logic, in P.P. Petkov (ed.), Mathematical Logic, Plenum Press, New York, 1990, pp. 175–208].

This may be intuitively understood in the following rough terms:

This rough view is also manifested by the following combination of facts [S. R. Buss, Handbook of proof theory, chap. II, 4.3] [Wilkie and Paris 1987]:

The completeness theorem for recursive theories (or only for elementarily definable ones?), with the free-cut free version of provability, is found to work in the theory WKL0*, a version of second-order arithmetic accepting Weak König's Lemma as an axiom. The arithmetical consequences of WKL0* form precisely the theory EA+BΣ1: any countable model of EA+BΣ1 can be completed by a collection P of subsets to form a model of WKL0*, though the construction of such a P requires an external perspective.
[Unifying the model theory of first-order and second-order arithmetic via WKL0*, Ali Enayat, Tin Lok Wong]
[Interpreting Weak König's Lemma using the Arithmetized Completeness Theorem, Tin Lok Wong]
Of course WKL0* is a very poor foundation for mathematics, as many basic theorems of algebra are equivalent over it to Σ01 induction (IΣ1), i.e. they require the theory WKL0.
[Stephen G. Simpson and Rick L. Smith, Factorization of polynomials and Σ10 induction]

The axiom BΣ1 (unlike the stronger IΣ1 in the same class Π3) brings no new Π2 consequences, and in particular cannot help to prove ◇1(EA).
[A conservation result concerning bounded theories and the collection axiom, S.R. Buss]

As far as the completeness theorem is concerned, the small difference made by the choice of version of provability is the following. A theory T may be consistent without free-cuts but become inconsistent with them. This means it really is consistent, so that the existence of a model M of it is unproblematic, except only that, in a model of EA+BΣ1 not satisfying ◇1(EA), there may be a nonstandard number n such that some Πn formula cannot be interpreted in M.

This is also unsurprising, since only PA (much stronger than EA+BΣ1) can ensure, for every explicit value of n, the interpretability of all Πn formulas in any given countable system (the point being the ability to rule out any inconsistency involving such formulas as conflicting with an accepted induction axiom), and still higher strength is needed to extend this to unspecified n, as we shall hint later.

Transfinite induction schemes

The equivalence between induction and reflection, can be generalized as an equivalence between transfinite induction and iterated reflection as follows (extrapolating the information from [Transfinite induction within Peano arithmetic, Richard Sommer]).

Denote by TI(β, Γ) the schema of transfinite induction along a given ordinal β for formulas of a given class Γ (which we shall take to be Σn or Πn). It generalizes ordinary induction, as IΓ ⇔ TI(ω, Γ), and is also reducible to a single formula by means of a truth formula.

For all n∈ℕ and all ordinals α, β such that ω^ω^α ≤ β < ω^ω^(α+1) (more precisely, such that there is a standard k∈ℕ with β < ω^(ω^α·k)),

◇n+2α(EA) ⇔ TI(β, Πn+1) ⇔ TI(ω·β, Σn)

As ω·ω^ω^α = ω^(1+ω^α), while 1+ω^α = (2 if α=0, ω^α otherwise), TI(ω·β, Σn) can be simplified as TI(β, Σn), except that when α=0 the condition for TI(β, Σn) to keep its equivalence with the other above formulas becomes ω^2 ≤ β < ω^ω.
On the other hand,

ω ≤ β < ω^2 ⇒ (TI(β, Σn) ⇔ IΠn)

Also remember that the theory (EA+◇n+2α(EA)) = EAn+2α+1 satisfies
|EAn+2α+1|n+2 = α+1
|EAn+2α+1|n+1 = ω^(α+1)
|EAn+2α+1|n = ω^ω^(α+1)
thus can prove ◇nβ(EA) (that is, β-iterated Πn+1 or Σn reflection) for precisely all β < ω^ω^(α+1). These are the same β for which it can prove TI(β, Πn+1), and the same for TI(β, Σn). But this is somewhat coincidental, for lack of a direct implication either way (◇nβ(EA) is only deducible from TI(β, Πn+1) when β≥ω, and from TI(β, Σn) when β≥ω^2).

Yet underlying ideas are that

Hyperations and the Veblen hierarchy

Let us introduce concepts from [Hyperations, Veblen progressions and transfinite iteration of ordinal functions, David Fernández-Duque, Joost J. Joosten, 2013], following a progressive path of introduction I did not see elsewhere.

Consider the operation ω^β·α between ordinals α and β, as a β-indexed family of functions of the argument α. Let us also consider a slight variant (ϕ^β), defined from it by

1+ϕ^β(α) = ω^β·(1+α)

so that ϕ^1 enumerates the class of all limit ordinals.
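For readers who wish to experiment, here is a minimal Python sketch of Cantor normal form arithmetic below ε0 (my own representation choices and function names, not from the cited paper), sufficient to compute this family; iterating ϕ^1 from 0 displays ω, ω^2, ω^3, ..., climbing towards its least fixed point ω^ω, a preview of the Veblen constructions discussed below.

ZERO = ()              # the ordinal 0
ONE = ((ZERO, 1),)     # the ordinal 1 = ω^0 · 1

def cmp(a, b):
    # Compare two CNF ordinals, given as tuples of (exponent, coefficient)
    # pairs in strictly decreasing order of exponents: returns -1, 0 or 1.
    for (e1, c1), (e2, c2) in zip(a, b):
        c = cmp(e1, e2)
        if c:
            return c
        if c1 != c2:
            return 1 if c1 > c2 else -1
    return (len(a) > len(b)) - (len(a) < len(b))

def add(a, b):
    # Ordinal addition: the leading term of b absorbs all smaller terms of a.
    if not b:
        return a
    e = b[0][0]
    keep = tuple(t for t in a if cmp(t[0], e) > 0)
    same = [c for (f, c) in a if cmp(f, e) == 0]
    if same:
        return keep + ((e, same[0] + b[0][1]),) + b[1:]
    return keep + b

def omega_times(b, d):
    # omega^b · d, by left distributivity:
    # omega^b · (omega^e · c + r) = omega^(b+e) · c + omega^b · r.
    if not d:
        return ZERO
    (e, c) = d[0]
    return ((add(b, e), c),) + omega_times(b, d[1:])

def phi(b, a):
    # The family phi^b(a) of the text: 1 + phi^b(a) = omega^b · (1+a),
    # which for b > 0 simply equals omega^b · (1+a).
    return a if not b else omega_times(b, add(ONE, a))

def show(a):
    if not a:
        return "0"
    parts = []
    for (e, c) in a:
        base = "1" if not e else ("ω" if e == ONE else "ω^(%s)" % show(e))
        parts.append(base if c == 1 else (str(c) if not e else base + "·%d" % c))
    return " + ".join(parts)

x = ZERO
for _ in range(3):   # phi^1 iterated from 0: ω, ω^2, ω^3, ... -> ω^ω
    x = phi(ONE, x)
    print(show(x))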

Both families have the following similar properties.

Each family is the hyperation (hyper-iteration) of its member of index 1, ϕ = ϕ^1, being determined by it as follows.

First, it is a weak hyperation, i.e.

But it is not continuous: whenever β is a limit while α is a successor,

ϕ^β(α) > limγ→β ϕ^γ(α)

This should be expected, for lack of any other formula for the progression of ϕ^β(α) for fixed α>0 and variable β≥ω: right additivity (unlike left additivity) does not express ϕ^(β+1)(α) using ϕ^β(α).

Instead of this, the basic definition of ϕ^β(α) for limit β proceeds by first defining ω^β = limγ→β ω^γ, namely

ϕ^β(0) = limγ→β ϕ^γ(0)    (L0)

Then comes its multiplication by 1+α. This can be understood as a recursion which adds ω^β at each step of incrementing α (while the cases of limit α follow by continuity). These steps are themselves expressible as limits:

ϕ^β(α+1) = ϕ^β(α)+ω^β = limγ→β (ϕ^β(α)+ω^γ)

Defining ẟ by β=γ+ẟ, also written ẟ = −γ+β,

ϕ^β(α+1) = limγ→β (ϕ^γ(ϕ^ẟ(α))+ω^γ) = limγ→β ϕ^γ(ϕ^ẟ(α)+1).

Notice that strict increase and right additivity suffice to ensure

ϕ^β(α) < ϕ^γ(ϕ^ẟ(α)+1) ≤ ϕ^γ(ϕ^ẟ(α+1)) = ϕ^β(α+1)

Now, the hyperation (ϕ^β) of ϕ = ϕ^1 is the minimal weak hyperation of ϕ.

Equivalently (as justified below), it is the weak hyperation where ϕ^β is defined, for limit β, as the continuous function fitting (L0) and

ϕ^β(α+1) = limγ→β ϕ^γ(ϕ^(−γ+β)(α)+1)  (L)

Looking back at ϕ^(α+β) = ϕ^α⚬ϕ^β: when β and (equivalently) α+β are limits, the instance of (L) for ϕ^(α+β) is deducible from the one for ϕ^β. Thus, for the sake of defining hyperations by transfinite recursion from ϕ, (L) only adds information when stated for limit values which are additively irreducible, i.e. not expressible as a sum of two strictly lower ordinals. Such ordinals are those of the form ω^β for some β>0.

For these, right additivity gives
∀γ<β, ϕ^(ω^γ)⚬ϕ^(ω^β) = ϕ^(ω^β)
which means all values of ϕ^(ω^β) are fixed points of all the ϕ^(ω^γ) for γ<β (and thus are also fixed by all composites of these functions, which are all the ϕ^ξ for ξ<ω^β written in Cantor normal form).
So, the minimality condition for (ϕ^β) among weak hyperations of ϕ is that each ϕ^(ω^β) is surjective onto the class of such fixed points (it does not skip any of them). Thus the family (ϕ^(ω^β)) coincides with the Veblen progression based on ϕ, written (ϕ_β) and defined by

We can see its equivalence with (L0)∧(L): for ϕ_β = ϕ^(ω^β), (L) simplifies as

ϕ_β(α+1) = limγ→ω^β ϕ^γ(ϕ_β(α)+1)

In particular, the identity ω^(β+1) = limn→ω ω^β·n gives

ϕ_(β+1)(α+1) = limn→ω ϕ^(ω^β·n)(ϕ_(β+1)(α)+1)

where by right additivity,

ϕ^(ω^β·n) = ϕ_β⚬...⚬ϕ_β  (n times)

which means ϕ_(β+1)(α+1) is the next fixed point of ϕ_β after ϕ_(β+1)(α).
Likewise, ϕ_(β+1)(0) = limn→ω ϕ^(ω^β·n)(0).

While we introduced the concepts of hyperation and Veblen progression on cases which admit an explicit expression, they become more useful when no such expression is available. There is one main such use case, once again with two slightly different versions, respectively based on the slightly different functions e and φ:

φ(ξ) = ω^ξ = 1+e(ξ)
e(ξ) = ϕ^ξ(0)
Their respective hyperations are related by
e^α(0) = 0
e^α(1+ξ) = (φ^α(ξ) if α is a limit; φ^α(1+ξ) otherwise).
In particular, φ^α(ξ) = e^α(1+ξ) if α>0.

The epsilon numbers (a sequence of ordinals) enumerate the fixed points of φ:
ε_α = φ_1(α) = φ^ω(α)

More or less regular theories

Among subsystems of first-order arithmetic, we saw a diversity of possible theories, whose classification can be approached by different ordinal strength measures; and even these measures together cannot perceive some differences, such as the option to add the collection axiom BΣn+1 when IΣn is postulated while IΣn+1 isn't.

For many purposes one may prefer to focus on theories whose proving power can be characterized by a minimal amount of data, especially their ordinal strength measures as defined above, giving them clear positions in reasonable pictures of the strength hierarchy. Such theories will be called regular. But diverse candidate criteria of regularity may be considered, and these may disagree (not be totally ordered by provable implication) about relatively less regular theories.

For theories of first-order arithmetic (seen as provability predicates on the class of first-order arithmetical formulas), a first notable candidate is the condition of being equivalent to a theory T axiomatized by iterated reflection, i.e. T = EA+{◇nβ(EA) | n∈ℕ ∧ β<αn} for a given sequence of ordinals (αn)n∈ℕ. Let us call this basic regularity. As previously explained, this range of theories is fully classified by the sequence of ordinal measures |T|n, which is the smallest such sequence of ordinals: |T|n = αn if αn is a multiple of ω^|T|n+1, and otherwise the next such multiple (αn + ω^|T|n+1).

Regularity conditions on T will be expressed by a sequence of ordinals presented in either of the forms (µn) or (νn), as follows:

|T|n = ω^|T|n+1 · µn = ϕ^|T|n+1(νn)
µn = (νn if |T|n+1=0; 1+νn otherwise).

They allow to re-express the growth of strengths by iterated reflection (|Tnα|n = |T|n + ω^|T|n+1·α) as

|Tnα|n = ω^|T|n+1·(µn+α) = ϕ^|T|n+1(νn+α)

A basically regular theory fails to contain PA when for some m, |T|m+1 = 0, equivalently (∀n>m, αn=0). Picking the smallest such m, so that |T|m = µm = νm = αm > 0, T can be expressed as: if m>1,
T = ((...(EAmνm)...)1ν1)0ν0
(or in some other way if m=1).
In particular, for m>1, the sub-theory EA+IΣm−1 = EAm1 of PA has strengths
∀n≤m, |EAm1|n = φ^(m−n)(1)

Yet one may still like to call "regular" a theory obtained from such a theory by adding the next arithmetical collection axiom (BΣm), since the minimal foundation we mentioned for the completeness theorem (WKL0*) has such a theory as its arithmetical content.

A more selective regularity condition for such theories consists in requiring all multiplicity coefficients µn to be finite.
The motivation for this is to consider that any acceptance of an infinitely iterated reflection, giving a new theory Tnω out of an old one T, deserves to come with the acceptance of the formula ◇n+1(T) expressing its justification principle: Tnω ⊂ T+◇n+1(T) = Tn+11.
Similarly for its soundness formula: ◇nω(T) better comes as a consequence of ◇n(Tn+11).
This regularity has the following consequences:
µn<ω ⇒ |T|n < ω^|T|n+1·ω = φ(|T|n+1+1)
∀p>0, (∀k<p, µn+k<ω) ⇒ |T|n < φ^p(|T|n+p+1)

A more standard and still more selective regularity concept is defined relatively to a given degree: a theory T is said to be Πn+1-regular if it has the same Πn+1 theorems as EAnα for a certain ordinal α; equivalently, ◇n(T) ⇔ ◇nα(EA).

Basic regularity, introduced above, implies Π1-regularity.

A possible defect of this definition of Πn+1-regularity for n>0 is its dependence on EA as a choice of initial theory. This defect is avoided by the following alternative formulations:

∀k<n, µk=1
∀k<n, νk=0

Assuming basic regularity, all 3 definitions are equivalent, denoting α=|T|n, in either case:

But, (|T|n=0 ∧ ∀k<n, νk=0) would imply |T|0=0 which is impossible for a theory expressing any arithmetic at all.
The formula µk=1 simplifies |Tkα|k as |T|k·(1+α). Moreover,

(∀k<p, µn+k=1) ⇒ |T|n = φ^p(|T|n+p)
(∀k<p, νn+k=0) ⇒ |T|n = e^p(|T|n+p)

In particular, among basically regular theories T with |T|n>0 where n>0, the Πn+1-regularity condition is equivalent to |T|0 = e^n(|T|n).
If n>1, then EA+IΠn−1 = EAn1 is the weakest Πn+1-regular theory. In the exploration of the strength hierarchy of theories, it often happens that a theory T is somehow canonical, in being the weakest theory satisfying a certain regularity condition. In this case, it is generally natural, for the study of theories stronger than T, to focus on those also fitting the same regularity condition for which T is weakest.

The theory PA, essentially axiomatized by the schema of induction

PA = EA+{◇n(EA)| n∈ℕ}

has all its strengths equal: ∀n∈ℕ, |PA|n = limp→ω φ^p(1) = ε0

It is the weakest theory satisfying all Πn+1-regularity conditions (∀n, νn=0 ∧ µn=1).

Any theory with (∀n, µn<ω) and not containing PA, namely not containing IΣm for some m, is contained in it:
|T|m+1 = 0 ⇒ |T|0 < φ^(m+1)(1) < φ^ω(0) = ε0

The whole topic of ordinal analysis started with a proof by Gentzen that

 TI(ε0, Σ0) ⇒ ◇0(PA)

and that ε0 is minimal for this property because for each given α<ε0, TI(α, Σ0) is a theorem of PA.
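This threshold can be felt concretely through Goodstein sequences, a standard illustration (not from this article's sources): their termination expresses the well-foundedness of ε0 in arithmetical clothing, provable by transfinite induction up to ε0 but not in PA (Kirby–Paris). A minimal Python sketch:

def bump(n, b):
    # Rewrite n in hereditary base b, then replace every b by b+1.
    result, e = 0, 0
    while n:
        n, c = divmod(n, b)
        if c:
            result += c * (b + 1) ** bump(e, b)
        e += 1
    return result

def goodstein(n, steps=6):
    # First few terms: bump the base, subtract 1, repeat until 0.
    b, seq = 2, [n]
    while n and len(seq) <= steps:
        n = bump(n, b) - 1
        b += 1
        seq.append(n)
    return seq

print(goodstein(3))  # [3, 3, 3, 2, 1, 0]: reaches 0 quickly
print(goodstein(4))  # [4, 26, 41, 60, 83, 109, 139]: reaches 0 only after
                     # an unimaginably long time, beyond PA's reach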

Worm calculus

A finite composite of reflections ◇n of various degrees n is called a worm. For example
◇i◇j◇k(T) :⇔ ◇i(T+◇j(T+◇k(T))).
Yet successive reflective expansions of theories relate to worms in complicated ways: ◇i((Tk1)j1) develops as ◇i(T+◇k(T)+◇j(T+◇k(T))), so it is equivalent to ◇i◇j◇k(T) except if j<k, in which case it is equivalent to ◇i◇k◇j◇k(T) instead. Further successions of reflective expansions also develop as worms with similar splits into multiple cases, as sketched below.
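This case split is mechanical enough to be coded; here is a tiny Python rendering of it (a toy of mine, with worms represented as mere lists of degrees):

def develop(i, j, k):
    # Worm equivalent of ◇i((Tk1)j1): the conjunct ◇k(T) is absorbed
    # by ◇j(T+◇k(T)) unless j < k, in which case it must be kept.
    return [i, k, j, k] if j < k else [i, j, k]

print(develop(2, 1, 0))  # [2, 1, 0]   : ◇2◇1◇0(T)
print(develop(2, 0, 1))  # [2, 1, 0, 1]: ◇2◇1◇0◇1(T)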

When j<k, ◇i◇j◇k is not equivalent to any finite composite of reflective expansions, but to a transfinite one instead: denoting α = φ^(k−j)(1),
◇j(T+◇k(T)) ⇔ ◇jα(T)
◇i◇j◇k(T) ⇔ ◇i(Tjα+1)

Some worms are even only equivalent to combinations of transfinite reflective expansions of different degrees. For example, denoting δ = φ^3(1),
◇1◇2◇0◇3(T) ⇔ ◇1((T0δ)1ω) ⇔ (◇1ω(T) ∧ ◇0^(δ+ω^ω)(T))

Conversely, let us explain how any transfinitely iterated reflection ◇nα(EA), along some α < ε0, is equivalent to a worm all of whose entries (degrees) are ≥ n.

Reflections iterated along ordinal sums are reducible to composites as follows:

Tnα+β = (Tnα)nβ

For all theories T, T' and all ordinals β,
(◇n(Tnβ) ⇔ ◇n(T')) ⇒ Tnβ+1 = T+◇n(T')
Using Schmerl's formula, if |T|n+2 = 0 then for any ordinals α, β,

◇n(Tnβ+ω^α) ⇔ ◇n((Tnβ)nω^α) ⇔ ◇n((Tnβ+1)n+1α)

where ⇔ denotes provable equivalence.

The following reasoning was based on a mistake and I am not sure how to fix it:
The implicit use of the concept of axiom schema in formulas of the form ◇n(Tnα) when α is a limit, can be reduced by writing the limit ordinal in the form β+ω^α, then applying the above reduction to it.
The same goes for ◇n(Tn+1α) (and so on), since it goes for ◇n+1(Tn+1α) and
(◇n+1(T) ⇔ ◇n+1(T')) ⇒ (◇n(T) ⇔ ◇n(T')).

If the new α and/or β is still a limit, further reduction proceeds in the same way, and this must end after a finite number of such reductions under the hypothesis α<ε0.

Slow reflection

There are possible variants of the concept of regularity, where (∀k<n, µk=1) is "morally" kept while basic regularity is dropped, in the sense that for some k, |T|k = φ(|T|k+1) "morally" holds, but |T|k may still take intermediate values between φ(α) and φ(α+1) for some ordinal α. This means |T|k+1 need not always be an ordinal, but may take intermediate values between consecutive ordinals.

For (k=0, n=1), this means ◇1(T) expresses the existence (i.e. totality) of functions with intermediate speed between those expressed by ◇1(U) and ◇1(V), where U, V are theories such that |U|1 < |V|1 are consecutive genuine ordinals; such a T is obtained by adding to U an axiom of existence of a function of intermediate speed between the fastest ones expressed inside U, and one expressed by ◇1(U). Examples of functions with intermediate growth rates are given by slow-growing hierarchies (which may give widely different ordinal numbers of intermediates between given speeds, depending on conventions for fundamental sequences), but I am not sure whether they fit the present purpose.

Anyway, such a function was notably used by [Samuel R. Buss, Handbook of proof theory, ch. II, 1.3] to define his theory S2, in between I∆0 and EA, though it also involved modified versions of the induction axioms.
The idea is to simulate a distinction between a range of "mere numbers" x, only allowed to be multiplied, and the wider range of their exponentials 2^x serving as ranges of sets p<2^x, by introducing an operation # between the latter, where 2^(x·y) = 2^x # 2^y. This operation #, having intermediate strength between multiplication and exponentiation, is thus used to let "mere numbers" be freely multiplied, but exponentiated only once.
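As a quick numeric illustration (a toy of mine restricted to powers of two, following the presentation above rather than Buss's official definition of # via binary lengths), one can check that # indeed sits strictly between multiplication and exponentiation:

def smash(p, q):
    # On powers of two: 2^x # 2^y = 2^(x·y).
    x, y = p.bit_length() - 1, q.bit_length() - 1
    return 1 << (x * y)

a, b = 2 ** 5, 2 ** 6
print(a * b)        # 2^11: multiplication adds the exponents
print(smash(a, b))  # 2^30: # multiplies the exponents
print(2 ** a)       # 2^32: exponentiation towers up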

In similar ways, functions can be defined with speeds in between other consecutive functions of the fast-growing hierarchy, to provide intermediates between other theories, such as between EA and EA+.

In a similar vein (or not?), a concept of "slow consistency" has been found, so that "Transfinite iterations of slow consistency generate a hierarchy of precisely ε0 stages between PA and PA + Con(PA)" [Slow Reflection, Anton Freund].
(This number matches the gap from |PA|0 = ε0 to |PA+◇0(PA)|0 = ε0·2.)
We shall not use such concepts in the rest of this exposition.

ω-strength

For any basically regular arithmetical theory T, let us analyze the possible limit behavior of its sequence of strengths |T|n for large n.
Remember that

∀n∈ℕ, |T|n+1 ≤ |T|n = ϕ^|T|n+1(νn)

with equality when νn = 0 ∧ |T|n = e(|T|n).
Indeed, νn = 0 ⇒ |T|n = e(|T|n+1), and by injectivity of e,
∀ξ, ξ=e(ξ) ⇔ e(ξ)=e(e(ξ))

As a non-increasing sequence of ordinals, it must reach a minimum for some n∈ℕ:

∀k>n, |T|k = |T|n

As a fixed point of e, this minimum must be of the form e_1(ξ) = e^ω(ξ) for some ordinal ξ, called the ω-strength of T and denoted |T|ω:

∀k≥n, |T|k = e^ω(|T|ω)

This extends the validity of the previous formula
(∀k<p, νn+k=0) ⇒ |T|n = e^p(|T|n+p)
from finite values of p to the infinite value p=ω.

Let us check the condition for a kind of converse: can there be |T|n = φ(|T|n) but not (∀k>n, |T|k=|T|n)?
This would mean, for some n and some ξ < |T|n,
|T|n = ω^|T|n = ω^ξ·µn
where µn>1 is additively indecomposable because |T|n is. Thus µn = ω^α for some α>0, and
ω^(ξ+α) = ω^|T|n, so that |T|n = ξ+α = α = ω^α = µn (using the additive indecomposability of |T|n with ξ < |T|n).
This can be excluded by adopting the regularity condition that all µk be finite (∀k, µk < ω).
Like the previously introduced strength measures of finite degrees, the ω-strength measure of arithmetical theories matches the count of steps of expansion by some kind of iterated reflection, now by full arithmetical soundness:

Tωα := T+{◇n(Tωβ) | n<ω, β<α}
|Tωα|ω = |T|ω + α

So, the expression of the soundness of T with respect to all arithmetical formulas, which we might abusively denote as a formula ◇ω(T), is only an axiom schema {◇n(T) | n<ω}, irreducible to any single formula in the language of PA. The logical system formed by all the modalities ◇n (n<ω) together with ◇ω has been described in [Positive provability logic for uniform reflection principles, Lev Beklemishev, arXiv:1304.4396].

The study of arithmetical theories T with at least the strength of PA, i.e. with |T|ω > 0, may be naturally restricted to those fitting all Πn+1-regularity conditions, namely the T = EAωα for any α>0, with ω-strength |T|ω = α.
The strength measures of such theories at all finite degrees coincide, making their distinction superfluous:

∀n<ω, |EAωα|n = e^ω(α)

From arithmetic to hyperarithmetic

To continue our exploration of theories whose strength goes more than a little beyond PA, the next step involves expanding the language beyond that of first-order arithmetic. Namely, somewhere in between first-order and second-order arithmetic comes the language of hyperarithmetic, though it is currently not officially recognized as a logical language in its own right.

Compared to first-order arithmetic, the point of hyperarithmetic is to generalize the concepts of Πn and Σn classes of formulas, expansions Tnα of theories T by iterated reflection and their strength measures |T|n, from finite values of the degree n to recursive ordinal values.

The difficulty is that the language of first-order arithmetic, by its reliance on the framework of first-order logic, only natively admits formulas from classes Πn and Σn for finite n, using n successive unbounded quantifier symbols followed by some bounded formula, but has no direct concept of Πα or Σα formulas when α is an infinite recursive ordinal.

This may lead one to express hyperarithmetic by leaving behind the concept of classes of formulas, in favor of the concepts of Turing degrees and the Turing jump, which may be considered more appropriate.


Now the concern of measuring the strength of theories, thus their proving power, induces a need to formalize hyperarithmetic in first-order logic, which is the natural reference framework with a clear and well-behaved provability concept (one where the completeness theorem works). Such a formalization is offered in
[Reflection algebras and conservation results for theories of iterated truth, Lev D. Beklemishev, Fedor N. Pakhomov]
[Conservativity spectra and generalized Ignatiev model, Lev D. Beklemishev]


(to be continued)

ω+1(T) ⇔ (CT+I∆0(Tr))
[Model Theory and Proof Theory of the Global Reflection Principle, Mateusz Łełyk]

Transfinite worm calculus:
([The Worm Calculus, Ana de Almeida Borges & Joost J. Joosten])
[Well-orders in the transfinite Japaridze algebra, David Fernández-Duque, arXiv:1212.3468]

Sorted references

For mention in introduction - potentialism

Toby Meadows, Sets and supersets, Synthese 193(6):1-33, July 2015, DOI:10.1007/s11229-015-0818-x
https://aura.abdn.ac.uk/handle/2164/8726
https://www.academia.edu/81998125/Sets_and_supersets?f_ri=821

Acceptable ideas from the list mentioned there: literalism, generality relativism, modal approaches, property approaches, Zermelo’s “meta-set theory”, Grothendieck universes.


Joel David Hamkins & Øystein Linnebo, The modal logic of set-theoretic potentialism and the potentialist maximality principles
https://arxiv.org/abs/1708.01644
http://jdh.hamkins.org/set-theoretic-potentialism/

Why Is the Universe of Sets Not a Set? Zeynep Soysal

Set-theoretic potentialism, Joel David Hamkins (slides)

Set-theoretic foundations, Penelope Maddy

Potentiality and indeterminacy in mathematics Øystein Linnebo University of Oslo oystein.linnebo@ifikk.uio.no Stewart Shapiro The Ohio State University shapiro.4@osu.edu


The Potential Hierarchy of Sets Øystein Linnebo

Context relativism


Everything, more or less, James Studd
https://jamesstudd.net/category/research/
"One way to address this challenge is to deploy a sophisticated relativist-friendly account of quantifier domain restriction due to Michael Glanzberg (2006). In his view, quantifiers exhibit two sorts of context sensitivity: the context supplies both a background domain associated with the determiner and a local contextual restriction attaching to the nominal. On this view, we are free to operate in contexts where no non-vacuous local restriction comes into play. Such locally unrestricted quantifiers may then range over the entire background domain. Nonetheless, Glanzberg does not accept that such locally unrestricted quantification achieves absolute generality. Suppose we begin with a background domain D⋆ . Glanzberg develops a pragmatic account of how we may exploit Russell’s paradox to shift to a wider background domain containing D⋆ ’s Russell set. This provides one way to make sense of the performative aspect of the Russell Reductio noted in Section 1.4."


Context and Unrestricted Quantification, Michael Glanzberg, in Absolute Generality, ed. A. Rayo and G. Uzquiano, Oxford University Press, 2006, pp. 45-74.
https://michaelglanzberg.org/home/research/


Hybrid-relativism and revenge, James Studd

Recursion theory

Foundations of Mathematics Stephen G. Simpson October 1, 2009

Without power set


What is the theory ZFC without power set?
Victoria Gitman, Joel David Hamkins, Thomas A. Johnstone
https://arxiv.org/abs/1110.2430

CH


Is the Dream Solution of the Continuum Hypothesis Attainable? Joel David Hamkins, Notre Dame Journal of Formal Logic Volume 56, Number 1, 2015


Can we resolve the Continuum Hypothesis? Shivaram Lingamneni December 7, 2017
https://philpapers.org/rec/LINCWR-3 https://cs.stanford.edu/people/slingamn/

RECENT PROGRESS ON THE CONTINUUM HYPOTHESIS (AFTER WOODIN) PATRICK DEHORNOY

ZF



Separating Hierarchy and Replacement, Randall Holmes


Multiverse

A multiverse perspective on the axiom of constructibility, Joel David Hamkins

Gödel's program, John R. Steel, 2012

A reconstruction of Steel's multiverse project, Penelope Maddy & Toby Meadows


THE SET-THEORETIC MULTIVERSE JOEL DAVID HAMKINS
Association for Symbolic Logic, 2012 doi:10.1017/S1755020311000359

The Set-theoretic Multiverse : A Natural Context for Set Theory
https://philpapers.org/rec/DAVTSM-8
Annals of the Japan Association for Philosophy of Science Vol.19(2011)

Generically invariant set theory John R. Steel July 21, 2022

Large cardinals


https://en.wikipedia.org/wiki/Jensen%27s_covering_theorem
== Ordinal definability and the structure of large cardinals Gabriel Goldberg

https://en.wikipedia.org/wiki/Second-order_arithmetic#Projective_determinacy
ZFC + {there are n Woodin cardinals: n is a natural number} is conservative over Z2 with projective determinacy, that is a statement in the language of second-order arithmetic is provable in Z2 with projective determinacy iff its translation into the language of set theory is provable in ZFC + {there are n Woodin cardinals: n∈N}.
https://en.wikipedia.org/wiki/Axiom_of_projective_determinacy
PD follows from certain large cardinal axioms, such as the existence of infinitely many Woodin cardinals.

https://plato.stanford.edu/entries/large-cardinals-determinacy/


FOUNDATIONAL IMPLICATIONS OF THE INNER MODEL HYPOTHESIS, TATIANA ARRIGONI∗ AND SY-DAVID FRIEDMAN

Large Cardinals and the Iterative Conception of Set Neil Barton∗ 10 April 2019†
(a) Projective Determinacy (schematically rendered). (b) For every n < ω, there is a fine-structural, countably iterable inner model M such that M |= “There are n Woodin cardinals”.
= Are Large Cardinal Axioms Restrictive? Neil Barton∗ 24 June 2020†