1. First foundations of mathematics

Sylvain Poirier
http://settheory.net/

1.1. Introduction to the foundations of mathematics

Mathematics and theories

Mathematics is the study of systems of elementary objects, whose only nature is to be exact, unambiguous (two objects are equal or different, related or not; an operation gives an exact result...). Such systems are conceived independently of our usual world, even if many of them can resemble (thus be used to describe) diverse aspects of it. Mathematics as a whole can be seen as «the science of all possible worlds» of this kind (of exact objects).
Mathematics is split into diverse branches, implicit or explicit frameworks of any mathematical work, that may be formalized as (axiomatic) theories. Each theory is the study of a supposedly fixed system (world) of objects, called its model. But each model of a theory may be just one of its possible interpretations, among other equally legitimate models. For example, roughly speaking, all sheets of paper are systems of material points, models of the same theory of Euclidean plane geometry, but independent of each other.

Foundations and developments

Each theory starts with a foundation, that is the data of a list of pieces of description specifying what it knows or assumes of its model(s) (its kind or shape). This includes a list of formulas (statements) called axioms, expressing the required properties of models, i.e. selecting its accepted models as the systems where the axioms are true, from the whole range of possible systems where they can be interpreted.
Then, the study of a theory progresses by choosing some of its possible developments : new concepts and information about its models, resulting from its given foundation, and that we can add to it to form its next foundation.
In particular, a theorem of a theory, is a formula deduced from its axioms, so that it is known as true in all its models. Theorems can be added to the list of axioms of a theory without modifying its meaning.

Other possible developments (not yet chosen) can still be operated later, as the part of the foundation that could generate them is preserved. Thus, the totality of possible developments of a theory, independent of the order chosen to process them, already forms a kind of «reality» that these developments explore (before the Completeness theorem will finally show how the range of possible theorems precisely reflects the more interesting reality of the diversity of possible models).

There are possible hierarchies between theories, where some can play a foundational role for others. For instance, the foundations of several theories may have a common part forming a simpler theory, whose developments are applicable to all.
A fundamental work is to develop, from a simple initial basis, a more complete foundation endowed with efficient tools opening more direct ways to further interesting developments.

The cycle of foundations

Despite the simplicity of nature of mathematical objects, the general foundation of all mathematics turns out to be quite complex (though not as bad as a physics theory of everything). Indeed, it is itself a mathematical study, thus a branch of mathematics, called mathematical logic. Like any other branch, it is made of definitions and theorems about systems of objects. But as its object is the general form of theories and systems they may describe, it provides the general framework of all branches of mathematics... including itself.

And to provide the framework or foundation of each considered foundation (unlike ordinary mathematical works that go forward from an assumed foundation), it does not look like a precise starting point, but a sort of wide cycle composed of easier and harder steps. Still this cycle of foundations truly plays a foundational role for mathematics, providing rigorous frameworks and many useful concepts to diverse branches of mathematics (tools, inspirations and answers to diverse philosophical questions).

(This is similar to dictionaries defining each word by other words, or to another science of finite systems: computer programming. Indeed computers can be simply used, knowing what you do but not why it works; their working is based on software that was written in some language, then compiled by other software, and on the hardware and processor whose design and production were computer assisted. And this is much better than at the birth of this field.)
It is dominated by two theories: Each one is the natural framework to formalize the other: each set theory is formalized as a theory described by model theory; the latter better comes as a development from set theory (defining theories and systems as complex objects) than directly as a theory. Both connections must be considered separately: both roles of set theory, as a basis and an object of study for model theory, must be distinguished. But these formalizations will take a long work to complete, especially for this following last piece: Model theory and proof theory are essentially unique, giving a clear natural meaning to the concepts of theory, theorems and consistency of each theory.

1.2. Variables, sets, functions and operations

Let us start mathematics by to introducing some simple concepts from the founding cycle, which may seem self-sufficient. It is natural to start with a set theory not fully formalized as an axiomatic theory.
Let us first explain what is a set, then we will complete the picture with more concepts and explanations on the context of foundations (model theory) and its main subtleties (paradoxes).

Constants

A constant symbol is a symbol denoting a unique object, called its value. Examples: 3, Ø, ℕ. Those of English language are proper names and names with «the» (singular without complement).

Free and bound variables

A variable symbol (or a variable), is a symbol without a fixed value. Each possible interpretation gives it a particular value and thus sees it as a constant.
It can be understood as limited by a box, whose inside has multiple versions in parallel. The diverse «internal viewpoints», corresponding to the possible fixed values, may be thought of as abstract «locations» in the mathematical universe, while the succession of statuses of a symbol (as a constant, a free variable or a bound variable), can be seen as a first expression of the flow of time in mathematics: a variable is bound when all the diverse "parallel locations inside the box" (possible values) are past. All these places and times are themselves purely abstract, mathematical entities.

Ranges and sets

The range of a variable, is the meaning it takes when seen as bound: it is the «knowledge» of the totality of its possible or authorized values (seen in bulk: unordered, ignoring their context), that are called the elements of this range. This «knowledge» is an abstract entity that can encompass infinities of objects, unlike human thought. A variable has a range when it can be bound, i.e. when an encompassing view over all its possible values is given.
Any range of a variable is called a set.

Cantor defined a set as «a grouping into a whole of distinct objects of our intuition or our thought». He explained to Dedekind : «If the totality of elements of a multiplicity can be thought of as «simultaneously existing», so that it can be conceived as a «single object» (or «completed object»), I call it a consistent multiplicity or a «set».» (We expressed this «multiplicity» as that of values of a variable).
He described the opposite case as an «inconsistent multiplicity» where «admitting a coexistence of all its elements leads to a contradiction». But non-contradiction cannot suffice to generally define sets: the non-contradiction of a statement does not imply its truth (the opposite statement may be true but unprovable); facts of non-contradiction can be themselves unprovable (incompleteness theorem); and two separately consistent coexistences might contradict each other (Irresistible force paradox / Omnipotence paradox).

A variable is said to range over a set, when it is bound with this set as its range. Any number of variables can be introduced ranging over a given set, independently of each other and of other variables.
Systematically renaming a bound variable in all its box, into another symbol not used in the same context (same box), with the same range, does not change the meaning of the whole. In practice, the same letter can represent several separate bound variables (with separate boxes), that can take different values without conflict, as no two of them are anywhere free together to compare their values. The common language does this continuously, using very few variable symbols («he», «she», «it»...)

Functions

A function is any object f behaving as a variable whose value is determined by that of another variable called its argument; the argument has a range called the domain of f and denoted Dom f. Whenever its argument is fixed (gets a name, say x, and a value in Dom f), f becomes a constant (written f(x)).
In other words, f is made of the following data:

Operations

The notion of operation generalizes that of function, by admitting a finite list of arguments (variables with given respective ranges) instead of one. So, an operation gives a result (a value) when all its arguments are fixed. The number n of arguments of an operation is called its arity ; the operation is called n-ary. It is called nullary if n=0 (it is a constant), unary if n=1 (it is a function), binary if n=2, ternary if n=3...
Nullary operations are useless as their role is played by their unique value; we shall see how to construct those with arity > 1 by means of functions.

The value of a binary operation f on its fixed arguments named x and y (i.e. its value when its arguments are assigned the values of x and y), is denoted f(x,y). Generally, instead of symbols, the arguments are represented by the left and right spaces in parenthesis, to be filled by any expression giving them desired values.

1.3. Form of theories: notions, objects and meta-objects

The variability of the model

Each consistent theory assumes its model (interpretation) as fixed, but this is usually a mere «choice» of one model in a wide (infinite) range of other existing, equally legitimate models of the same theory; the model becomes variable when viewed by model theory. But, this «choice» and this «existence» of a model can be quite abstract. In details, the proof of the Completeness theorem, in the way it can work in all cases, will effectively «specify» a model in the range of possibilities, but this construction is not really explicit (involving an infinity of steps, where each step depends on an infinite knowledge). In these conditions, the assumption of fixation of a model may be called nonsense, but nevertheless constitutes the standard interpretation of mathematical theories.

Notions and objects

Each theory has its own list of notions, usually designated by common names, that are the kinds of variables used by the theory ; each model (interpretation of the theory) interprets each notion as a set that is the common range of all variables of this kind. For example, Euclidean geometry has the notions of «point», «straight line», «circle» and more. The objects of a theory in a model, are all the possible values of its variables (the elements of its notions) in this model.

One-model theory

When we discuss several theories T and systems M that may be models of those T, we are in the framework of model theory, with its notions of «theory» and «system» that are the respective kinds of the variables T and M. But when we focus the study on one theory (such as a set theory) with a supposedly fixed model, the variables T and M become fixed and disappear (they are not variables anymore, the choice of the theory and the model becomes implicit). So, the notions of theory and model disappear from the notions list too.
This fixation reduces the framework, from model theory, to that of one-model theory. A model of one-model theory, is a system [T,M] that combines a theory T with a model M of T.

On the diversity of logical frameworks

Before giving a theory T, we must specify its logical framework (its format, or grammar), that describes the admissible forms of contents for T, what such contents mean about M, and how their consequences can be deduced. This framework is given by the choice of a precise version of one-model theory, that describes T and interprets its claims.
We shall first describe two of the main logical frameworks in parallel. Theories in the most common framework of first-order logic, will be called here generic theories. Set theory will be expressed in is own special framework. More frameworks will be introduced in Part 3.
The most common logical frameworks except the special one of set theory, will manage notions as types (usually in finite number for each theory) classifying both variables and objects: each object will belong to only one type, the one of variables that can name it. For example, an object of Euclidean geometry may be either a point or a straight line, but the same object cannot be both a point and a straight line.

Examples of notions from various theories

Theory Kinds of objects (notions)
Generic theory Pure elements classified by types
Set theory Elements, sets, functions, operations, relations, tuples...
Model theory Theories, systems and their components (listed next line)
  One-model theory     Objects, symbols, types, structures, expressions (terms, formulas)...
Arithmetic Natural numbers
Linear Algebra Vectors, scalars...
Geometry Points, staight lines, circles...

Meta-objects

The notions of a one-model theory T1, normally interpreted in [T,M], classify the components of T («type», «symbol», «formula»...), and those of M («object», and tools to interpret T there). But the same notions (even if from a different version of one-model theory) can be interpreted in [T1, [T,M]], by putting the prefix meta- on them.

By its notion of «object», one-model theory distinguishes the objects of T in M among its own objects in [T,M], that are the meta-objects. The above rule of use of the meta prefix would let every object be a meta-object; but we will make a vocabulary exception by only calling meta-object those which are not objects: symbols, types (and other notions), structures, expressions...

Set theory only knows the ranges of some of its own variables, seen as objects (sets). But, seen by one-model theory, every variable of a theory has a range among notions, which are meta-objects only.

Components of theories

Once chosen a logical framework, the content of a theory (or its foundation, i.e. its initial content, describing the form of its intended models), consists in a choice of 3 successive lists of components, where those in each list are used to build those of the next list:

Set-theoretical interpretation

Any generic theory (and its model, if considered) can be inserted (translated) into set theory by converting its components into components of set theory. Let us present both the generic method which works for any generic theory, and the different (non-generic) method that is usually preferred for the case of geometry.

In all cases, abstract types become fixed variables (or new constant symbols) whose values are sets called interpreted types (the respective ranges of variables of each type). For geometry, both abstract types «Point» and «Straight line» become fixed variables P and L, respectively designating the set of all points, and the set of all straight lines.
The use of variable symbols will be left intact, taking values among some objects of set theory (but not all). While some objects of geometry, such as straight lines, are usually interpreted as sets (of points), the generic method only uses pure elements as objects (or we may be ambiguous by calling them elements even if they are not pure).

The generic method will also convert structure symbols into fixed variables. The interpretations of types and structure symbols (their values as fixed variables) will determine the model, as they are its main components. The model, which thus varies with these variables, is itself an object of set theory. This integrates all theories we need as parts of the same set theory, while gathering all their models inside a common model of set theory. This is why models of set theory will be called universes.

1.4. Structures of mathematical systems

The structures, interpreting the language of a theory (values of structure symbols in a set theoretical interpretation), relate the objects of diverse types to form the studied system. These structures let objects of each type play diverse roles in the system. According to these roles, objects may be interpreted as complex objects, in spite of their basic nature of pure elements.
Generic theories admit 2 kinds of structures (and thus of structure symbols): operators and predicates.

An operator is an operation between interpreted types. On the side of the theory before interpretation, each operator symbol comes with the data of its arity (or list of arguments seen as places around the symbol), the type of each argument (which will range over the interpreted type), and the unique type of all its values (results of the operation).
The constant symbols (or constants) of a theory are its nullary operator symbols.
Unary operators (that are functions) will be called here a functors (this should not be confused with the concept of functor in category theory).

The list of types is completed by the Boolean type, interpreted as the pair of elements 1 («true») and 0 («false»). A variable of this type (outside the theory) is called a Boolean variable.
A para-operator is a generalized operator allowing the Boolean type among its types of arguments and results.
A (logical) connective is a para-operator with only Boolean arguments and values.
A predicate is a para-operator with Boolean values and at least one argument but no Boolean one.

Structures of set theory

Let us start formalizing set theory with 3 primitive notions (sorts of objects) : elements (all objects), sets and functions. This formalization will be progressively developed as needed, with other notions that may be seen either as primitive or derived from the former, and other symbols contributing to give their roles to all kinds of objects. But this formalization work requires to ignore the above set theoretical interpretation of theories, as both links between formalized generic theories and set theory should not be confused (the set theoretical translation of theories is not yet applied to set theory itself, this will be discussed in 1.7).

One way for set theory to give sets their role is by the binary predicate ∈ : for any element x and any set E, we say that x is in E (or belongs to E, or is an element of E, or that E contains x) and write xE, to mean that x is among the values of a variable with range E.
Functions f play their role by two operators: the domain functor Dom, and the function evaluator, binary operator that is implicit in the notation f(x), with arguments f and x, giving the value of any function f at any element x in Dom f.

About ZFC axiomatic set theory

The Zermelo-Fraenkel axiomatic set theory (ZF, or ZFC with the axiom of choice) is a generic theory with only one type «set», one structure symbol ∈ , and axioms. It implicitly assumes that every object is a set, and thus a set of sets and so on, built over the empty set.
Specialists of mathematical logic chose it for their need of a powerful theory in an enlarged founding cycle, by which they can prove many difficult formulas or their unprovability. Then, authors of basic courses usually present set theory as a popularized or implicit version of the ZFC theory, admitted as the standard reference, as if this was necessary or obvious (as an intuititively motivated axiomatic system, historically selected for its consistency and the convenience of its results).
But for a start of mathematics, ZF(C) is not an ideal reference. Its axioms (descriptions of the universe that must assume the framework of model theory to make sense) would deserve more subtle and complex justifications than usually assumed. Ordinary mathematics, using many objects usually not seen as sets, are only inelegantly formalized on this basis. As the roles of all needed objects can anyway be indirectly played by sets, they did not require another formalization, but remained cases of discrepancy between the «theory» and the practice of mathematics.

Types in one-model theory

As one-model theory ignores the diversity of possible models, its notion of type can be formalized ignoring the more general (set theoretical) notion of «set of objects» (of which the interpreted types are particular cases), as just one meta-notion playing both roles of abstract types (in the theory) and of interpreted types (components of the model) : these roles are given by meta-functors, one from variables to types, and one from objects to types. A larger range of notions, the classes, will be introduced in 1.7.

Structures in one-model theory

Similarly, the role of «structures» (that is an operation between interpreted types, with Boolean values if a predicate) can be formalized by meta-structures (similar to the function evaluator).
Unlike interpreted types (all named by abstract types), our notion of structure in one-model theory will be larger than that of values of the symbols in the given language. For this, structures will be another meta-type with a meta-functor from symbols to structures. However, this mere formalization would leave the exact extension of this notion of structure undetermined.
Trying to conceive this range as that of «all operations between interpreted types» would leave unknown the source of knowledge of such a totality. This idea of totality will be formalized in set theory as the powerset (2.5.), but its meaning would still depend on the universe where it is interpreted (assumed to contain all wanted operations), far from our present concern for one-model theory.
Instead, we will fix in 1.5 the notion of structure as given by those definable by expressions.

1.5. Expressions and definable structures

Terms and formulas

Given the first two layers of a theory (a list of types and a language), an expression is a finite system of occurrences of symbols, that will define a value for each possible data of a model (system interpreting the given types and structure symbols, ignoring axioms) and of an interpretation of the free variables it contains (their values in the model).
Any expression is either a term or a formula: the values of terms will be objects, while formulas will have boolean values.
A ground expression, is an expression with no free variable (all its variables are bound) so that its value only depends on the model. The axioms of a theory are chosen among its ground formulas.

Let us sketch the general description of expressions (that will be formalized in Part 3).

An occurrence of a symbol in an expression is a place where it is written, for example «x + x» has two occurrences of x and one of +. Expressions may use symbols of the following kinds. Expressions are built successively. The first and simplest ones are made of just one symbol already having a value by itself: constants and variables are the first terms; the Boolean constants 1 and 0 are the simplest formulas.
The next expressions are then successively built as made of the following data: The root determines the type of values (thus decides if the expression is a formula or a term) and the format of the list (the number and types of entries). For para-operator symbols, this format is given by the list of arguments.
The list was empty for constants, that, as roots, formed expressions alone. Other para-operators and binders need a nonempty list, which requires a choice of display convention: Parenthesis can also be used, for each expression or subexpression, to distinguish the root by separating the subexpressions, as in (x+y)n.

Variable structures

Only few objects are usually named by the constant symbols in a given language. Any other object can be named by another symbol outside this language: a fixed variable. This may be differently interpreted depending on the theory we consider to be in:

Trying to generalize this, from simple objects (identifiable with all nullary operations that may exist between interpreted types), to other structures (with nonzero arity), we get the concept of a symbol of variable structure, but this escapes the above list of symbols allowed in expressions. Thus, the status for expressions using such a symbol must now be chosen between

As a theory in first-order logic cannot handle the full range of a variable structure with nonzero arity (unless all its arguments range over finite sets), it cannot express all properties of this range. Still some properties can be known as follows: if a formula with a symbol of variable structure (not defined) is formally proven, then it is true for all structures it might mean, no matter if they «can be found» or not. This concept of provable universal truth will be used as a rule of proof for ordinary variables (principle of Universal Introduction in 1.9) and in the formulation of the set generation principle (1.11).

Structures defined by expressions

Any theory can produce structures (operators or predicates, beyond those directly named in its language), as the operations between interpreted types, defined by the following data: Nullary operators (simple objects) are «defined» this way by the term made of a variable symbol seen as parameter. In a formalization of set theory, any function f is synonymous with the functor defined by the term «f(x)» with argument x and parameter f.

We can accept as a legitimate work inside the theory, to name such a structure by a symbol of variable structure, meant as an abbreviation of the expression with its parameters. As the variability of the symbol abbreviates the one of parameters, this variable structure can be bound in the theory as ranging over all structures defined by a common expression with all possible values of its parameters : this just abbreviates the act of binding these parameters.

Let us now fix one-model theory's notion of structure, and thus those of operator and predicate, as made of all those which can be reached in this way: defined by any expression with any possible values of parameters. As this involves the infinite set of all expressions, it escapes the abilities of the studied theory (which can only use one expression at a time) and is only accessible by the framework of one-model theory with its meta-notion of expression. Still this does not mean to exhaust the range of all operations (between interpreted types) which may exist in the universe of a set theoretical framework, as there may be undefinable structures.

Invariant structures

An invariant structure for a theory, is a structure defined without parameters (thus a constant one). Any structure named by a symbol in the language of a theory, is directly defined by it without parameter, and thus invariant for this theory. This distinction of invariant structures from other structures, generalizes the distinction between constants and variables, both to cases of nonzero arity, and to what is indirectly expressible by the theory instead of directly named in its language.
Invariant structures not yet in the language can be named by new symbols to be added there, preserving the deep meaning of the theory (meta-notions of structure, invariant structure, provability...). Such rules to develop a theory will be discussed in Part 3.

1.6. Logical connectives

We already saw the nullary connectives : the Boolean constants 0 (false) and 1 (true).
The binary connective of equality between Booleans is written ⇔ and called equivalence: AB is read «A is equivalent to B».

Let us present other useful connectives, with their properties true for all possible values of the Boolean variables (A, B, C, that can be replaced by formulas).

Negation

The only useful unary connective is the negation ¬, that exchanges Booleans (¬A is read «not A»):
¬1
¬0
¬(¬A)
⇔ 0
⇔ 1
A
It is often denoted by barring the root of its argument, forming with it another symbol with the same format:
xy
xE
(AB)
(AB)
⇔ ¬(x=y)
⇔ ¬(xE)
⇔ ¬(AB)
⇔ (A ⇔ ¬B))
(x is not equal to y)
(x is not an element of E)
(Inequivalence)

Conjunctions, disjunctions

The conjunction ∧ means «and», being true only when both arguments are true;
The disjunction ∨ means «or», being true except when both arguments are false.
Each of them is :
Idempotent
(AA) ⇔ A
(AA) ⇔ A
Commutative
(BA) ⇔ (AB)
(BA) ⇔ (AB)
Associative
((AB) ∧ C) ⇔ (A ∧ (BC))
((AB) ∨ C) ⇔ (A ∨ (BC))
Distributive over the other
(A ∧ (BC)) ⇔ ((AB) ∨ (AC))
(A ∨ (BC)) ⇔ ((AB) ∧ (AC))

The symmetry between them comes from the fact they are exchanged by negation:

(AB) ⇎ (¬A ∧ ¬B)
(AB) ⇎ (¬A ∨ ¬B)

Strings of conjunctions such as (ABC), abbreviate any formula with more parenthesis such as ((AB) ∧ C), which are equivalent to each other thanks to associativity ; and similarly for strings of disjunctions such as (ABC).
Asserting a conjunction of formulas amounts to successively asserting all these formulas.
The inequivalence ⇎ is also called «exclusive or» because (AB) can also be written ((AB) ∧ ¬(AB)).

Implication

The binary connective of implication ⇒ is defined as (AB) ⇔ ((¬A) ∨ B). It can be read «A implies B», «A is a sufficient condition for B», or «B is a necessary condition for A». Being true except when A is true and B is false, it expresses the truth of B when A is true, but no more gives information on B when A is false (as it is then true).
Moreover,
(AB) ⇎
(AB) ⇔
(A ∧ ¬B)
B ⇒ ¬A)
The formula ¬B ⇒ ¬A is called the contrapositive of AB.
The equivalence can also be redefined as
(AB) ⇔ ((AB) ∧ (BA)).
Thus, a proof of AB can be made of a proof of the first implication (AB), then a proof of the second one (BA), called the converse of (AB).

The formula (A ∧ (AB)) will be abbreviated as AB, which reads «A therefore B». It is equivalent to (AB), but indicating that it is deduced from the truths of A and (AB).

Negations transform the associativity and distributivity formulas of conjunctions and disjunctions, into various formulas with implications:
(A ⇒ (BC)) ⇔ ((AB) ⇒ C)
(A ⇒ (BC)) ⇔ ((AB) ∨ C)
(A ⇒ (BC)) ⇔ ((AB) ∧ (AC))
((AB) ⇒ C) ⇔ ((AC) ∧ (BC))
((AB) ⇒ C) ⇔ ((AC) ∧ (BC))
(A ∧ (BC)) ⇔ ((AB) ⇒ (AC))
Finally,
((AB) ∧ (AC)) ⇒ (BC)
((AB) ∧ (AC)) ⇒ (BC)

Chains of implications and equivalences

In a different kind of abbreviation, any string of formulas linked by ⇔ and/or ⇒ will mean the conjunction of all these implications or equivalences between adjacent formulas:

(ABC) ⇔ ((AB) ∧ (BC)) ⇒ (AC)
(ABC) ⇔ ((AB) ∧ (BC)) ⇒ (AC)
0 ⇒ AA ⇒ 1
A) ⇔ (A ⇒ 0) ⇔ (A ⇔ 0)
(A ∧ 1) ⇔ A ⇔ (A ∨ 0) ⇔ (1 ⇒ A) ⇔ (A ⇔ 1)
(AB) ⇒ A ⇒ (AB)

Axioms of equality

For any objects (or abbreviations of terms) x, y, any functor T and any unary predicate A,
x = y
x = y
x = x
T(x) = T(y)
(A(x) ⇔ A(y))
The last formula can also be written (A(x) ∧ x = y) ⇒ A(y), since the converse A(y) ⇒ A(x) can be deduced either by its contrapositive (replacing A by ¬A) or by the symmetry of equality (x = yy = x) obtained by taking as A(z) the formula (z = x).
It also gives for all x, y, z, (x = yy = z) ⇒ x = z. The formula (x = yy = z) will be abbreviated as x = y = z.

Provability

A proof of a formula A in a first-order theory T, is a finite model of proof theory, connecting A to some axioms of T.
We say that A is provable in T and write TA if there exists a proof of A in T.
Again in T, a refutation of a formula A is a proof of ¬A. If one exists (T⊢ ¬A), the formula A is called refutable (in T).
A formula is called undecidable (in T) if it is neither provable nor refutable.

If a formula is both provable and refutable then all are, which means that the theory is inconsistent:

(A ∧ ¬A) ⇔ 0
((TA)∧(T⊢ ¬A))⇔(T⊢ 0).

The theory T is called contradictory or inconsistent if T⊢ 0, otherwise it is called consistent. In an inconsistent theory, every formula is provable. Such a theory has no model.

Without trying to formalize what a proof is, we shall just write proofs naturally, usually as a succession of formulas, each visibly true thanks to previous ones and the above properties of connectives and equality. Natural language articulations may also appear, especially when dealing with quantifiers (1.9) and introducing symbols defined by expressions.
In particular, an equality between terms x = y allows to replace any occurrence of x by y in any expression without affecting the result; when a symbol is defined by a term, both are equal, thus can be substituted to each other in any expression. Axioms and other rules expressed with variable symbols (under universal quantifiers, see 1.9) can then be used replacing these variables by terms.
Provable formulas with a known proof (either by humans or by computers) can be differently named in usual language according to their importance: a theorem is more important than a proposition; either of them may be deduced from an otherwise less important lemma, and a corollary may be easily deduced from it.

1.7. Classes in set theory

The unified framework of theories

Attempts to formalize one-model theory in first-order logic, would fail to exclude infinitely large «expressions» and «proofs»: this needs an axiom in second-order logic, best expressible only after insertion into set theory (though this solution remains incomplete, as will be explained in Part 3). As the components of its model [T, M] are named there by free variables, their variability makes this the set-theoretical expression of model theory (which, together with the axiomatization of set theory, will complete the grand tour of the foundations of mathematics).

Now let T0 be the external copy of T, i.e. the theory (integrated in the formalism of set theory but not part of its universe of objects) whose components k (types, symbols, axioms) have copies as objects which are the truly finite components of T. Formally, T0 is made of the k such that «k» ∈ T, where the notation as a quote «k» abbreviates a ground term of set theory describing k as an object. By this correspondence, any (model-theoretical) model M of T is indirectly also a (set-theoretical) model of T0.

This gives a powerful framework for the interpretation of T0 in M, encompassing both previous (set-theoretical and model-theoretical) frameworks of interpretations of theories in models. Namely, all works (expressions, proofs and other developments) done in T0, have copies as objects in the system T (system of objects described by set theory as having the formal property of «being a theory»); meanwhile, the (set theoretical) model M of T0 is formally seen as a «model of T», in the model-theoretical sense formalized inside the same set-theoretical framework, as it belongs to the same universe as T.
However, the power of this interpretation comes with a cost in legitimacy: for an externally given theory T0 with infinitely many components, we cannot directly define a corresponding T, as a set cannot be formally defined as exactly made of the values of a infinite list of terms (which are only meta-objects). We can work starting from a T0 only if its infinities of components are produced by some rules, for getting T as defined by the same rules. Then, no matter if T0 is finite or not, the existence of a model M of T, reflecting the consistency of T as defined inside the universe, does not automatically result from the consistency of T0 (as seen outside the universe). These consequences of the completeness and incompleteness theorems will be explained later.

Classes, sets and meta-sets

For any theory, a class is just a unary predicate A reinterpreted as the set of objects where A is true, that is «the class of the x such that A(x)».
In particular for set theory, each set E is synonymous with the class of the x such that xE (defined by the formula xE with argument x and parameter E). However, this involves two different interpretations of the concept of set, that need to be distinguished as follows.

From now on, in the above unified framework, the theory to be used as T0, interpreted in the model M and studied as an object T, will be set theory itself (expressible as a generic theory as explained in 1.9 and 1.10). Thus, our above use of set theoretical concepts as the framework describing the surrounding universe, is now a copy of T0 but with a different interpretation, that will be distinguished by carrying the meta- prefix. Set theoretical concepts in M can be nicely reflected by their meta interpretation, but both should not be confused.

Instead of the generic representation of all objects of generic theories as pure meta-elements, the role of each object «set» of set theory will usually be played by the class (meta-set) of its elements; similarly, the role of functions will be played by the corresponding functors.
This way, any set will be a class, while any class is a meta-set of objects. But some meta-sets of objects are not classes (they cannot be defined by any formula with parameters); and some classes are not sets, such as the universe (class of all objects, defined by 1), and the class of all sets, according to Russell's paradox (1.8).

Definiteness classes

In set theory, all objects need to be accepted as «elements» that can belong to sets and be operated by functions (to avoid illimited divisions between sets of elements, sets of sets, sets of functions, mixed sets...). This might be formalized by keeping 3 types where each set would have a copy among elements (identified by a functor from sets to elements), and the same for functions. But it would not suffice to our set theory, that will need more notions beyond those of set and function. For this, our set theory will use classes instead of types as its notions. In particular, the notions of set and function will be classes named by predicate symbols:

Set = «is a set»
Fnc = «is a function»

In generic theories, the syntactic correction of an expression (that is implicit in the concept of «expression») ensures that it will take a definite value, for every data of a model with an interpretation of its free variables there. But in set theory, this may still depend on the values of its free variables.
So, an expression A (and any structure defined from it) will be called definite, if it actually takes a value for the given values of its free variables (arguments and parameters) in the model. This definiteness condition is itself an everywhere definite predicate, expressed by a formula we shall abbreviate here as dA, with the same free variables.
Classes are defined by definite unary predicates. The meta-domain of any unary structure A, is the class defined by dA, with the same argument and parameters, called its class of definiteness.
Expressions should be only used where they are definite, which will be done rather naturally. The definiteness condition of (xE) is Set(E). That of the function evaluator f(x) is (Fnc(f) ∧ x∈ Dom f).
But the definiteness of the last formula needs a justification, which will be given below.

Extended definiteness

A theory with partially definite structures can be formalized (translated) as a theory with one type and everywhere definite structures, keeping intact all expressions and their values wherever they are definite : models are translated one way by giving arbitrary values to indefinite structures (e.g. a constant value), and in the way back by ignoring those values. Thus, an expression with an indefinite subexpression may be declared definite if its final value does not depend on these extra values.

For all predicates A and B, let us give to AB and A ⇒ B the same definiteness condition (dA ∧ (A ⇒ dB)) (breaking, for A ∧ B, the symmetry between A and B, that needs not be restored). They will thus be seen definite (with respective values 0 and 1) if A is false and B is not definite.

This makes definite the definiteness conditions themselves, as well as dAA and dAA (extending A where it was not definite, by 0 and 1 respectively). This way, both formulas
A ∧ (BC)
(AB) ∧ C
have the same definiteness condition (dA ∧ (A ⇒ (dB ∧ (BdC)))).

For any class A and any unary predicate B definite in all A, the class defined by the (everywhere definite) predicate AB, is called the subclass of A defined by B.

1.8. Binders in set theory

The syntax of binders

The last kind of symbols forming expressions is the binders (binding symbols).
Such a symbol binds one (or more) variable(s) (say x), on an expression F which may use x as a free variable in addition to the free variables that are available (with a value) outside. Thus, it separates the «inside» subexpression F using x as free, from the «outside» where x is bound. Applied to the data of the symbol x and the expression F, it gives a value depending on the unary structure defined by F with argument x (but cannot generally give a whole faithful image of this unary structure, which may be too complex to be fully recorded as a mere object).
The syntax differs between set theory and generic theories, which manage the range differently. In generic theories, ranges are types, implicit data of quantifiers. But binders of set theory make their variable range over a set, that is an object designated by an additional argument (a space for a term not using the bound variable).

An expression is ground if all its variables are bound (their occurrences are contained by binders).

Let us review the main binders in set theory.

Definitions of functions by terms

The function definer ( ∋ ↦ ) binds a variable on a term, following the syntax (Ext(x)), where x is the variable, E is its range, and t(x) is the term (here abbreviated as a unary predicate with possible implicit parameters); it may be shortened as (xt(x)) when E is determined by the context. It is definite if t(x) is definite for all x in E; it takes then the functor defined by t and restricts its definiteness class to E, to give it as a function with domain E.
So it converts functors into functions, reversing the action of the function evaluator that converted functions into their role (meaning) as functors whose definiteness classes were sets. This uniformizes the functors (which were only indirectly available to be used as functions by their defining term that could have unlimited complexity) as a unique kind of objects.
In 1.10. we will formalize this tool and see it as a particular case of a more general principle for set theory.

Relations and set-builder symbol

A relation is an object similar to an operation but with Boolean values: it acts as a predicate whose arguments range over sets. But this does not need to be introduced as another kind of objects, as sets will suffice to play these roles.
For any unary predicate A definite in a set E, the subclass of E defined by A is a set (range of a variable x introduced as ranging over E, so that it can be bound, from which we select the values satisfying A(x)), thus a subset of E, written {xE | A(x)} (set of all x in E such that A(x)): for all y,

y ∈ {xE | A(x)} ⇔ (yEA(y))

The binder { ∈ | }, is named the set-builder: {xE | A(x)} binds x with range E on A. It will be used as a definer for unary relations on E, figured as subsets F of E, evaluated by ∈ as predicates (xF) with argument x. But these predicates are not only definite in E but in the whole universe, giving 0 outside E whose data is lost. This lack of operator Dom does not matter, as the domain E is usually determined by the context.
As the function definer (resp. the set-builder) records the whole structure defined by the given expression on the given set, it suffices to define any other binder on the same expression with the same domain, as made of a unary structure applied to its result (that is a function, resp. a set).

Russell's paradox

Of course, if the class of all objects was a set then the set builder could use it to also convert all classes into sets. But this is not possible: the class of all sets cannot be a set (no set E can ever contain all sets), as can be shown using the set builder itself:

Theorem. For any set E there is a set F such that FE.

Proof. F={xE | Set(x) ∧ xx} ⇒ (FF ⇔ (FEFF)) ⇒ (FFFE). ∎

This will oblige us to keep the distinctions between sets and classes.

1.9. Quantifiers

A quantifier is a binder taking a unary predicate (formula) and giving a Boolean value.
In set theory, the full syntax for a quantifier Q binding a variable x with range E on a unary predicate A, is

QxE, A(x)

where A(x) abbreviates the formula defining A, whose free variables are x and possible parameters.
A shorter notation puts the domain as an index (QEx, A(x)), or deletes it altogether (Qx, A(x)) when it may be kept implicit (unimportant, or fixed by the context, such as a type in a generic theory).
The two main quantifiers (from which others will be defined later) are: Their interpretation can be defined in a set theoretical framework considering A as a function, and treating its boolean values as objects:

(∀x, A(x)) ⇔ A = (x ↦ 1)
(∃x, A(x)) ⇔ A ≠ (x ↦ 0)
(∀x, A(x)) ⇎ (∃x, ¬A(x))

In set theory, (∀xE, A(x)) ⇔ {xE | A(x)} = E. The formula (∀x,1) is always true. With classes,

(∃C x, A(x))
(∀C x, A(x))
x, (C(x)
⇔ (∃x, C(x) ∧ A(x)) ⇔ ∃CA x, 1
⇔ (∀x, C(x) ⇒ A(x))
⇔ (∃C y, x=y))

Inclusion between classes

A class A is said to be included in a class B when ∀x, A(x) ⇒ B(x). Then A is a subclass of B, as ∀x, A(x) ⇔ (B(x) ∧ A(x)). Conversely, any subclass of B is included in B.
The inclusion of A in B implies for any predicate C (in cases of definiteness):

(∀B x, C(x)) ⇒ (∀A x, C(x))
(∃A x, C(x)) ⇒ (∃B x, C(x))
(∃C x, A(x)) ⇒ (∃C x, B(x))
(∀C x, A(x)) ⇒ (∀C x, B(x))

Rules of proofs for quantifiers on a unary predicate

Existential Introduction. If we have terms t, t′,… and a proof of (A(t) ∨ A(t′) ∨ …), then x, A(x).

Existential Elimination. If x, A(x), then we can introduce a new free variable z with the hypothesis A(z) (the consequences will be true without restricting the generality).

These rules express the meaning of ∃ : going from t, … to ∃ then from ∃ to z, amounts to letting z represent one of t, t′, … (without knowing which). They give the same meaning to ∃C as to its 2 above equivalent formulas, bypassing (making implicit) the extended definiteness rule for (CA) by focusing on the case when C(x) is true and thus A(x) is definite.

While ∃ appeared as the designation of an object, ∀ appears as a deduction rule: ∀C x, A(x) means that its negation ∃C x, ¬A(x) leads to a contradiction.

Universal Introduction. If from the mere hypothesis C(x) on a new free variable x we could deduce A(x), then C x, A(x).

Universal Elimination. If C x, A(x) and t is a term satisfying C(t), then A(t).

Introducing then eliminating ∀ is like replacing x by t in the initial proof.

Deductions can be made by these rules, reflecting formulas

((∀C x, A(x)) ∧ (∀C x, A(x) ⇒ B(x))) ⇒ (∀C x, B(x))
((∃C x, A(x)) ∧ (∀C x, A(x) ⇒ B(x))) ⇒ (∃C x, B(x))
(∀C x, A(x)) ∧ (∃C x, B(x))) ⇒ (∃C x, A(x) ∧ B(x))
(∃A x, ∀B y, R(x,y)) ⇒ (∀By, ∃A x, R(x,y))

Status of open quantifiers in set theory

Set theory is translated to a generic theory by converting to classes the domains of quantifiers:

(∃xE, A(x)) → (∃x, xEA(x))
(∀xE, A(x)) → (∀x, xEA(x))

Set theory only admits quantifiers over sets, called bounded quantifiers, in its formulas (also called bounded formulas for insistence) that define predicates and can be used in terms. But its translated form as a generic theory allows quantifiers on classes (or the universe), called open quantifiers.
Formulas with open quantifiers in set theory will be called statements. Their use will be essentially restricted to declarations of truth of ground definite statements. These will first be axioms, then theorems deduced from them.
Open quantifiers in statements will usually be expressed as common language articulations (naturally using the above rules of proofs) between their bounded sub-formulas written in set-theoretic symbols.

The set-builder K = {xE| A(x)} was defined by a statement (∀x, xK ⇔ (xEA(x))) but 1.11 will redefine it without open quantifier (except on its parameters).

1.10. Formalization of set theory

The inclusion predicate

The inclusion predicate ⊂ between two sets E and F, is defined by
EF ⇔ (∀xE, xF).
It is read: E is included in F, or E is a subset of F, or F includes E.
We always have EE (as it means: ∀x, xExE).
Implications chains also appear as inclusion chains:
(EFG) ⇔ (EFFG) ⇒ EG.

Translating the definer into first-order logic

The translation of set theory into a generic theory converts the function definer into an infinity of operator symbols: for each term t with one argument (and parameters), the whole expression (Ext(x)) is seen as the big name of a distinct operator symbol, whose arguments are E and the parameters of t. (Those where every subexpression of t without any occurrence of x is the only occurrence of a parameter, would suffice to define others). The same goes for the set builder, which will come as a particular case in 1.11.
This way, the axioms list of set theory can be regarded as the axioms list of the generic theory into which set theory is converted. All axioms which depend on an expression (a term for the definer, or a formula for the set builder) are schemas of axioms. (A schema of claims, i.e. axioms or theorems, is an infinite list of claims, usually obtained by replacing an extra structure symbol by any possible defining expression).

First axioms

x,
Fnc x,
¬(Set(x) ∧ Fnc(x))
Set(Dom x)
E, ∀(parameters),
SetE, ∀SetF,
Fnc(Ext(x)) for any term t
EFEE=F (Axiom of Extensionality).
The latter redefines equality between sets from their equivalence as classes (letting elements in bulk): EFE means that E and F have the same elements (∀x, xExF) so that for any predicate R,

(∀xF, R(x)) ⇔ (∀xE, R(x))

and similarly for ∃.

Axioms for functions. For any functor t (to be replaced by all terms that can define it), we have the following axioms, where the first one sums up the other 3 : for all values of parameters of t, any set E included in the definiteness class of t (this condition ∀xE, dt(x) is the definiteness condition for (Ext(x))), and all functions f and g,

f = (Ext(x)) ⇔ ( Dom f = E ∧ (∀xE, f(x) = t(x)))
Dom (Ext(x)) = E
xE, (Eyt(y)) (x) = t(x)
(Dom f = Dom g ∧ ∀x∈Dom f, f(x) = g(x)) ⇒ f = g

A general principle for the formalization of set theory

For any kind of meta-objects indirectly expressible and usable like objects in expressions, set theory will be enriched with tools to directly present them as objects. Namely, classes behaving as sets will be convertible into sets (1.11), and indirectly specified elements will become directly specified (2.4). But when the indirect expression of meta-objects (here, functors) may run over an infinity of possible expressions (here, any term), another reason is needed to see all these meta-objects as definite objects of a single kind (functions): the reason here is that the domains of these functors are sets, given as an argument to the definer. Otherwise, there cannot be a class of all functors, nor of all classes: naively trying to insert this in the same universe might lead to contradictions by reasonings like the proof of Russell's paradox.

Meta-objects behaving as objects with another role beyond sets and functions, will be represented as another kind of objects (operations, relations, tuples), and the conversion tools from roles (meta-objects) to objects will be completed by new conversion tools from objects into their roles. But this can be done inside the same set theory (just by developing it), as already present objects (sets or functions) can be found to naturally play the roles of these new objects. So, the new notions can be defined as classes of existing objects (that will offer their expressible features to the new objects), while the tools of interpretation and definition of objets (converting objects into their roles of meta-objects and inversely) are defined as abbreviations of some fixed expressions. Then, the only needed functors of conversion between objects playing the role of the same meta-object by different methods (expressions), will relate the different objects of old kinds that usefully represent in different ways the same object of the new kind.

Formalization of operations and currying

The notion of n-ary operation, objects acting as n-ary operators between n sets, would be formalized by: The notion of operation can be represented as a class of functions, in the following way called currying. The role of operation definer (binding n variables) is played by the succession of n uses of the function definer (one for each bound variable); and similarly as an evaluator, n uses of the function evaluator. For example (n=2), the binary operation f defined by the term (binary predicate) t with arguments x in E and y in F, may be formalized by

f ≈ (Ex ↦ (Fyt(x,y))) = g
f
(x,y) = g(x)(y) = t(x,y)

The intermediate function g(x) = (Fyt(x,y)) with argument y, sees x as fixed and y as bound. But this breaks the symmetry between arguments, and loses the data of F when E is empty but not the other way round. A formalization without these flaws will be possible using tuples (2.1.).

The formalization of n-ary relations involves an (n+1)-ary predicate of evaluation, and a definer binding n variables on a formula. We may either see n-ary relations as particular cases of n-ary operations (representing Booleans as objects), or see n-ary operations as particular cases of (n+1)-ary relations (see 2.3). And just like operations, relations can be reduced to the unary case in 2 ways : by currying (with n−1 uses of the function definer and 1 use of the set-builder, as will be done in 2.3 with n=2), or using tuples (2.1).

1.11. Set generation principle

Bounded quantifiers give sets their fundamental role as ranges of bound variables, unknown by the predicate ∈ which only lets them play a role of classes. Technically, the bounded quantifier (∃ ∈ , ) suffices to define the predicate ∈ as
xE ⇔ (∃yE, x=y)
but is not definable from it in return in the set theoretical formalism, as the inverse definition involves an open quantifier. Philosophically, the perception of a set as a class (ability to classify each object as belonging to it or not) does not provide its full perception as a set (the perspective over all its elements as coexisting).

Set generation principle. For any class C (defined by a given bounded formula with parameters), if the formula (∀x,C(x) ⇒ A(x)) expressing C on an undefined unary predicate A (an extra symbol of predicate used ignoring the possibility to replace it by a formula, as mentioned in 1.5) is proven equivalent to a bounded formula (here abbreviated as (Qx, A(x)) like a quantifier), then C is a set that can be named by a new operator symbol K to be added to the language of set theory, with arguments the (common) parameters of C and Q, and axiom :
(For all accepted values of parameters), Set(K) ∧ (∀xK, C(x)) ∧ (Qx, xK).

This equivalence between Q and ∀C is equivalently expressible as the following list of 3 statements, where the quantifier Q* defined by (Q*x, A(x)) ⇎ (QxA(x)) will be equivalent to ∃C:

(1) ∀x, (C(x) ⇔ Q*y, x=y) (in fact we just need ∀x, C(x) ⇒ Q*y, x=y)
(2) Qx, C(x)
(3) For all (undefined symbols of) unary predicates A and B, (∀x, A(x) ⇒ B(x)) ⇒ ((Qx, A(x)) ⇒ (Qx, B(x))).

Indeed these 3 properties are already known consequences of «Q=∀C ». Conversely,
((2) ∧ (3)) ⇒ ((∀C x, A(x)) ⇒ Qx,A(x))
((1) ∧ ∃C x, A(x)) ⇒ ∃y, (Q*x, x=y) ∧ (∀x, x=yA(x)) ∴ ((3) ⇒ Q*x,A(x)) ∎

(3) will often be immediate, by lack of any troubling occurrence of A in Q (inside a negation, an equivalence, or left of a ⇒), leaving us to verify (1) and (2).

Here are examples of such operator symbols (the first column is the generic abbreviation while others are the effective examples, and denoting D=Dom f) :

K {yE|B(y)} E Im f {y} {y,z}
dK xE, dB(x) Set(E) ∧ ∀xE, Set(x)
Fnc(f) 1
1
1
C(x) xEB(x) yE, xy yD, f(y)=x 0
x=y x=yx=z
Qx, A(x) xE, B(x) ⇒ A(x) yE, ∀xy, A(x) xD, A(f(x)) 1 A(y) A(y) ∧ A(z)
Q*x, A(x) xE, B(x) ∧ A(x) yE, ∃xy, A(x) xD, A(f(x)) 0
A(y) A(y) ∨ A(z)

The definition of the set K={xE | B(x)}, that was only expressed as a class, can also be written as the particular case of the above mentioned axiom, i.e.
Set(K) ∧ (∀xK, xEB(x)) ∧ (∀xE, B(x) ⇒ xK)
or more shortly as
Set(K) ∧ KE ∧ (∀xE, xKB(x))
with which the proof of Russell's paradox would be written
F={xE | Set(x) ∧ xx} ⇒ ((∀xE, xF ⇔ (Set(x) ∧ xx)) ∧ (FF ⇎ (Set(F) ∧ FF))) ⇒ FE

The functor ⋃ is the union symbol, and its axioms form the axiom of union.

The set Imf of values of f(x) when x ranges over Dom f, is called the image of f.
We define the predicate f:EF as
(f:EF) ⇔ (Fnc(f) ∧ Set(F) ∧ Domf = E ∧ ImfF)
that reads «f is a function from E to F ». A set F such that ImfF (i.e. ∀xDom f, f(x) ∈ F), is called a target set of f.
The more precise (Fnc(f) ∧ Dom f=EImf = F) will be denoted f:EF (f is a surjection, or surjective function from E to F, or function from E onto F).

The empty set ⌀ is the only set without element, and is included in any set E (⌀ ⊂ E).
Thus, (E=⌀ ⇔ E ⊂ ⌀ ⇔ ∀xE, 0), and thus (E ≠ ⌀; ⇔ ∃xE,1).
This constant symbol ⌀ ensures the existence of a set; for any set E we also get ⌀ = {xE | 0}.
As (Dom f=⌀ ⇔ Imf=⌀) and (Dom f=Dom g=⌀ ⇒ f=g), the only function with domain ⌀ is called the empty function.

We can redefine ∃ from the above in two ways: (∃xE, A(x)) ⇔ {xE | A(x)} ≠ ⌀ ⇔ (1 ∈ Im(ExA(x))).

For all x, {x,x}={x}. Such a set with only one element is called a singleton.
For all x, y we have {x, y}={y, x}. If xy, the set {x, y} with 2 elements x and y is called a pair.

Our set theory will later be completed with more symbols and axioms, either necessary (as here) or optional (opening a diversity of possible set theories).



Philosophical aspects of the foundations of mathematics

To complete our initiation to the foundations of mathematics, the following pages of philosophical complements (from this one to Concepts of truth in mathematics), will present an overview on some of the deepest features of the foundations of mathematics: their philosophical and intuitive aspects (much of which may be implicitly understood but not well explained by specialists, as such philosophical issues are not easily seen as proper objects of scientific works). This includes
These things are not necessary for Part 2 (Set Theory, continued) except to explain the deep meaning and consequences of the fact that the set exponentiation or power set (2.6) is not justifiable by the set generation principle. But they will be developed and justified in more details in Part 3 (Model Theory).

Intuitive representation and abstraction

Though mathematical systems «exist» independently of any particular sensation, we need to represent them in some way (in words, formulas or drawings). Diverse ways can be used, that may be equivalent (giving the same results) but with diverse degrees of relevance (efficiency) that may depend on purposes. Ideas usually first appear as more or less visual intuitions, then are expressed as formulas and literal sentences for careful checking, processing and communication. To be freed from the limits of a specific form of representation, the way is to develop other forms of representation, and exercise to translate concepts between them. The mathematical adventure itself is full of plays of conversions between forms of representation.

Platonism vs Formalism

In this diversity of approaches to mathematics (or each theory), two philosophical views are traditionally distinguished. Philosophers usually present them as opposite, incompatible belief systems, candidate truths on the real nature of mathematics. However, both views instead turn out to be necessary and complementary aspects of math foundations. Let us explain how.

Of course, human thought having no infinite abilities, cannot fully operate in any realistic way, but only in a way roughly equivalent to formal reasonings developed from some foundations ; this work of formalization can prevent the possible errors of intuition.
But a purely formalistic view cannot hold either because

Another reason for their reconciliation, is that they are not in any global dispute to describe the whole of mathematics, but their shares of relevance depends on specific theories under consideration.

The form of mathematical theories

The main useful logical frameworks for mathematical theories, from the weakest (less expressive) to the strongest (most expressive), except set theory, will be:

We described the general form of mathematical theories in sections 1.1, 1.3, 1.4, 1.5, 1.6, 1.8, 1.9 for the two most important frameworks : first-order logic, and the framework of set theory (which is specific, with definiteness classes, and binders depending on sets). The relation between both escapes the above hierarchical order as we have irreversible correspondences between them in both ways (not the inverse of each other):
Let us sum up our description: once chosen a formalism, a theory is specified by its content (vocabulary and axioms), as follows. Every theory is made of 3 kinds of components. The latter 2 kinds of components are finite systems made from the previous kind :

But for having any interest for most practical purposes, a theory should be such that we cannot build from this, any of the 4th kind of components:

Realistic vs. axiomatic theories in mathematics and other sciences

Interpretations of the word «theory» may vary between mathematical and non-mathematical uses (in ordinary language and other sciences), in two ways.

Theories may differ by their object and nature:
Theories may also differ by whether Platonism or formalism best describes their intended meaning :

A realistic theory aims to describe a fixed system given from an independent reality, so that any of its ground formulas (statements) will be either definitely true or definitely false as determined by this system (but the truth of a non-mathematical statement may be ambiguous, i.e. ill-defined for the given real system). From this intention, the theory will be built by providing an initial list of formulas called axioms : that is a hopefully true description of the intended system as currently known or guessed. Thus, the theory will be true if all its axioms are indeed true on the intended system. In this case, its logical consequences (theorems, deduced from axioms) will also be true on the intended model.
This is usually well ensured in pure maths, but may be speculative in other fields. In realistic theories outside pure mathematics, the intended reality is usually a contingent one among alternative possibilities, that (in applied mathematics) are equally possible from a purely mathematical viewpoint. If a theory (axioms list) does not fit a specific reality that pure mathematics cannot suffice to identify, this can be hopefully discovered by comparing its predictions (logical consequences) with observations : the theory is called falsifiable.

An axiomatic theory is a theory given with an axioms list that means to define the selection of its models (systems it describes), as the class of all systems where these axioms are true. This may keep an unlimited diversity of models, that remain equally real and legitimate interpretations. By this definition of what «model» means, the truth of the axioms of the theory is automatic on each model (it holds by definition and is thus not questionable). All theorems (deduced from axioms) are also true in each model.

In pure mathematics, the usual features of both possible roles of theories (realistic and axiomatic) automatically protect them from the risk to be «false» as long as the formal rules are respected.

Non-realistic theories outside pure mathematics (where the requirement of truth of theorems is not always strict, so that the concept of axiom loses precision) may either describe classes of real systems, or be works of fiction describing imaginary or possible future systems. But this distinction between real and imaginary systems does not exist in pure mathematics, where all possible systems are equally real. Thus, axiomatic theories of pure mathematics aim to describe a mathematical reality that is existing (if the theory is consistent) but generally not unique.
Different models may be non-equivalent, in the sense that undecidable formulas may be true or false depending on the model. Different consistent theories may «disagree» without conflict, by being all true descriptions of different systems, that may equally «exist» in a mathematical sense without any issue of «where they are».

For example Euclidean geometry, in its role of first physical theory, is but one in a landscape of diverse geometries that are equally legitimate for mathematics, and the real physical space is more accurately described by the non-Euclidean geometry of General Relativity. Similarly, biology is relative to a huge number of random choices silently accumulated by Nature on Earth during billions of years.

Realistic and axiomatic theories both appear in pure mathematics, in different parts of the foundations of mathematics, as will be presented in the section on the truth concepts in mathematics.
But let us first explain the presence of a purely mathematical flow of time (independent of our time) in model theory and set theory.

Time in model theory

The time order between interpretations of expressions

Given a model, expressions do not receive their interpretations all at once, but only the ones after the others, because these interpretations depend on each other, thus must be calculated after each other. This time order of interpretation between expressions, follows the hierarchical order from sub-expressions to expressions containing them.
Take for example, the formula xy+x=3. In order for it to make sense, the variables x and y must take a value first. Then, xy takes a value, obtained by multiplying the values of x and y. Then, xy+x takes a value based on the previous ones. Then, the whole formula (xy+x=3) takes a Boolean value (true or false).
But this value depends on those of the free variables x and y. Finally, taking for example the ground formula ∀x, ∃y, xy+x=3, its Boolean value (which is false in the world of real numbers), «is calculated from» those taken by the previous formula for all possible values of x and y, and therefore comes after them.
A finite list of formulas in a theory may be interpreted by a single big formula containing them all. This only requires to successively integrate (or describe) all individual formulas from the list in the big one, with no need to represent formulas as objects (values of a variable). This big formula comes (is interpreted) after them all, but still belongs to the same theory. But for only one formula to describe the interpretation of an infinity of formulas (such as all possible formulas, handled as values of a variable), would require to switch to the framework of one-model theory.

The metaphor of the usual time

I can speak of «what I told about at that time»: it has a sense if that past saying had one, as I got that meaning and I remember it. But mentioning «what I mean», would not itself inform on what it is, as it might be anything, and becomes absurd in a phrase that modifies or contradicts this meaning («the opposite of what I'm saying»). Mentioning «what I will mention tomorrow», even if I knew what I will say, would not suffice to already provide its meaning either: in case I will mention «what I told about yesterday» (thus now) it would make a vicious circle; but even if the form of my future saying ensured that its meaning will exist tomorrow, this would still not provide it today. I might try to speculate on it, but the actual meaning of future statements will only arise once actually expressed in context. By lack of interest to describe phrases without their meaning, we should rather restrict our study to past expressions, while just "living" the present ones and ignoring future ones.
So, my current universe of the past that I can describe today, includes the one of yesterday, but also my yesterday's comments about it and their meaning. I can thus describe today things outside the universe I could describe yesterday. Meanwhile I neither learned to speak Martian nor acquired a new transcendental intelligence, but the same language applies to a broader universe with new objects. As these new objects are of the same kinds as the old ones, my universe of today may look similar to that of yesterday; but from one universe to another, the same expressions can take different meanings.

Like historians, mathematical theories can only «at every given time» describe a universe of past mathematical objects, while this interpretation itself «happens» in a mathematical present outside this universe.
Even if describing «the universe of all mathematical objects» (model of set theory), means describing everything, this «everything» that is described, is only at any time the current universe, the one of our past ; our act of interpreting expressions there, forms our present beyond this past. And then, describing our previous act of description, means adding to this previous description (this «everything» described) something else beyond it.

The infinite time between interpretations of theories

As a one-model theory T' describes a theory T with a model M, the components (notions and structures) of the model [T,M] of T', actually fall into 3 categories: This last part of [T,M] is a mathematical construction determined by the combination of both systems T and M but it is not directly contained in them : it is built after them.
So, the model [T,M] of T', encompassing the present theory T with the interpretation of all its formulas in the present universe M of past objects, is the next universe of the past, which will come as the infinity of all current interpretations (in M) of formulas of T will become past.
Or can it be otherwise ? Would it be possible for a theory T to express or simulate the notion of its own formulas and compute their values ?

As explained in 1.7, some theories (such as model theory, and set theory from which it can be developed) are actually able to describe themselves: they can describe in each model a system looking like a copy of the same theory, with a notion of "all its formulas" (including objects that are copies of its own formulas). However then, according to the Truth Undefinability theorem, no single formula (invariant predicate) can ever give the correct boolean values to all object copies of ground formulas, in conformity with the values of these formulas in the same model.

The Berry paradox

This famous paradox is the idea of "defining" a natural number n as "the smallest number not definable in less than twenty words". This would define in 10 words a number... not definable in less than 20 words. But this does not bring a contradiction in mathematics because it is not a mathematical definition. By making it more precise, we can form a simple proof of the truth undefinability theorem (but not a fully rigorous one):
Let us assume a fixed choice of a theory T describing a set ℕ of natural numbers as part of its model M.
Let H be the set of formulas of T with one free variable intended to range over this ℕ, and shorter than (for example) 1000 occurrences of symbols (taken from the finite list of symbols of T, logical symbols and variables).
Consider the formula of T' with one free variable n ranging over ℕ, expressed as
FH, F(n) ⇒ (∃k<n, F(k))
This formula cannot be false on more than one number per formula in H, which are only finitely many (an explicit bound of their number can be found). Thus it must be true on some numbers.
If it was equivalent to some formula BH, we would get

n∈ℕ, B(n) ⇔ (∀FH, F(n) ⇒ (∃k<n, F(k))) ⇒ (∃k<n, B(k))

contradicting the existence of a smallest n on which B is true.
The number 1000 was picked in case translating this formula into T was complicated, ending up in a big formula B, but still in H. If it was so complicated that 1000 symbols didn't suffice, we could try this reasoning starting from a higher number. Since the existence of an equivalent formula in H would anyway lead to a contradiction, no number we might pick can ever suffice to find one. This shows the impossibility to translate such formulas of T' into equivalent formulas of T, by any method much more efficient than the kind of mere enumeration suggested above.
This infinite time between theories, will develop as an endless hierarchy of infinities.

Zeno's Paradox

Achilles runs after a turtle; whenever he crosses the distance to it, the turtle takes a new length ahead.
Seen from a height, a vehicle gone on a horizontal road approaches the horizon.
Particles are sent in accelerators closer and closer to the speed of light.
Can they reach their ends ?

Each example can be seen in two ways:

And in each example, a physical measure of the «cost» to approach and eventually reach the targeted end, decides its «true» interpretation, according to whether this cost would be finite or infinite, which may differ from the first guess of a naive observer.
But the world of mathematics, free from all physical costs and where objects only play conventional roles, can accept both interpretations.

Each generic theory is «closed», as it can see its model (the ranges of its variables) as a whole (that is a set in its set theoretical formulation): by its use of binders over types (or classes), it «reaches the end» of its model, and thus sees it as «closed». But any possible framework for it (one-model theory and/or set theory) escapes this whole.
As explained in 1.7, set theory has multiple possible models : from the study of a given universe of sets, we can switch to that of a larger one with more sets (that we called meta-sets), and new functions between the new sets.

As this can be repeated endlessly, we need an «open» theory integrating each universe described by a theory, as a part (past) of a later universe, forming an endless sequence of growing realities, with no view of any definite totality. This role of an open theory will be played by set theory itself, with the way its expressions only bind variables on sets (1.8).

Time in set theory

The expansion of the set theoretical universe

Given two universes UU', the universe U will be called standard in U', or a sub-universe of U', if its interpretation of set theoretical structures (values they give for all values of arguments inside U) coincides with their meta-interpretation (that of U'). Precisely, let us require the preservation of the following data: Thus, as structures in U' are fixed, U only needs to be specified as a meta-subset or class in U'. Let us call it a small sub-universe if it is more precisely a set (UU').
If we have 3 universes UU'U" where U' is standard in U", then we have the equivalence:
(U is standard in U') ⇔ (U is standard in U").
Thus the idea to consider the standardness of a universe as an absolute property, independent of the external universe in which it is described... provided that this external universe is itself standard. This does not formally define standardness as an absolute concept (which is impossible), but suggests that such a concept may make ideal sense.

Let us call standard multiverse any collection (range) of standard universes, where any 2 of them are small sub-universes of a third one. Let us say that the set theoretical universe expands when it ranges over a standard multiverse.
From any standard multiverse M, we can rebuild an external universe containing all its universes, defined as their union U=⋃M, where they are all standard. Indeed, any expression with free variables in U takes its meaning from at least one UM containing the values of all these variables, and thus where the expression can be interpreted. This U is still another specific standard universe, but it cannot belong itself to M, as its presence there would contradict the concept of multiverse which does not admit any biggest element. So, no single standard multiverse can ever contain all possible standard universes.

We can understand the intended sense of set theory as of a different kind than that of generic theories, in this way : Unlike standard universes, not all non-standard universes can be small sub-universes of some larger universe (itself non-standard), as extensions may be unable to preserve the power set operators. We may also have a multiverse of non-standard universes looking like a standard multiverse (its members are small sub-universes of each other), but whose union U (with the same structures, letting these universes appear standard), no more behaves as a good universe, as it no more satisfies the axiom schema of specification over formulas with open quantifiers (which is a necessary quality for a universe to be a small sub-universe of another one). Namely, there may be a set EU and a formula A such that
{xE|∀U y, A(x,y)}∉U.

Two universes will be called compatible if they can both be seen as sub-universes of a common larger universe. All standard universes are compatible with each other. So when 2 universes are incompatible, at least one of them is non-standard ; they may be both parts of a common larger universe, only by representing there at least one of them as non-standard.

Can a set contain itself ?

Let us call a set reflexive if it contains itself.
By the proof of Russell's paradox [1.8], the class (Set(x) ∧ xx) of non-reflexive sets, cannot coincide with a subset F of any set E, since this F would then be a non-reflexive set outside E, giving a contradiction. But can reflexive sets exist ? this is undecidable; here is why.

From a universe containing reflexive sets, we can just remove them all: these sets cannot be rebuilt from the data of their elements (since each one has at least an element removed from the universe, namely itself), but for the rest to still constitute a universe (model of set theory), we need to manage the case of all other sets that contained one (and similarly with functions):
Another way is to progressively rebuild the universe while avoiding them: each set appears at some time, formed as a collection of previously accepted or formed objects. Any set formed this way, must have had a first coming time: it could not be available yet when it first came as a collection of already available objects, thus it cannot be reflexive.
As the reflexivity of a set is independent of context, a union of universes each devoid of reflexive sets, cannot contain one either.

On the other hand, universes with reflexive sets can be created as follows:

Riddle. What is the difference between
Answer: the role of the set containing x but not y, played by y in the former universe, is played by x in the latter.

The absence of reflexive sets, is a special case of the axiom of foundation (or regularity), to which the above arguments of undecidability, will naturally extend. Its formulation will be based on the concept of well founded relation, that will follow the detailed study of Galois connections. But this axiom is just as useless as the sets it excludes.

The relative sense of open quantifiers

When the universe expands, the values of statements (first-order formulas, admitting open quantifiers) may change.
Of course, if a statement is formally provable from given axioms then it remains true in all universes satisfying these axioms; similarly if it is refutable (i.e. its negation is provable, so that it is false in all universes). But set theory does not give sense (a Boolean value) to undecidable ground statements, and similar non-ground statements (with open quantifiers and given values of free variables), as any given value would be relative to how things go «here and now» : if a universal statement (∀x, A(x) for a bounded formula A) is true «here», it might still become false (an x where A is false might be found) «elsewhere».

But if the value of an indefinite statement is relative to how things go «here», then the actual variability of this value between places (to motivate its indefiniteness status) remains relative to how things turn out to go «elsewhere». Namely, it is relative to a given range of possible coexisting «places» (universes) where the statement may be interpreted, that is a multiverse. But to coexist, these universes need the framework of a common larger universe U containing them all. In fact, the mere data of U suffices to essentially define a multiverse as that of «all universes contained in U». Or rather, 2 multiverses, depending on whether we admit all universes it contains, or only the standard ones.
A standard multiverse, as defined above. There, the variability of an existential statement (∃x, A(x)) for a bounded formula A, means the existence of universes U, U'U such that ∀xU, ¬A(x) but ∃xU', A(x). That is, A(x) only holds at some x outside U. We can get a U' such that UU' by taking any universe containing both U and the old U'. In particular, (∃x, A(x)) is also true in U (we may call this statement «ultimately true»). Intuitively, the x where A is true are out of reach of the theory : they cannot be formally expressed by terms, and their existence cannot be deduced from the given existence axioms (satisfied by U).
But since (∃x, A(x)) was not definitely true for the initially considered universe U actually unknown and expanding, its chances may be poor to become definitely true for U which is just another axiomatically described universe, that is unknown as well. So, when a statement aimed for U is indefinite, it may be varying when U expands, but it may also be that the very question whether it indeed varies (that can be translated as a question on U), remains itself an indefinite question as well. Just more truths can be determined for U than for U by giving more axioms to describe U than we gave for U.
The incompleteness theorem will imply that a formalization of this description of U (as the union of a standard multiverse, whose universes satisfy given axioms) is already such a stronger axiomatization, but also that neither this nor any other axiomatic theory trying to describe U (as some kind of ultimate standard universe), can ever decide (prove or refute) all ground statements in U; in particular, the question of the variability of a ground statement in the expanding U cannot be always decided either.

A multiverse of «all universes» no matter if they are standard or not. The completeness theorem will show that for any generic theory, the interpretation of the indefiniteness of a ground formula as variability of its boolean value in this multiverse, coincides with their formal undecidability. Strange things may happen there for an undecidable (∃x,A(x)), as the universes where it is true and those where it is false may be incompatible :

Intuitively, different possible universes with different properties do not necessarily "follow each other" in time, but can belong to separate and incompatible growth paths, some of which may be considered more realistic than others.

So, while the formal undecidability of a ground statement makes it automatically variable in any "multiverse of all universes", this still does not say how it goes for standard multiverses. In conclusion, the indefiniteness of statements should only be treated by avoidance, as a mere expression of ignorance towards the range of acceptable universes, partially selected by axioms, where they may be interpreted.

Interpretation of classes

Classes in an expanding universe

Unlike objects which can be compared by an equality symbol (that can be used in formulas), the meta-relation of equality between classes is as indefinite as the open ∀ since both concepts are definable from each other: Like with open quantifiers, this indefiniteness leaves us with both concepts of provable equality (or proven equality), and provable inequality, according to whether the statement of this equality (∀x, A(x) ⇔ B(x)) is provable or refutable.

Each universe U interprets each class C as a meta-set of objects P={xU |C(x)}, and sees it as a set when PU. This condition, that C has the same elements in U as an object (set) P also in U, is expressed by set theory in U as

P,∀x, C(x) ⇔ xP

or equivalently

E,∀x, C(x) ⇒ xE

since from such an E we can restore P as P={xE |C(x)}.
Otherwise (if PU), this P is arising to existence with U and will exist as a set in future universes (those which see U as a set).

From the perspective of an expanding universe, a class C (given as a formula with parameters), «is a set» (equal to P) if the part P = {xU |C(x)} that this formula defines in each U (formally depending on U), turns out to remain constant (the same set) during the expansion of U. More precisely, it is known as a set (proven equal to P) if we could prove this independence, i.e. refute the possibility for any object x outside the current universe (but existing in a larger universe), to ever satisfy C(x). On the other hand, a class C is not considered a set, if it remains eventually able to contain «unknown» or «not yet existing» objects (in another universe), that would belong to some future value of P, making P vary during the growth of U.

In a given expansion of U, the interpretation of the formal condition for C to be a set (∃E,∀x, C(x) ⇒ xE) in the union U of these U, means that in this growth, «there is a time after which P will stay constant». Compared to our last criterion to distinguish sets among classes in an expanding universe (the constancy of P), this one ignores any past variations, to focus on the latest ones (those between the largest universes, with size approaching that of the U where it is interpreted).
But the ideally intended standard multiverse, that is the range of U, would have to be itself a class instead of a set. Thus, both perspectives (a constant vs. a variable universe) alternatively encompass each other, endlessly along the expansion. Meanwhile, a class defined by some special formula may alternatively gain and lose the status of set ; but if in some growth range, «P perpetually alternates between variability and constancy», then it would ultimately not be constant there, thus C would not be a set. So the alternation of its status would end... if we stopped checking it at wrong places. But how ?

Concrete examples

A set: Is there any dodo left on Mauritius ? As this island is well known and regularly visited since their supposed disappearance, no surviving dodos could still have gone unnoticed, wherever they may hide. Having not found any, we can conclude there are none. This question, expressed by a bounded quantifier, has an effective sense and an observable answer.

A set resembling a class: Bertrand Russell raised this argument about theology: «If I were to suggest that between the Earth and Mars there is a china teapot revolving about the sun..., nobody would be able to disprove my assertion [as] the teapot is too small to be revealed even by our most powerful telescopes. But if I were to go on to say that, since my assertion cannot be disproved, it is intolerable presumption on the part of human reason to doubt it, I should rightly be thought to be talking nonsense.» The question is clear, but on a too large space, making the answer practically inaccessible. (An 8m telescope has a resolution power of 0.1 arcsec, that is 200m on the moon surface)

A class: the extended statement, «there is a teapot orbiting some star in the universe» loses all meaning: not only the size of the universe is unknown, but Relativity theory sees the remote events from which we did not receive light yet, as not having really happened for us yet either.

A meta-object: how could God «exist», if He is a meta-object, while «existence» can only qualify objects? Did apologists properly conceive their own thesis on God's «existence» ? But what are the objects of their faith and their worship ? Each monotheism rightly accuses each other of only worshiping objects (sin of idolatry): books, stories, beliefs, teachings, ideas, attitudes, feelings, places, events, miracles, healings, errors, sufferings, diseases, accidents, natural disasters (declared God's Will), hardly more subtle than old statues, not seriously checking (by fear of God) any hints of their supposed divinity.

A universal event: the redemptive sacrifice of the Son of God. Whether it would have been theologically equivalent for it to have taken place not on Earth but in another galaxy or in God's plans for the Earth in year 3,456, remains unclear.

Another set reduced to a class... The class F of girls remains incompletely represented by sets: the set of those present at that place and day, those using this dating site and whose parameters meet such and such criteria, etc. Consider the predicates B of beauty in my taste, and C of suitability of a relationship with me. When I try to explain that «I can hardly find any pretty girl in my taste (and they are often unavailable anyway)», i.e.

(∀F x, C(x) ⇒ B(x)) ∧ {xF | B(x)}≈Ø,

the common reaction is: «Do you think that beauty is the only thing that matters ?», i.e.

What,(∀xF, C(x) ⇔ B(x)) ????

then «If you find a pretty girl but stupid or with bad character, what will you do ?». Formally : (∃xF, B(x) ⇏ C(x) !!!). And to conclude with a claim of pure goodness: «I am sure you will find !», ( « ∃ ∃ xF, C(x)). Not forgetting the necessary condition to achieve this: «You must change your way of thinking».
... by the absence of God...
: F would have immediately turned into a set by the existence of anybody on Earth able to receive a message from God, as He would obviously have used this chance to make him email me the address my future wife (or the other way round).
... and of any substitute: a free, open and efficient online dating system, as would be included in my project trust-forum.net, could give the same result. But this requires finding programmers willing to implement it. But the class of programmers is not a set either, especially as the purpose of the project would conflict with the religious moral priority of saving God's job from competitors so as to preserve His salary of praise.

Justifying the set generation principle

Let Q* be the abbreviation as a quantifier symbol, of a bounded formula that uses an extra unary predicate symbol.
Assume that ¬(Q*y,0), and let C(x) defined as (Q*y, y = x). The hypothesis of the set generation principle means that we have a proof of (Q* ⇔ ∃C).
Let E be the range of all values ever taken by the argument y of R(y) when interpreting Q*y, R(y). This range may depend on implicit parameters of Q*y but does not depend on R. It must be a set because this formula only has definite, fixed means (variables bound to given sets, fixed parameters) to provide these values. We may also take as E another set that includes this range, such as any fixed universe (seen as a set in a larger universe) containing the values of all parameters, so that the formula can be interpreted there.
For any x, the value C(x) of Q* on the predicate (y ↦ (y = x)), can only differ (be true) from its (false) value on (y ↦ 0), if both predicates differ inside E, i.e. if x belongs to E :

C(x) ⇔ ((Q*y, y = x) ⇎ (Q*y,0)) ⇒ (∃yE, y=x ⇎ 0) ⇔ xE

thus C is a set. ∎
For classes satisfying the condition of the set generation principle (being indirectly as usable as sets in the role of domains of quantifiers), are they also indirectly as usable as sets in the role of domains of functions (before using this principle) ? Namely, is there for each such class a fixed formalization (bounded formulas with limited complexity) playing the roles of definers and evaluators for functions having these classes as domains ? The answer would be yes, but we shall not develop the justifications here.

Concepts of truth in mathematics

Let us review 4 distinct concepts of «truth» for a mathematical formula, from the simplest to the most subtle.

We first saw the relative truth, that is the value of a formula interpreted in a supposedly given model (like an implicit free variable, ignoring any difficulty to specify any example). In this sense, a given formula may be as well true or false depending on the model, and on the values of its free variables there.

Provability

Then comes the quality of being relatively true in all models of a given axiomatic theory, which coincides with provability in this theory, i.e. deduction from its axioms by the rules of first-order logic. Namely, there are known formal systems of proof for first-order logic, with known proof verification algorithms, universally applicable to any first-order theory while keeping this quality (ability to prove exactly all universally true formulas).
This remarkable property of first-order logic, together with the fact that all mathematics is expressible there (what is not directly there can be made so by insertion into set theory, itself formalized as a first-order theory), gives this framework a central importance in the foundations of mathematics : it reconciles Platonism and formalism, while giving a clear, natural sense to the concepts of «proof», «theorem» and «consistency». Different systems can do this job but all such formalisms, when correct, are equivalent to each other : any proof according to one is automatically convertible into a proof according to any other.

The completeness theorem ensuring this, first expressed as stating the existence of a model of any consistent first-order theory, will be proven by constructing these models out of the infinite set of all ground expressions in a language constructed from the theory (the language of the theory plus more symbols extracted from its axioms). As the set of all ground expressions in a language can itself be constructed from this language together with the set ℕ of natural numbers, the validity of this theorem only depends on the axiom of infinity, that is the existence of ℕ as an actual infinity, sufficient for all theories (ignoring the diversity of infinities in set theory).

However, these are only theoretical properties, assuming a computer with unlimited (potentially infinite) available time and resources, able to find proofs of any size. Not only the precise size of a proof may depend on the particular formalism, but even some relatively simple formulas may only have «existing» proofs that «cannot be found» in practice as they would be too long, even bigger than the number of atoms in the visible physical Universe (as illustrated by Gödel's speed-up theorem). Within limited resources, there may be no way to distinguish whether a formula is truly unprovable or a proof has only not yet been found.

To include their case, the universal concept of provability (existence of a proof) has to be defined in the abstract. Namely, it can be expressed as a formula of first-order arithmetic (the first-order theory of natural numbers with operations of addition and multiplication), made of one existential quantifier that is unbounded in the sense of arithmetic (∃ p, ) followed by a formula where all quantifiers are bounded, i.e. with finite range (∀x < (...), ...). For example, p may be an encoding of the proof, or the time needed by a proof searching algorithm to find it.

However, once given an arithmetical formula known to be a correct expression of the provability predicate (while all such formulas are provably equivalent to each other), it still needs to be interpreted.

Arithmetic truths

This involves the third concept of mathematical truth, that is the realistic truth in first-order arithmetic. This is the ideally meant interpretation of arithmetic: the interpretation of ground formulas of first-order arithmetic in «the true set ℕ of all, and only all, really finite natural numbers», called the standard model of arithmetic. But any axiomatic formalization of arithmetic in first-order logic is incomplete, in both following senses of the question: This incompleteness affects the provability predicate itself, though only on one side, as follows.
On the one hand, if the formula p(A) of provability of a formula A, is true, then it is provable: a proof of p(A) can in principle be produced by the following method in 2 steps:
  1. Find a proof of A (as it is assumed to exist);
  2. Process it by some automatic converter able to formally convert any proof of A into a proof that a proof of A exists.

On the other hand, it is not always refutable when false : no matter the time spent seeking in vain a proof of a given unprovable formula, we might still never be able to formally refute the possibility to finally find a proof by searching longer, because of the risk for a formula to be only provable by unreasonably long proofs.

In lack of any possible fixed ultimate algorithm to produce all truths of arithmetic, we can be interested with partial solutions: algorithms producing endless lists of ground arithmetic formulas with both qualities A natural method to progress in the endless (non-algorithmic) search for better and better algorithms for the second quality without breaking the first, consists in developing formalizations of set theory describing larger and larger universes beyond the infinity of ℕ, where properties of ℕ can be deduced as particular cases. Indeed, if a set theory T' requires its universe to contain, as a set, a model U of a set theory T, then the arithmetic formula of the consistency of T will be provable in T' but not in T, while all arithmetic theorems of T remain provable in T' if T' describes U as standard.

Set theoretical truths

The above can be read as an indispensability argument for our last concept of truth, which is the truth of set theoretical statements. To progress beyond logical deduction from already accepted ones, more set theoretical axioms need to be introduced, motivated by some Platonist arguments for a real existence of some standard universes where they are true; the validity of such arguments needs to be assessed in intuitive, not purely formal ways, precisely in order to do better than any predefined algorithm. Arguments for a given axiomatic set theory, lead to arithmetic conclusions :
  1. The claim of formal consistency of this set theory;
  2. The arithmetic theorems we can deduce in its framework

Both conclusions should not be confused :

But as the objects of these conclusions are mere properties of finite systems (proofs), their meaning stays unaffected by any ontological assumptions about infinities, including the finitist ontology (denying the reality of any actual infinity, whatever such a philosophy might mean). It sounds hard to figure out, then, how their reliability can be meaningfully challenged by philosophical disputes on the «reality» of abstractions beyond them (universes), just because they were motivated by these abstractions.
But then, the claim of consistency (1.), with the mere existence of ℕ, suffice to let models of this theory really exist (non-standard ones, but working the same as standard ones).

For logical deduction from set theoretical axioms to be a good arithmetic truth searching algorithm, these axioms must be : But for a collection of such axioms to keep these qualities when put together in a common theory, they need to be compatible, in the sense that their conjunction remains sound. Two such statements might be incompatible, either if one of them limits the size of the universe (thus it shouldn't), or if each statement (using both kinds of open quantifiers when written in prenex form) endlessly alternates between truth and falsity when the universe expands, in such a way that they would no more be true together in any standard universe beyond a certain size (their conjunction must not limit the size of the universe either). The question is, on what sort of big standard universes might good axioms more naturally be true together ?

A standard universe U' might be axiomatically described as very big by setting it a little bigger than another very big one U, but the size of this U would need a different description (as it cannot be proven to satisfy the same axioms as U' without contradiction), but of what kind ? Describing U as also a little bigger than a third universe and so on, would require the axioms to keep track of successive differences. This would rapidly run into inefficient complications with incompatible alternatives, with no precise reason to prefer one version against others.

The natural solution, both for philosophical elegance and the efficiency and compatibility of axioms, is to focus on the opposite case, of universes described as big by how much bigger they are than any smaller one (like how we conceived a ultimate universe as the union of a standard multiverse) : axioms must be

It is also convenient because such descriptions are indeed expressible by axioms interpreted inside the universe, with no need of any external object. Indeed, if a property was only expressible using an external object (regarding this universe as a set), we could replace it by describing instead our universe as containing a sub-universe of this kind (without limiting its size beyond it), and why not also endlessly many sub-universes of this kind, forming a standard multiverse: stating that every object is contained in such a sub-universe. This is axiomatically expressible using objects outside each of these sub-universes, but inside our big one; and such axioms will fit all 3 above qualities.

Finally, the properly understood meaning of set theory is neither axiomatic nor realistic, but a sort of fuzzy intermediate between both: its axioms aim to approach all 3 qualities (strong and open but still sound) selecting universes with the corresponding 3 qualities (big and open but still standard), but these qualities are all fuzzy to interpret, and any specific axioms list (resp. universe) only aims to approach them, while this quest can never end.
Fortunately, rather simple theories such as ZF, already satisfy these qualities up to a very high degree, describing much larger realities than usually needed. This is how a Platonic view of set theory (seeing the universe of all mathematical objects as a fixed, exhaustive reality) can work as a good approximation, though it cannot be an exact, absolute fact.

Alternative logical frameworks

The description we made of the foundations of mathematics (first-order logic and set theory), is essentially just an equivalent clarified expression of the widely accepted ones (a different introduction to the same mathematics). In section 3 will be presented other logical frameworks that are either restricted versions of first-order logic, or anyway naturally expressible in set theory. But other, more radically different frameworks (concepts of logic and/or sets), called non-classical logic, might be considered. Examples: We will keep classical logic in all following sections, ignoring such alternatives.


Set theory and foundations of mathematics
Next : 2. Set theory (continued)