1. First foundations of mathematics

Sylvain Poirier
http://settheory.net/

1.1. Introduction to the foundations of mathematics

What is mathematics

Mathematics is the study of systems of elementary objects, whose only considered nature is to be exact, unambiguous (two objects are equal or different, related or not; an operation gives an exact result...). Mathematics as a whole can be seen as «the science of all possible worlds» of this kind (of exact objects).
Mathematical systems are conceived as «existing» independently of our usual world or any particular sensation, but their study requires some form of representation. Diverse ways can be used, that may be equivalent (giving the same results) but with diverse degrees of relevance (efficiency) that may depend on purposes. Ideas may first appear as more or less visual intuitions which may be expressed by drawing or animations, then their articulations may be expressed in words or formulas for careful checking, processing and communication. To be freed from the limits or biases of a specific form of representation, is a matter of developing other forms of representation, and exercise to translate concepts between them. The mathematical adventure is full of plays of conversions between forms of representation, which may initiate us to articulations between mathematical systems themselves.

Theories

Mathematics is split into diverse branches according to the kind of systems being considered. These frameworks of any mathematical work may either remain implicit (with fuzzy limits), or formally specified as theories. Each theory is the study of a supposedly fixed system that is its world of objects, called its model. But each model of a theory may be just one of its possible interpretations, among other equally legitimate models. For example, roughly speaking, all sheets of paper are systems of material points, models of the same theory of Euclidean plane geometry, but independent of each other.

The word «theory» may take different meanings between mathematical and non-mathematical uses (in ordinary language and other sciences). A first distinction is by nature (general kind of objects); the other distinction, by intent (realism vs. formalism) will be discussed later.

Non-mathematical theories describe roughly or qualitatively some systems or aspects of the world (fields of observation) which escape simple exact description. For example, usual descriptions of chemistry involve drastic approximations, recollecting from observations some seemingly arbitrary effects whose deduction from quantum physics is usually out of reach of direct calculations. The lack of clear distinction of objects and of their properties induces risks of mistakes when approaching them and trying to infer some properties from others, such as to infer some global properties of a system from likely, fuzzy properties of its parts.

Pure mathematical theories, only describing exact systems, can be protected from the risk to be «false», by use of properly rigorous methods (formal rules) designed to ensure the preservation of exact conformity of theories to their models.

In between both, applied mathematical theories, such as theories of physics are also mathematical theories but the mathematical systems they describe are meant as idealized (simplified) versions of aspects of given real-world systems while neglecting other aspects; depending on its accuracy, this idealization (reduction to mathematics) also allows for correct deductions within accepted margins of error.

Foundations and developments

Any mathematical theory, which describes its model(s), is made of a content and is itself described by a logical framework. The content of a theory is made of components which are pieces of description (concepts and information, described in 1.3). A theory starts with a choice of foundation made of a logical framework and an initial version of its content (hopefully rather small, or at least simply describable). The components of this initial version are qualified as primitive.
The study of the theory progresses by choosing some of its possible developments : new components resulting from its current content, and that can be added to it to form its next content. These different contents, having the same meaning (describing the essentially same models), play the role of different presentations of the same theory. Any other possible development (not yet chosen) can still be added later, as the part of the foundation that could generate it remains. Thus, the totality of possible developments of a theory, independent of the order chosen to process them, already forms a kind of «reality» that these developments explore.

To express the properties of its models, the content of a theory includes a list of statements, which are formulas meant as true when interpreted in any model. Primitive statements are called axioms. Further statements called theorems are added by development to the content, under the condition that they are proven (deduced) from previous ones : this ensures them to be true in all models, provided that previous ones were. Theorems can then be used in further developments in the same way as axioms. A theory is consistent if its theorems will never contradict each other. Inconsistent theories cannot have any model, as the same statement cannot be true and false on the same system. The Completeness Theorem (1.9, 4.6) will show that the range of all possible theorems precisely reflects the more interesting reality of the diversity of models, which indeed exist for any consistent theory.
Other kinds of developments (definitions and constructions) which add other components beyond statements, will be described in 1.5, 4.8 and 4.9.

There are possible hierarchies between theories, where some can play a foundational role for others. For instance, the foundations of several theories may have a common part forming a simpler theory, whose developments are applicable to all.
A fundamental work is to develop, from a simple initial basis, a convenient body of knowledge to serve as a more complete "foundation", endowed with efficient tools opening more direct ways to further interesting developments.

Platonism vs Formalism

Mathematics, or each theory, may be approached in two ways (as further discussed in 1.9):

Many philosophers of mathematics carry obsolete conceptions of such views as forming a multiplicity of opposite beliefs (candidate truths) on the real nature of mathematics. But after examination, just remain these two necessary and complementary views, with diverse shares of relevance depending on topics :

By its limited abilities, human thought cannot directly operate in a fully realistic way over infinite systems (or finite ones with unlimited size), but requires some kind of logic for extrapolation, roughly equivalent to formal reasonings developed from some foundations ; this work of formalization can prevent possible errors of intuition. Moreover, mathematical objects cannot form any completed totality, but only a forever temporary, expanding realm, whose precise form is an appearance relative to a choice of formalization.

But beyond its inconvenience for expressing proofs, a purely formalistic view cannot hold either because the clarity and self-sufficiency of any possible foundation (starting position with formal development rules), remain relative: any starting point had to be chosen somehow arbitrarily, taken from and motivated by a larger perspective over mathematical realities; it must be defined in some intuitive, presumably meaningful way, implicitly admitting its own foundation, since any try to specify the latter would lead to a path of endless regression, whose realistic preexistence would need to be admitted.

The cycle of foundations

Despite the simplicity of nature of mathematical objects, the general foundation of all mathematics turns out to be quite complex (though not as bad as a physics theory of everything). Indeed, it is itself a mathematical study, thus a branch of mathematics, called mathematical logic. Like any other branch, it is full of definitions and theorems about systems of objects. But as its object is the general form of theories and systems they may describe, it provides the general framework of all branches of mathematics... including itself. And to provide the framework or foundation of each considered foundation (unlike ordinary mathematical works that go forward from an assumed foundation), it does not look like a precise starting point, but a sort of wide cycle composed of easier and harder steps. Still this cycle of foundations truly plays a foundational role for mathematics, providing rigorous frameworks and many useful concepts to diverse branches of mathematics (tools, inspirations and answers to diverse philosophical questions).

(This is similar to dictionaries defining each word by other words, or to another science of finite systems: computer programming. Indeed computers can be simply used, knowing what you do but not why it works; their working is based on software that was written in some language, then compiled by other software, and on the hardware and processor whose design and production were computer assisted. And this is much better than at the birth of this field.)

It is dominated by two theories:

Each one is the natural framework to formalize the other: each set theory is formalized as a theory described by model theory; the latter better comes as a development from set theory (defining theories and systems as complex objects) than directly as a theory. Both connections must be considered separately: both roles of set theory, as a basis and an object of study for model theory, must be distinguished. But these formalizations will take a long work to complete.

1.2. Variables, sets, functions and operations

Starting mathematics is a matter of introducing some simple concepts from the founding cycle, which may seem as self-sufficient as possible (while they cannot be absolutely so). A usual and natural solution is to start with a set theory not fully formalized as an axiomatic theory. 1.2 will do this very shortly, intuitively explaining the concepts of set and function. Then 1.3 will start introducing the main picture of foundations (model theory) by which set theory can be formalized, with its main subtleties (paradoxes).

Constants

A constant symbol is a symbol seen as denoting a unique object, called its value. Examples: 3, ⌀, ℕ. Those of English language usually take the form of proper names and names with «the» (singular without complement).

Free and bound variables

A variable symbol (or a variable), is a symbol which, instead having an a priori definite value, comes with the concept of possible values, or possible interpretations as taking a particular value. Each possibility gives it a role of constant. These possible values may as well be infinitely many, or only one or even none.
It can be understood as limited by a box, whose inside has multiple versions in parallel, articulating different viewpoints over it: More precisely with respect to given theories, fixing a variable means taking a free variable in a theory and more lengthily ignoring its variability, therefore simulating the use of the other theory obtained by holding this symbol as a constant.
The diverse «internal viewpoints», corresponding to each possible value seen as fixed, may be thought of as abstract «locations» in the mathematical universe, while the succession of views over a symbol (qualifying it as a constant, a free variable or a bound variable), can be seen as a first expression of the flow of time in mathematics: a variable is bound when all the diverse "parallel locations inside the box" (possible values) are past. All these places and times are themselves purely abstract, mathematical entities.

Ranges and sets

The range of a variable, is the meaning it takes when seen as bound: it is the «knowledge» of its considered totality of possible or authorized values (seen in bulk: unordered, ignoring their context), that are called the elements of this range. This «knowledge» is an abstract entity that can actually process (encompass) infinities of objects, unlike human thought. Any range of a variable is called a set.
A variable has a range when it can be bound, i.e. when an encompassing view over all its possible values is given. Not all variables of set theory will have a range. A variable without a range can still be free, which is no more an intermediate status between free and bound, but means it can take some values or some other values with no claim of exhausitivity.

Cantor defined a set as a «gathering M of definite and separate objects of our intuition or our thought (which are called the "elements" of M) into a whole». He explained to Dedekind : «If the totality of elements of a multiplicity can be thought of... as "existing together", so that they can be gathered into "one thing", I call it a consistent multiplicity or a "set".» (We expressed this "multiplicity" as that of values of a variable).
He described the opposite case as an «inconsistent multiplicity» where «admitting a coexistence of all its elements leads to a contradiction». But non-contradiction cannot suffice to generally define sets: the consistency of a statement does not imply its truth (i.e. its negation may be true but unprovable); facts of non-contradiction are often themselves unprovable (incompleteness theorem); and two separately consistent coexistences might contradict each other (Irresistible force paradox / Omnipotence paradox).

A variable is said to range over a set, when it is bound with this set as its range. Any number of variables can be introduced ranging over a given set, independently of each other and of other variables.
Systematically renaming a bound variable in all its box, into another symbol not used in the same context (same box), with the same range, does not change the meaning of the whole. In practice, the same letter can represent several separate bound variables (with separate boxes), that can take different values without conflict, as no two of them are anywhere free together to compare their values. The common language does this continuously, using very few variable symbols («he», «she», «it»...)

Functions

A function is an object f made of the following data: In other words, it is an entity behaving as a variable whose value is determined by that of another variable called its argument with range Dom f : whenever its argument is fixed (gets a name, here "x", and a value in Dom f), f becomes also fixed, written f(x). This actually amounts to conceiving a variable f where the "possible views" on it as fixed, are treated as objects x conceptually distinct from the resulting values of f. As we shall see later, such an entity (dependent variable) f would not be (viewable as) a definite object of set theory if its argument had no range, i.e. could not be bound (it would only be a meta-object, or object of model theory, that we shall call a functor in 1.4)

Operations

The notion of operation generalizes that of function, by admitting a finite list of arguments (variables with given respective ranges) instead of one. So, an operation gives a result (a value) when all its arguments are fixed. The number n of arguments of an operation is called its arity ; the operation is called n-ary. It is called nullary if n=0 (it is a constant), unary if n=1 (it is a function), binary if n=2, ternary if n=3...
Nullary operations are useless as their role is already played by their unique value; 2.3 will show how to construct those with arity > 1 by means of functions.

The value of a binary operation f on its fixed arguments named x and y (i.e. its value when its arguments are assigned the fixed values of x and y), is denoted f(x,y). Generally, instead of names, the arguments are pictured by places around the symbol, namely the left and right spaces in parenthesis, to be filled by any expression giving them desired values.

An urelement (pure element) is an object not playing any other role than that of element: it is neither a set nor a function nor an operation.

1.3. Form of theories: notions, objects and meta-objects

The variability of the model

Each theory describes its model as a fixed system. But from the larger viewpoint of model theory, this is a mere «choice» of one possible model (interpretation) in a wide (usually infinite) range of other existing, equally legitimate models of the same theory. Now this fixation of the model, like the fixation of any variable, is but the elementary act of picking any possibility, ignoring any issue of how to specify an example in this range. Actually these «choice» and «existence» of models can be quite abstract. In details, the proof of the Completeness theorem will effectively «specify» a model of any consistent theory for the general case, but its construction will involve an infinity of steps, where each step depends on an infinite knowledge. Regardless this difficulty, the attitude of implicitly fixing a model when formally studying any mathematical theory, remains the standard way of interpreting it (except somehow for set theory as explained later).

Notions and objects

Each theory has its own list of notions, usually designated by common names, formally serving as the kinds of variables it can use ; each model interprets each notion as a set that is the common range of all variables of this kind. For example, Euclidean geometry has the notions of «point», «straight line», «circle» and more, and is usually expressed using a different style of variable symbol for each. The objects of a theory in a model, are all possible values of its variables of all kinds (the elements of all its notions) in this model.

One-model theory

Any discussion on several theories T and systems M that may be models of those T, takes place in model theory, with its notions of «theory» and «system» that are the respective kinds of the variables T and M. But when focusing on one theory with a fixed model, the variables T and M now fixed disappear from the list of variables. Their kinds, the notions of theory and model, disappear from the notions list too. This reduces the framework, from model theory, to that of one-model theory.
A model of one-model theory, is a system [T,M] which combines a theory T with a model M of T.

The diversity of logical frameworks

The role of a logical framework, as a precise version of (one-)model theory with its associated proof theory, is to describe : Here are those we shall see, roughly ordered from the poorest to the most expressive (though the order depends on the ways to relate them): We shall first describe the main two of them in parallel. First-order logic is the standard version of model theory, describing first-order theories we shall also call here generic theories. Set theory, which can encompass all other theories, can also encompass logical frameworks and thus serve itself as the ultimate logical framework as will be explained in 1.B.
Most frameworks manage notions as types (usually in finite number for each theory) classifying both variables and objects. Notions are called types if each object belongs to only one of them, which is then also called the type of the variables that can name it. For example, an object of Euclidean geometry may be either a point or a straight line, but the same object cannot be both a point and a straight line. But set theory will need more notions beyond types: classes, which will be introduced in 1.7.

Examples of notions from various theories

Theory Kinds of objects (notions)
Generic theories Urelements classified by types to play different roles
Set theory Elements, sets, functions, operations, relations, tuples...
Model theory Theories, systems and their components (listed below)
 One-model theory    Objects, symbols, types or other notions, Booleans,
structures (operators, predicates), expressions (terms, formulas)...
Arithmetic Natural numbers
Linear Algebra Vectors, scalars...
Geometry Points, straight lines, circles...

Meta-objects

The notions of a one-model theory T1, normally interpreted in [T,M], classify the components of T («type», «symbol», «formula»...), and those of M («object», and the means to interpret components and expressions of T there). But the same notions (even if from a different logical framework) can be interpreted in [T1, [T,M]], by putting the prefix meta- on them.

By its notion of «object», one-model theory distinguishes the objects of T in M among its own objects in [T,M], that are the meta-objects. The above rule of use of the meta prefix would let every object be a meta-object; but we will make a vocabulary exception by only calling meta-object those which are not objects: symbols, types or other notions, Booleans, structures, expressions...

Set theory only knows the ranges of some of its own variables, seen as objects (sets). But, seen by one-model theory, every variable of a theory has a range among notions, which are meta-objects only.

Components of theories

In a given logical framework, the content of a theory consists in 3 lists of components of the following kinds, where those of each of the latter two kinds are finite systems using those of the previous kind.

1.4. Structures of mathematical systems

The structures, interpreting the language of a theory, relate the objects of diverse types, giving their roles to the objects of each type with respect to those of other types, to form the studied system. According to these roles, objects may be thought of as complex objects, in spite of have otherwise no nature like urelements.
The kinds of structures (and thus the kinds of structure symbols) allowed in first-order theories, thus called first-order structures, will be classified into operators and predicates. We shall describe them as operations designated by structure symbols in a set theoretical interpretation. More powerful structures called second-order structures will be introduced in 5.1, coming from set theoretical tools or as packs of an additional type with first-order structures.

Set-theoretical interpretations

Any generic theory can be interpreted (inserted, translated) in set theory by converting its components into components of set theory. This is the usual view of ordinary mathematics, studying many systems as «sets with relations or operations such that...», with possible connections between these systems. Let us introduce both the generic interpretations applicable to any generic theory, and other ones usually preferred for particular theories.

Any interpretation converts each abstract type into a symbol (name) designating a set called interpreted type (serving as the range of variables of that type, whose use is otherwise left intact). This symbol is usually a fixed variable in the generic case, but can be accepted as constant symbol of set theory in special cases such as numbers systems (ℕ, ℝ...).
In generic interpretations, all objects (elements of interpreted types) are urelements, but other kinds of interpretations called standard by convention for specific theories may do otherwise. For example, standard interpretations of geometry represent points by urelements, but represent straight lines by sets of points.

Generic interpretations will also convert structure symbols into fixed variables (while standard ones may define them using the language of set theory). Any choice of fixed values of all types and structure symbols, defines a choice of model. Once the language is seen as a set, models become objects of set theory, owing their multiplicity to the variability of types and structure symbols. This integrates all needed theories into the same set theory, while gathering representatives of all their considered models inside a common model of set theory. This is why a model of set theory is called a universe. When adopting set theory as our conceptual framework, this concept of "interpretation" becomes synonymous with the choice (designation) of a model.

First-order structures

An operator is an operation between interpreted types. On the side of the theory before interpretation, each operator symbol comes with its symbol type made of In a theory with only one type, this data is reduced to the arity.
The constant symbols (or constants) of a theory are its nullary operator symbols (having no argument).
Unary operators (that are functions) will be called here functors (*).

The list of types is completed by the Boolean type, interpreted as the set of two elements we shall denote 1 for «true» and 0 for «false». A variable of this type (outside the theory) is called a Boolean variable.

A para-operator is a generalized operator allowing the Boolean type among its types of arguments and results.
A (logical) connective is a para-operator with only Boolean arguments and values.
A predicate is a para-operator with Boolean values, and at least one argument but no Boolean argument.
As will be formalized in 2.6., any n-ary operator f may be reduced to the (n+1)-ary predicate (y = f(x1,...,xn)), true for a unique value of y for any chosen values of x1,...,xn.

Structures of set theory

Formalizing set theory, means describing it as a theory with its notions, structures and axioms. We shall admit 3 primitive notions : elements (all objects), sets and functions. Their main primitive structures are introduced below. Most other primitive symbols and axioms will be presented in 1.7,1.8, 2.1 and 2.2, in a dedicated logical framework, convertible into first-order logic by a procedure also described in 2.1. Still more primitive components will be needed and added later (2.3, 2.6, 2.7, 4.3). Optional ones, such as the axiom of choice (2.12), will open a diversity of possible set theories.

This view of set theory as described by (one-)model theory, relates the terminologies of both theories in a different way than when interpreting generic theories in set theory. As the set theoretical notions (sets, functions...) need to keep their natural names when defined by this formalization, it would become incorrect to keep that terminology for their use in the sense of the previous link (where notions were "sets" and operators were "operations"). To avoid confusion, let us here only use the model theoretical notions as our conceptual framework, ignoring their set theoretical interpretations. We shall describe in 1.7 and 1.B how both links can be put together, and how both ways to conceive the same theories (describing them by model theory or using a set theoretical interpretation) can be reconciled.

One aspect of the role of sets is given by the binary predicate ∈ of belonging : for any element x and any set E, we say that x is in E (or x belongs to E, or x is an element of E, or E contains x) and write xE, to mean that x is a possible value of the variables with range E.
Functions f play their role by two operators: the domain functor Dom, and the function evaluator, binary operator that is implicit in the notation f(x), with arguments f and x, giving the value of any function f at any element x of Dom f.

About ZFC set theory

The Zermelo-Fraenkel set theory (ZF, or ZFC with the axiom of choice) is a generic theory with only one type «set», one structure symbol ∈ , and axioms. It implicitly assumes that every object is a set, and thus a set of sets and so on, built over the empty set.
As a rather simply expressible but very powerful set theory for an enlarged founding cycle, it can be a good choice indeed for specialists of mathematical logic to conveniently prove diverse difficult foundational theorems, such as the unprovability of some statements, while giving them a scope that is arguably among the best conceivable ones.
But despite the habit of authors of basic math courses to conceive their presentation of set theory as a popularized or implicit version of ZF(C), it is actually not an ideal reference for a start of mathematics for beginners:

Formalizing types and structures as objects of one-model theory

To formalize one-model theory through the use of the meta- prefix, both meta-notions of "types" and "structures" are given their roles by meta-structures as follows.

Since one-model theory assumes a fixed model, it only needs one meta-type of "types" to play both roles of abstracts types (in the theory) and interpreted types (components of the model), respectively given by two meta-functors: one from variables to types, and one from objects to types. Indeed the more general notion of «set of objects» is not used and can be ignored.

But the meta-notion of structure will have to remain distinct from the language, because more structures beyond those named in the language will be involved (1.5). Structures will get their roles as operations, from meta-structures similar to the function evaluator (see 3.1-3.2 for clues), while the language (set of structure symbols) will be interpreted there by a meta-functor from structure symbols to structures.
However, this mere formalization would leave undetermined the range of this notion of structure. Trying to conceive this range as that of «all operations between interpreted types» would leave unknown the source of knowledge of such a totality. This idea of totality will be formalized in set theory as the powerset (2.7), but its meaning will still depend on the universe where it is interpreted, far from our present concern for one-model theory.

1.5. Expressions and definable structures

Terms and formulas

Given the first two layers of a theory (a list of types and a language), an expression (that is either a term or a formula), is a finite system of occurrences of symbols, where an occurrence of a symbol in an expression is a place where that symbol is written (for example the term « x+x » has two occurrences of x and one of +). Each expression comes in the context of a given list of available free variables. In expressions of first-order theories and set theory, symbols of the following kinds may occur. Any expression will give (define) a value (either an object or a Boolean) for each possible data of In first-order logic, let us call logical symbols the quantifiers and symbols of para-operators outside the language (equality, connectives and conditional operator): their list and their meaning in each system are determined by the logical framework and the given types list, which is why they are not listed as components of individual theories.

Let us sketch a more precise description (the case of expressions with only free variables and operator symbols, called algebraic terms, will be formalized in set theory in 4.1 for only one type).

Each expression contains a special occurrence of a symbol called its root, while each other occurrence is the root of a unique sub-expression (another expression which we may call the sub-expression of that occurrence). The type of an expression, which will be the type of its values, is given by the type of result of its root. Expressions with Boolean type are formulas; others, whose type belongs to the given types list, are terms (their values will be objects).
Expressions are built successively, in parallel between different lists of available free variables. The first and simplest ones are made of just one symbol (as root, having a value by itself) : constants and variables are the first terms; the Boolean constants 1 and 0 are the simplest formulas.
The next expressions are then successively built as made of the following data:

Display conventions

The display of this list of sub-expressions directly attached to the root requires a choice of convention. For a para-operator symbol other than constants : Parenthesis can also be used to distinguish (separate) the subexpressions, thus distinguish the root of each expression from other occurring symbols. For example the root of (x+y)n is the exponentiation operator.

Variable structures

Usually, only few objects are named by the constants in a given language. Any other objects can be named by a fixed variable, whose status depends on the choice of theory to see it: The difference vanishes in generic interpretations which turn constant symbols into variables (whose values define different models).
By similarity to constants which are particular structures (nullary operators), the concept of variable can be generalized to that of variable structure. But those beyond nullary operations (ordinary variables) escape the above list of allowed symbols in expressions. Still some specific kinds of use of variable structure symbols can be justified as abbreviations (indirect descriptions) of the use of legitimate expressions. The main case of this is explained below, forming a development of the theory ; more possible uses will be introduced in 1.10 (the view of a use as an abbreviation of other works amounts to using a developed version of the logical framework).

Structures defined by expressions

In any theory, one can legitimately introduce a symbol meant either as a variable structure (operator or predicate) or a Boolean variable (nullary predicate, not a "structure"), as abbreviation of, thus defined by, the following data : The variability of this structure is the abbreviation of the variability of all its parameters.

In set theory, any function f is synonymous with the functor defined by the term f(x) with argument x and parameter f (but the domain of this functor is Dom f instead of a type).
The terms without argument define simple objects (nullary operators) ; the one made of a variable of a given type, seen as parameter, suffices to give all objects of its type.

Now let us declare (the range of) the meta-notion of "structure" in one-model theory, and thus those of "operator" and "predicate", as having to include at least all those reachable in this way: defined by any expression with any possible values of parameters. The minimal version of such a meta-notion can be formalized as a role given to the data of an expression with values of its parameters. As this involves the infinite set of all expressions, it is usually inaccessible by the described theory itself : no single expression can suffice. Still when interpreting this in set theory, more operations between interpreted types (undefinable ones) usually exist in the universe. Among the few exceptions, the full set theoretical range of a variable structure with all arguments ranging over finite sets (as interpreted types) with given size limits, can be reached by one expression whose size depends on these limits.

Invariant structures

An invariant structure of a given system (interpreted language), is a structure defined from its language without parameters (thus a constant one). This distinction of invariant structures from other structures, generalizes the distinction between constants and variables, both to cases of nonzero arity, and to what can be defined by expressions instead of directly named in the language.

Indeed any structure named by a symbol in the language is directly defined by it without parameter, and thus invariant. As will be further discussed in 4.8, theories can be developed by definitions, which consists in naming another invariant structure by a new symbol added to the language. Among aspects of the preserved meaning of the theory, are the meta-notions of structure, invariant structure, and the range of theorems expressible with the previous language.

1.6. Logical connectives

We defined earlier the concept of logical connective. Let us now list the main useful ones, beyond both nullary ones (Boolean constants) 1 and 0. (To this will be added the conditional connective in 2.1). Their properties will be expressed by tautologies, which are formulas only involving connectives and Boolean variables (here written A, B, C), and true for all possible combinations of values of these variables. So, they also give necessarily true formulas when replacing these variables by any defining formulas using any language and interpreted in any systems. Such definitions of Boolean variables by formulas of a theory may restrict their ranges of possible values depending on each other.

Tautologies form the rules of Boolean algebra, an algebraic theory describing operations on the Boolean type, naturally interpreted as the pair of elements 0 and 1 but also admitting more sophisticated interpretations beyond the scope of this chapter.

The binary connective of equality between Booleans is written ⇔ and called equivalence: AB is read «A is equivalent to B».

Negation

The only useful unary connective is the negation ¬, that exchanges Booleans (¬A is read «not A»):
¬1
¬0
¬(¬A)
⇔ 0
⇔ 1
A
It is often denoted by barring the root of its argument, forming with it another symbol with the same format:
xy
xE
(AB)
⇔ ¬(x = y)
⇔ ¬(xE)
⇔ (A ⇔ ¬B)
(x is not equal to y)
(x is not an element of E)
(inequivalence)

Conjunctions, disjunctions

The conjunction ∧ means «and», being true only when both arguments are true;
The disjunction ∨ means «or», being true except when both arguments are false.
Each of them is :
Idempotent
(AA) ⇔ A
(AA) ⇔ A
Commutative
(BA) ⇔ (AB)
(BA) ⇔ (AB)
Associative
((AB)∧C) ⇔ (A∧(BC))
((AB)∨C) ⇔ (A∨(BC))
Distributive over the other
(A ∧ (BC)) ⇔ ((AB) ∨ (AC))
(A ∨ (BC)) ⇔ ((AB) ∧ (AC))

This similarity (symmetry) of their properties comes from the fact they are exchanged by negation:

(AB) ⇎ (¬A ∧ ¬B)
(AB) ⇎ (¬A ∨ ¬B)

The inequivalence is also called exclusive or because (AB) ⇔ ((AB) ∧ ¬(AB)).

Chains of conjunctions such as (ABC), abbreviate any formula with more parenthesis such as ((AB) ∧ C), all equivalent by associativity ; similarly for chains of disjunctions such as (ABC).

Asserting (declaring as true) a conjunction of formulas amounts to successively asserting all these formulas.

Implication

The binary connective of implication ⇒ is defined as (AB) ⇔ ((¬A) ∨ B). It can be read «A implies B», «A is a sufficient condition for B», or «B is a necessary condition for A». Being true except when A is true and B is false, it expresses the truth of B when A is true, but no more gives information on B when A is false (as it is then true).
Moreover,
(AB) ⇎
(AB) ⇔
(A ∧ ¬B)
B ⇒ ¬A)
The formula ¬B ⇒ ¬A is called the contrapositive of AB.
The equivalence can also be redefined as
(AB) ⇔ ((AB) ∧ (BA)).
Thus in a given theory, a proof of AB can be formed of a proof of the first implication (AB), then a proof of the second one (BA), called the converse of (AB).

The formula A ∧ (AB) is equivalent to AB but will be written AB, which reads «A therefore B», to indicate that it is deduced from the truths of A and AB.

Negations turn the associativity and distributivity of ∧ and ∨, into various tautologies involving implications:

(A ⇒ (BC)) ⇔ ((AB) ⇒ C)
(A ⇒ (BC)) ⇔ ((AB) ∨ C)

(A ⇒ (BC)) ⇔ ((AB) ∧ (AC))
((AB) ⇒ C) ⇔ ((AC) ∧ (BC))
((AB) ⇒ C) ⇔ ((AC) ∧ (BC))
(A ∧ (BC)) ⇔ ((AB) ⇒ (AC))
Finally,
(AB) ⇒ ((AC) ⇒ (BC))
(AB) ⇒ ((AC) ⇒ (BC)).

Chains of implications and equivalences

In a different kind of abbreviation, any chain of formulas linked by ⇔ and/or ⇒ will mean the chain of conjunctions of these implications or equivalences between adjacent formulas:

(ABC) ⇔ ((AB) ∧ (BC)) ⇒ (AC)
(ABC) ⇔ ((AB) ∧ (BC)) ⇒ (AC)
0 ⇒ AA ⇒ 1
A) ⇔ (A ⇒ 0) ⇔ (A ⇔ 0)
(A ∧ 1) ⇔ A ⇔ (A ∨ 0) ⇔ (1 ⇒ A) ⇔ (A ⇔ 1)
(AB) ⇒ A ⇒ (AB)

1.7. Classes in set theory

For any theory, a class is a unary predicate A seen as the set of objects where A is true, that is «the class of all x such that A(x)».
In particular for set theory, each set E is synonymous with the class of the x such that xE (defined by the formula xE with argument x and parameter E). However, this involves two different interpretations of the notion of set, that need to be distinguished as follows.

Standard universes and meta-sets

Interpreting (inserting) set theory into itself, involves articulating two interpretations (models) of set theory, which will be distinguished by giving the meta- prefix to the one used as framework. Aside generic interpretations, set theory has a standard kind of interpretation into itself, where each set is interpreted by the class (meta-set) of its elements (the synonymous object and meta-object are now equal), and each function is interpreted by its synonymous meta-function. This way, any set will be a class, while any class is a meta-set of objects. But some meta-sets of objects are not classes (no formula with parameters can define them); and some classes are not sets, such as the class of all sets (see Russell's paradox in 1.8), and the universe (class of all objects, defined by 1).

Definiteness classes

Set theory accepts all objects as «elements» that can belong to sets and be operated by functions (to avoid endless further divisions between sets of elements, sets of sets, sets of functions, mixed sets...). This might be formalized keeping 3 types (elements, sets and functions), where each set would have a copy as element, identified by a functor from sets to elements, and the same for functions. But beyond these types, our set theory will anyway need more notions as domains of its structures, which can only be conveniently formalized as classes. So, the notions of set and function will also be classes named by predicate symbols:

Set = «is a set»
Fnc = «is a function»

In first-order logic, any expression is ensured to take a definite value, for every data of a model and values of all free variables there (by virtue of its syntactic correction, that is implicit in the concept of «expression»). But in set theory, this may still depend on the values of free variables.
So, an expression A (and any structure defined from it) will be called definite, if it actually takes a value for the given values of its free variables (seen as arguments and parameters of any structure it defines). This condition is itself an everywhere definite predicate, expressed by a formula dA with the same free variables. Choosing one of these as argument, the class it defines is the meta-domain, called class of definiteness, of the unary structure defined by A.
Expressions should be only used where they are definite, which will be done rather naturally. The definiteness condition of (xE) is Set(E). That of the function evaluator f(x) is Fnc(f) ∧ x ∈ Dom f.
But the definiteness of the last formula needs a justification, given below.

Extended definiteness

A theory with partially definite structures, like set theory, can be formalized (translated) as a theory with one type and everywhere definite structures, keeping intact all expressions and their values wherever they are definite : models are translated one way by giving arbitrary values to indefinite structures (e.g. a constant value), and in the way back by ignoring those values. Thus, an expression with an indefinite subexpression may be declared definite if its final value does not depend on these extra values.
In particular for any formulas A and B, we shall regard the formulas AB and AB as definite if A is false, with respective values 0 and 1, even if B is not definite. So, let us give them the same definiteness condition dA ∧ (A ⇒ dB) (breaking, for A ∧ B, the symmetry between A and B, that needs not be restored). This formula is made definite by the same rule provided that dA and dB were definite. This way, both formulas

A ∧ (BC)
(AB) ∧ C

have the same definiteness condition (dA ∧ (A ⇒ (dB ∧ (BdC)))).

Classes will be defined by everywhere definite predicates, easily expressible by the above rule as follows.
Any predicate A can be extended beyond its domain of definiteness, in the form dAA (giving 0), or dAA (giving 1).
For any class A and any unary predicate B definite in all A, the class defined by AB is called the subclass of A defined by B.

1.8. Binders in set theory

The syntax of binders

This last kind of symbol can form an expression by taking a variable symbol, say here x, and an expression F which may use x as a free variable (in addition to the free variables that are available outside), to give a value depending on the unary structure defined by F with argument x. Thus, it separates the «inside» subexpression F having x among its free variables, from the «outside» where x is bound. But in most cases (in most theories...), binders cannot keep the full information on this unary structure, which is too complex to be recorded as an object as we shall see below.

We shall first review both main binders of set theory : the set-builder and the function definer. Then 1.10 will present both main quantifiers. Finally 2.1 and 2.2 will give axioms to complete this formalization of the notions of sets and functions in set theory.

The syntax differs between first-order logic and set theory, which manage the ranges of variables differently. In first-order logic, ranges are types, implicit data of quantifiers. But the ranges of binders of set theory are sets which, as objects, are designated by an additional argument of the binder (a space for a term not using the variable being bound).

Set-builder

For any unary predicate A definite on all elements of a set E, the subclass of E defined by A is a set : it is the range of a variable x introduced as ranging over E, so that it can be bound, from which we select the values satisfying A(x). It is thus a subset of E, written {xE | A(x)} (set of all x in E such that A(x)): for all y,

y ∈ {xE | A(x)} ⇔ (yEA(y))

This combination of characters { ∈ | } forms the notation of a binder named the set-builder: {xE | A(x)} binds x with range E on the formula A.

Russell's paradox

If the universe (class of all elements) was a set then, using it, the set builder could turn all other classes, such as the class of all sets, into sets. But this is impossible as can be proven using the set-builder itself :

Theorem. For any set E there is a set F such that FE. So, no set E can contains all sets.

Proof. F={xE | Set(x) ∧ xx} ⇒ (FF ⇔ (FEFF)) ⇒ (FFFE). ∎

This will oblige us to keep the distinctions between sets and classes.

The function definer

The function definer ( ∋ ↦ ) binds a variable on a term, following the syntax Ext(x), where Being definite if t(x) is definite for all x in E, it takes then the functor t and restricts its domain (definiteness class) to the set E, to give a function with domain E. So it converts functors into functions, reversing the action of the function evaluator (with the Dom functor) that converted (interpreted) functions into their role (meaning) as functors whose definiteness classes were sets.
The shorter notation xt(x) may be used when E is determined by the context, or in a meta description to designate a functor by specifying the argument x of its defining term.

Relations

A relation is a role playing object of set theory similar to an operation but with Boolean values : it acts as a predicate whose definiteness classes (ranges of arguments) are sets (so, predicates could be described as relations between interpreted types).

Now unary relations (functions with Boolean values), will be represented as subsets S of their domain E, using the set-builder in the role of definer, and ∈ in the role of evaluator interpreting S as the predicate x ↦ (xS). This role of S still differs from the intended unary relation, as it ignores its domain E but is definite in the whole universe, giving 0 outside E. This lack of operator Dom does not matter, as E is usually known from the context (as an available variable).

As the function definer (resp. the set-builder) records the whole structure defined by the given expression on the given set, it suffices to define any other binder of set theory on the same expression with the same domain, as made of a unary structure applied to its result (that is a function, resp. a set).

1.9. Axioms and proofs

Statements

An expression is ground if its list of available free variables is empty (all its variables are bound), so that its value only depends on the system where it is interpreted.
In first-order logic, a statement is a ground formula. Thus, it will have a definite Boolean value (true or false) only depending on the choice of a system that interprets its language.
The axioms list of a theory is a set of statements, meant to be all true in some given system(s) (models of the theory as discussed below).

Provability

A proof of a statement A in a theory T, is a finite model of proof theory, involving A and a finite list of axioms among those of T.
Specialists could establish a full formalization for the concept of proof (the full details of a proof theory), which can take the form of a proof verification algorithm (only requiring an amount of computing resources related to the size of a given proof). But most mathematical works are only partially formalized : semi-formal proofs are used with just enough rigor to give the feeling that a full formalization is possible, yet not fully written; an intuitive vision of a problem may seem clearer than a formal argument. Likewise, this work will present proofs naturally without explicit full formalization : sometimes using natural language articulations, proofs will mainly be written as successions of statements each visibly true thanks to previous ones, premises, axioms, known theorems and rules of proof, especially those of quantifiers (1.10). Let us review some theoretical properties.

We say that A is provable in T, or a theorem of T, and write TA if a proof of A in T exists. This may be conceived in terms of a proof searching algorithm processed by a theoretical computer with unlimited (infinite) available time and resources, to try every possible piece of proof regardless its size until it eventually finds a conclusive proof of the given statement.
In practice, we only qualify as theorems the statements known as such, i.e. for which a proof is known. But synonyms for "theorem" are traditionally used according to their importance: a theorem is more important than a proposition; either of them may be deduced from an otherwise less important lemma, and easily implies an also less important corollary.

A statement A is called logically valid, which is written ⊢ A, if it is provable without using any axiom (thus a theorem of any theory regardless of axioms). Then A is true in every system by virtue of the logical framework. The simplest ones are the tautologies; others will be given in 1.10.
A proof of A from some axioms can also be seen as a proof of (conjunction of these axioms ⇒ A) without axiom, thus showing the logical validity of this implication.

Again in T, a refutation of A is a proof of ¬A. If one exists (T ⊢ ¬A), the statement A is called refutable (in T).
A statement is called decidable (in T) if it is either provable or refutable.

A theory T is called contradictory or inconsistent if T ⊢ 0, otherwise it is called consistent. If a statement is both provable and refutable in T then all are, because it means that T is inconsistent, independently of the chosen statement:

⊢ (A ∧ ¬A) ⇔ 0
((TA) ∧ (TB)) ⇔ (TAB)
((TA) ∧ (T ⊢ ¬A)) ⇔ (T ⊢ 0).

In an inconsistent theory, every statement is provable. Such a theory has no model.

Realistic vs. axiomatic theories in mathematics and other sciences

Apart from the distinction of nature (mathematical vs. non-mathematical), theories may also differ by the intention of their use, between realism and formalism.

A realistic theory is a theory involved to describe either a fixed system or the systems from a range, seen as given from some independent reality. For this, a list of axioms is chosen as a set of statements which, for some reason, are considered known as true on the intended system(s). Such a theory is said to be "true" if its axioms are indeed true there.

The truth of realistic theories is usually well ensured in pure mathematics, but those of other fields may be questionable: non-mathematical statements over non-mathematical systems may be ambiguous (ill-defined), while the truth of theories of applied mathematics may be approximative, or speculative as the intended system(s) may be unknown (contingent among alternative possibilities, that are equally possible thus mathematically "existing"). A theory beyond pure mathematics is called falsifiable if, in principle, the case of its falsity can be discovered by finding a mismatch between its predictions (theorems) and observations. For example, biology is relative to a huge number of random choices silently accumulated by Nature on Earth during billions of years ; it has lots of "axioms" which are falsifiable and require a lot of empirical testing.

An axiomatic theory is a theory given formally with an axioms list that means to define the range of models, as the class of all systems, interpreting the given language, where all axioms are true (rejecting those where some axiom is false). This makes automatic the truth of axioms in any model. Outside pure mathematics, non-realistic theories (not well called axiomatic, as the use of axioms may lose clarity) would be works of fiction describing imaginary or possible future systems.

For example Euclidean geometry first came as a realistic theory of applied mathematics (for its role of first theory of physics), but is now understood as an axiomatic theory of pure mathematics in a landscape of diverse equally legitimate geometries, while the real physical space is more accurately described by the non-Euclidean geometry of General Relativity.

Completeness of first-order logic

The problem of formalizing the concept of «proof» actually has a perfect solution for first-order logic, in the sense of leading to the perfect concept of provability, equivalent to truth in all models. Indeed, as proofs must be logical deductions, every theorem will be true in each model (where all axioms are true), but the converse also interestingly holds, according to the completeness theorem, which was originally Gödel's thesis : if a statement is true in all models of a given first-order axiomatic theory, then it is provable in it. In other words, any formally consistent axiomatic theory (set of first-order statements) does match some existing system; and any undecidable formula will be true in some models and false in others. Thus different consistent theories may «disagree» without conflict, by being all true descriptions of different systems, all mathematically existing without any issue of «where they are».
Such a perfect formalization is essentially unique in the sense that any for any two of them, any "proof" for the one can be translated into a "proof" for the other.

This quality of first-order logic, with its ability to express all mathematics (any logical framework beyond it can anyway be developed from set theory, itself expressible in first-order logic), gives it a central importance in the foundations of mathematics, and dissolves much of a priori disagreements between realism (Platonism) and formalism : any consistent axiomatic theory describes some mathematical reality (model) that exists, but is generally not unique, as an unlimited diversity of models may remain. So, for a realistic theory to not be an axiomatic theory, requires to conceive its intended (range of) system(s) otherwise than by the criteria of mathematical existence and satisfaction of given first-order statements.

The proof of the completeness theorem (given in 4.6) will consist in building a model of any consistent first-order axiomatic theory, as follows. The infinite set of all ground terms with operation symbols derived from the theory (those of its language plus others coming from its existence axioms), is turned into a model by progressively defining each predicate symbol over each combination of values of its arguments there, while keeping consistency. As this construction just involves the data of a theory and the set ℕ of natural numbers, the validity of this theorem only depends on the axiom of infinity, that is the existence of ℕ as an actual infinity.
The very definition of provability already involves infinity, as it means the existence of a proof without any limit of size. The size (complexity) of a proof, reflects the number of elements that need to be described in an attempted construction of a counter-example to a given theorem, to observe the necessary failure of this construction (while the choice of formalization only affects the size of proofs in more moderate proportions); this is not a priori predictable from, say, the size of the statement, for reasons which will be developed later.

Provability (with a fixed language) can also be expressed as a formula of first-order arithmetic (theory with the only type "natural number", constants 0,1, and operation of addition and multiplication), stating the existence of a number which encodes a proof, related to the free variable which encodes a statement (candidate theorem), depending on a predicate that encodes the axioms list. This is independent of the formalism of proof encoding, in the sense that all formulas expressing valid solutions to encode "provability" are provably equivalent to each other, under the ordinary axioms of first-order arithmetic which will be sketched in 4.3 and 4.4.
However, these axioms do not suffice to make these formulas decidable for every value of the free variable. To give full definiteness (Boolean values) to these formulas and any other formulas of arithmetic, requires to interpret arithmetic not axiomatically but realistically, in its ideal model called the standard model of arithmetic, intuitively described as the true set ℕ of exactly all really finite natural numbers, as will be clarified in Part 4.

1.10. Quantifiers

A quantifier is a binder taking a unary predicate (formula) and giving a Boolean value.
In set theory, the full syntax for a quantifier Q binding a variable x with range E on a unary predicate A, is

QxE, A(x)

where A(x) abbreviates the formula defining A, whose free variables are x and possible parameters.
A shorter notation puts the range as an index (QEx, A(x)), or deletes it altogether (Qx, A(x)) when it may be kept implicit (unimportant, or fixed by the context, such as a type in a generic theory).
The two main quantifiers (from which others will be defined later) are: The universal quantifier of set theory can be seen as defined from the set builder:

(∀xE, A(x)) ⇔ {xE | A(x)} = E.

The one of first-order logic can be defined in set theoretical interpretations, seeing A as a function and its Boolean values as objects:

(∀x, A(x)) ⇔ A = (x ↦ 1)

Anyway (∀x, 1) is always true.
∃ can be defined from ∀ with the same range : (∃x, A(x)) ⇎ (∀x, ¬A(x)).
Thus (∃x, A(x)) ⇔ A ≠ (x ↦ 0).
With classes,
(∃C x, A(x)) ⇔ (∃x, C(x) ∧ A(x)) ⇔ ∃CA x, 1
(∀C x, A(x)) ⇔ (∀x, C(x) ⇒ A(x))
x, C(x) ⇔ ∃C y, x=y

Inclusion between classes

A class A is said to be included in a class B when ∀x, A(x) ⇒ B(x). Then A is a subclass of B, as ∀x, A(x) ⇔ (B(x) ∧ A(x)). Conversely, any subclass of B is included in B.
The inclusion of A in B implies for any predicate C (in cases of definiteness):

(∀B x, C(x)) ⇒ (∀A x, C(x))
(∃A x, C(x)) ⇒ (∃B x, C(x))
(∃C x, A(x)) ⇒ (∃C x, B(x))
(∀C x, A(x)) ⇒ (∀C x, B(x))

Rules of proofs for quantifiers on a unary predicate

Existential Introduction. If we have terms t, t′,… and a proof of (A(t) ∨ A(t′) ∨ …), then x, A(x).

Existential Elimination. If x, A(x), then we can introduce a new free variable z with the hypothesis A(z) (the consequences will be true without restricting the generality).

These rules express the meaning of ∃ : going from some term to ∃ then from ∃ to z, amounts to letting z represent that term (without knowing which, but they can be gathered into one by the conditional operator). They give the same meaning to ∃C as to its 2 above equivalent formulas, bypassing (making implicit) the extended definiteness rule for (CA) by focusing on the case when C(x) is true and thus A(x) is definite.

While ∃ appeared as the designation of an object, ∀ appears as a deduction rule: ∀C x, A(x) means that its negation ∃C x, ¬A(x) leads to a contradiction.

Universal Introduction. If from the mere hypothesis C(x) on a new free variable x we could deduce A(x), then C x, A(x).

Universal Elimination. If C x, A(x) and t is a term satisfying C(t), then A(t).

Introducing then eliminating ∀ is like replacing x by t in the initial proof.

These rules can be used to justify the following logically valid formulas

((∀C x, A(x)) ∧ (∀C x, A(x) ⇒ B(x))) ⇒ (∀C x, B(x))
((∃C x, A(x)) ∧ (∀C x, A(x) ⇒ B(x))) ⇒ (∃C x, B(x))
((∀C x, A(x)) ∧ (∃C x, B(x))) ⇒ (∃C x, A(x) ∧ B(x))
(∀C x, A(x)∨B(x)) ⇒ ((∀C x, A(x)) ∨ (∃C x, B(x)))
(∃A x, ∀B y, R(x,y)) ⇒ (∀B y, ∃A x, R(x,y))
x, (∀y, R(x,y)) ⇒ R(x,x) ⇒ (∃y, R(x,y))

The formula (∃C x, A(x)) ∧ (∀C x, A(x) ⇒ B(x)) will be abbreviated as (∃C x, A(x) ∴ B(x)) while it implies but is not generally equivalent to (∃C x, A(x) ∧ B(x)).

Second-order quantification

Let us call second-order quantifier, a quantifier binding a variable structure symbol over the range of all structures of its symbol type, may this be conceived as the range of all definable ones (with all possible defining expressions whose free variables may have any list of parameters beyond the given arguments) or as the full set theoretical range, that is the range of all such structures which exist in the universe, relating the given interpreted types. The use of such a quantifier (and thus of variable structure symbols) is not allowed in first-order logic, but belongs to some other logical frameworks instead, such as second-order logic (part 5).
Still while keeping first-order logic as the main framework of a given theory, some second-order quantifiers may be used to describe some of its meta level aspects in the following ways (which will be involved in the formalization of set theory in 2.1 and 2.2). Let T be a first-order theory, T' its extension by a structure symbol s (without further axiom) and F a ground formula of T' (in first-order logic) also denoted F(s) when seen as a formula of T using the variable structure symbol s in second-order logic.

Second-order Universal Introduction. If T'F then T entails the second-order statement (∀s, F(s)).

This holds for any model and the full set theoretical range of s, independently of the universe in which models and structures s are searched for.

Second-order Universal Elimination. Once a second-order statement (∀s, F(s)) is accepted in a theory T, it is manifested in first-order logic as a schema of statements, that is an infinite list of statements of the form (∀parameters, F(s)) for each possible replacement of s by a defining expression with parameters.

Applying second-order universal elimination after second-order universal introduction, means deducing from T a schema of theorems, each one indeed deducible in first-order logic by the proof obtained from the original proof by replacing s by its definition.

In second-order logic, a new binder B can be defined by an expression here abbreviated as F(A) using a symbol A of variable unary structure whose argument will be bound by B:

A, (Bx, A(x)) = F(A)

By second-order universal elimination, this comes down to a schema of definitions in first-order logic : for each expression defining A, it defines (Bx, A(x)) like a structure symbol, by the expression F(A) whose available free variables are the parameters of F plus those of A.

Axioms of equality

In first-order logic with given types and a given language, some ground formulas involving = are logically valid for the range of interpretations keeping = as the = predicate of set theory, but no more for the larger range of interpretations letting it become any other predicate. A possible list of axioms of equality, is a list of some of such formulas which suffice to imply all others in this context. One such list consists in the following 2 axioms per type, where the latter is meant as an axiom schema by second-order universal elimination of the variable unary predicate A:
  1. x, x = x (reflexivity)
  2. A,∀x,∀y, (x = yA(y)) ⇒ A(x).
Variables x and y can also serve among parameters in definitions of A. This can be understood by re-ordering quantifiers as (∀x, ∀y, ∀A), or as deduced from cases only using other free variables a, b, by adapting an above logically valid formula as ∀a, ∀b, (∀x, ∀y, R(a, b, x,y)) ⇒ R(a, b, a,b).
Diverse definitions of A give diverse formulas (assuming reflexivity):
Formula
3. ∀x,∀y, x = yy = x
4. ∀x, ∀y, ∀z, (x = yy = z) ⇒ x = z
5. ∀f, ∀x, ∀y, x = yf(x) = f(y)
6. ∀A, ∀x, ∀y, x = y ⇒ (A(x) ⇔ A(y))
7. ∀x, ∀y, ∀z, (x = yz = y) ⇒ z = x
Name
Symmetry
Transitivity
A(u) used
y = u
u = z
f(u) = f(y)
¬A(u)
z = u
We shall abbreviate (x = yy = z) as x = y = z.
5. is an axiom schema with f ranging among functors between any two types.
6. can also be deduced from symmetry.
Remark. (1.∧7.) ⇒ 3., then 3. ⇒ (4. ⇔ 7) so that (1.∧7.) ⇔ (1.∧3.∧4.).

Another possible list of axioms of equality consists in formulas 1. to 5. where f and A range over the mere symbols of the language, each taken once per argument : the full scheme of 2. is implied by successive deductions for each occurrence of symbol in A. This will be further justified in 2.11 (equivalence relations).

Introducing a variable x defined by a term t by writing (x = t ⇒ ...), in other words putting the axiom x = t, can be seen as justified by the above rules in this way : t = t ∴ ∃x, (x = t ∴ ...).



Philosophical aspects of the foundations of mathematics

To complete our initiation to the foundations of mathematics, the following pages (from this one to Concepts of truth), will present an overview of some deeper, more philosophical and intuitive aspects of these foundations (much of which may be already implicitly understood but not well explained by specialists, being not easily seen as proper objects of scientific works). This includes
These things are not necessary for Part 2 (Set Theory, continued) except to explain the deep meaning and consequences of the fact that the set exponentiation or power set (2.6) is not justifiable by the set generation principle. But they will be developed and justified in more details in parts 4 and 5.

1.B. Time in model theory

The time order between interpretations of expressions

Given a model, expressions do not receive their interpretations all at once, but only the ones after the others, because these interpretations depend on each other, thus must be calculated after each other. This time order of interpretation between expressions, follows the hierarchical order from sub-expressions to expressions containing them.
Take for example, the formula xy+x=3. In order for it to make sense, the variables x and y must take a value first. Then, xy takes a value, obtained by multiplying the values of x and y. Then, xy+x takes a value based on the previous ones. Then, the whole formula (xy+x=3) takes a Boolean value (true or false).
But this value depends on those of the free variables x and y. Finally, taking for example the ground formula ∀x, ∃y, xy+x=3, its Boolean value (which is false in the world of real numbers), «is calculated from» those taken by the previous formula for all possible values of x and y, and therefore comes after them.
A finite list of formulas in a theory may be interpreted by a single big formula containing them all. This only requires to successively integrate (or describe) all individual formulas from the list in the big one, with no need to represent formulas as objects (values of a variable). This big formula comes (is interpreted) after them all, but still belongs to the same theory. But for only one formula to describe the interpretation of an infinity of formulas (such as all possible formulas, handled as values of a variable), would require to switch to the framework of one-model theory.

The metaphor of the usual time

I can speak of «what I told about at that time»: it has a sense if that past saying had one, as I got that meaning and I remember it. But mentioning «what I mean», would not itself inform on what it is, as it might be anything, and becomes absurd in a phrase that modifies or contradicts this meaning («the opposite of what I'm saying»). Mentioning «what I will mention tomorrow», even if I knew what I will say, would not suffice to already provide its meaning either: in case I will mention «what I told about yesterday» (thus now) it would make a vicious circle; but even if the form of my future saying ensured that its meaning will exist tomorrow, this would still not provide it today. I might try to speculate on it, but the actual meaning of future statements will only arise once actually expressed in context. By lack of interest to describe phrases without their meaning, we should rather restrict our study to past expressions, while just "living" the present ones and ignoring future ones.
So, my current universe of the past that I can describe today, includes the one of yesterday, but also my yesterday's comments about it and their meaning. I can thus describe today things outside the universe I could describe yesterday. Meanwhile I neither learned to speak Martian nor acquired a new transcendental intelligence, but the same language applies to a broader universe with new objects. As these new objects are of the same kinds as the old ones, my universe of today may look similar to that of yesterday; but from one universe to another, the same expressions can take different meanings.

Like historians, mathematical theories can only «at every given time» describe a universe of past mathematical objects, while this interpretation itself «happens» in a mathematical present outside this universe.
Even if describing «the universe of all mathematical objects» (model of set theory), means describing everything, this «everything» that is described, is only at any time the current universe, the one of our past ; our act of interpreting expressions there, forms our present beyond this past. And then, describing our previous act of description, means adding to this previous description (this «everything» described) something else beyond it.

The infinite time between models

As a one-model theory T' describes a theory T with a model M, the components (notions and structures) of the model [T,M] of T', actually fall into 3 categories: This last part of [T,M] is a mathematical construction determined by the combination of both systems T and M but it is not directly contained in them : it is built after them.
So, the model [T,M] of T', encompassing the present theory T with the interpretation of all its formulas in the present universe M of past objects, is the next universe of the past, which will come as the infinity of all current interpretations (in M) of formulas of T will become past.

Or can it be otherwise ? Would it be possible for a theory T to express or simulate the notion of its own formulas and compute their values ?

Truth undefinability

As we shall see, some theories such as model theory, and set theory from which it can be developed, are actually able to describe themselves: they can describe in each model a system looking like a copy of the same theory, with a notion of "all its formulas" (including objects that are copies of its own formulas). However then, a Truth Undefinability theorem will be established, showing that no single formula (invariant predicate) can ever give the correct boolean values to all object copies of ground formulas, in conformity with the values of these formulas in the same model.

A strong and rigorous proof will be given later. Here is an easy one.

The Berry paradox

This famous paradox is the idea of "defining" a natural number n as "the smallest number not definable in less than twenty words". This would define in 10 words a number... not definable in less than 20 words. But this does not bring a contradiction in mathematics because it is not a mathematical definition. By making it more precise, we can form a simple proof of the truth undefinability theorem (but not a fully rigorous one):
Let us assume a fixed choice of a theory T describing a set ℕ of natural numbers as part of its model M.
Let H be the set of formulas of T with one free variable intended to range over this ℕ, and shorter than (for example) 1000 occurrences of symbols (taken from the finite list of symbols of T, logical symbols and variables).
Consider the formula of T' with one free variable n ranging over ℕ, expressed as
FH, F(n) ⇒ (∃k<n, F(k))
This formula cannot be false on more than one number per formula in H, which are only finitely many (an explicit bound of their number can be found). Thus it must be true on some numbers.
If it was equivalent to some formula BH, we would get

n∈ℕ, B(n) ⇔ (∀FH, F(n) ⇒ (∃k<n, F(k))) ⇒ (∃k<n, B(k))

contradicting the existence of a smallest n on which B is true.
The number 1000 was picked in case translating this formula into T was complicated, ending up in a big formula B, but still in H. If it was so complicated that 1000 symbols didn't suffice, we could try this reasoning starting from a higher number. Since the existence of an equivalent formula in H would anyway lead to a contradiction, no number we might pick can ever suffice to find one. This shows the impossibility to translate such formulas of T' into equivalent formulas of T, by any method much more efficient than the kind of mere enumeration suggested above.
This infinite time between theories, will develop as an endless hierarchy of infinities.

On the incompleteness theorem

(to be completed)
....Provability is expressible as the existence of a number which encodes a proof, made of one existential quantifier that is unbounded in the sense of arithmetic (∃ p, ) where p is an encoding of the proof, and inside is a formula where all quantifiers are bounded, i.e. with finite range (∀x < (...), ...), expressing a verification of this proof.

The time of proving

If no proof of a statement could be found within given limited resources, it may still be a theorem whose proofs could not be found as they may be any longer. This is often unpredictable, for deep theoretical reasons which will appear from the study of the incompleteness theorem and related ones such as Gödel's speed-up theorem :

Set theory as a unified framework

Structure definers in diverse theories

Let us call structure definer any binder B which, used on diverse expressions A, faithfully records the unary structure defined by A on some range E (type or class defined by an argument here implicit), i.e. its result S = (Bx, A(x)) can restore this structure by an evaluator V (symbol or expression) : V(S, x) = A(x) for all x in E. Admitting the use of negation and the possibility to interpret Booleans by objects (in a range with at least 2 objects, which is often the case), Russell's paradox shows that adding both following requirements on a structure definer in a theory would lead to contradiction :
  1. All such S belong to E
  2. V can occur in the expression A and use x anyhow in its arguments, namely V(x, x) is allowed, which makes sense as 1. ensures the definiteness of any V(S, S).
Let us list the remaining options. Set theory rejects 1. but keeps 2. But since 1. is rejected, keeping 2. may be or not be an issue depending on further details.

As will be explained in 4.9, extending a generic theory (whose ranges of binders were the types) by a new type K given as the set of all structures defined by a fixed expression A for all combinations of values of its parameters, forms a legitimate development of the theory (a construction). Indeed a binder on a variable structure symbol S with such a range K abbreviates a successive use of binders on all the parameters of A which replaces S. Here A and the system interpreting it come first, then the range K of the resulting S and their interpretations by V are created outside them : A has no sub-term with type K, thus does not use V (which has an argument of type K).

The notion of structure in first-order logic (as a one-model theory) has this similarity with the notion of set in set theory : for each given symbol type beyond constants, the class of all structures of that type is usually not a set, calling "sets" such ranges K (of structures defined by a fixed expression with variable parameters), or subclasses of these.
The fully developed theory with the infinity of such new types constructed for all possible expressions A, can become similar to set theory by gathering to a single type U all constructed types K of variable structures of the same symbol type (structures over the same sets), interpreted by the same symbol V (which could be already used by A). This merely packs into V different structures without conflict since they come from different types K of the first argument. This remains innocent (re-writing what can be done without it) as long as in the new theory, the binders of type U stay restricted to one of these "sets" K (or covered by finitely many of them, which are actually included in another one).

In set theory, the ranges of binders are the sets. Thus, beyond the simplifying advantage of removing types, set theory will get more power when accepting more classes as sets.

Other theories, which we shall ignore in the rest of this work, follow more daring options:

The unified framework of theories

Attempts to formalize one-model theory in first-order logic cannot completely specify the meta-notions of «expressions» and «proofs». Indeed as will be explained in 4.7 (Non-standard models of Arithmetic), any first-order theory aiming to describe finite systems without size limit (such as expressions and proofs) inside its model (as classes included in a type), will still admit in some models some pseudo-finite ones, which are infinite systems it mistakes as «finite» though sees them larger than any size it can describe (as the latter is an infinity of properties which it cannot express as a whole to detect the contradiction ; these systems will also be called non-standard as «truly finite» will be the particular meaning of «standard» when qualifying kinds of systems which should normally be finite).
To fill this gap will require a second-order universal quantifier (1.10), whose meaning is best expressed (in appearance though not really completely) after insertion in set theory (whose concept of finiteness will be defined in 4.5). As this insertion turns its components into free variables whose values define its model [T, M], their variability removes its main difference with model theory (the other difference is that model theory can also describe theories without models). This view of model theory as developed from set theory, will be exposed in Parts 3 and 4, completing the grand tour of the foundations of mathematics after the formalization of set theory in a logical framework.

Given a theory T so described, let T0 be the external theory, also inserted in set theory, which looks like a copy of T as any component k of T0 has a copy as an object serving as a component of T. In some proper formalization, T0 can be defined from T as made of the k such that («k» ∈ T) is true, where the notation «k» abbreviates a term of set theory designating k as an object, and the truth of this formula means that the value of this term in the universe belongs to T.

This forms a convenient unified framework for describing theories interpreted in models, encompassing both previous ones (set-theoretical and model-theoretical): all works of the theory T0 (expressions, proofs and other developments), have copies as objects formally described by the model theoretical development of set theory as works of the theory T. In the same universe, any system M described as a model of T is indirectly also a (set-theoretical) model of T0.
This powerful framework is bound to the following limits : So understood, the conditions of use of this unified framework of theories, are usually accepted as legitimate assumptions, by focusing on well-described theories (though no well-described set theory can be the "ultimate" one as mentioned below), interpreted in standard universes whose existence is admitted on philosophical grounds; this will be further discussed in philosophical pages.

Set theory as a unified framework of itself

The above unified framework is applicable to the case of set theory itself, thus expanding the tools of interpretation of set theory into itself already mentioned in 1.7. Namely, in the above unified framework, the theory T0 describing M and idealized as an object T, will be set theory itself. Taking it as an identical copy of the set theory serving as framework, amounts to taking the same set theory interpreted by two universes.

A kind of theoretical difference between both uses of set theory will turn out to be irreducible (by the incompleteness theorem): for any given (invariant) formalization of set theory, the existence of a model of it (universe), or equivalently its consistency, formalized as a set theoretical statement with the meta interpretation, cannot be logically deduced (a theorem) from the same axioms. This statement, and thus also the stronger statement of the existence of a standard universe, thus forms an additional axiom of the set theory so used as framework.

Zeno's Paradox

Achilles runs after a turtle; whenever he crosses the distance to it, the turtle takes a new length ahead.
Seen from a height, a vehicle gone on a horizontal road approaches the horizon.
Particles are sent in accelerators closer and closer to the speed of light.
Can they reach their ends ?

Each example can be seen in two ways:

In each example, a physical measure of the «cost» to approach and eventually reach the targeted end, decides its «true» interpretation, according to whether this cost would be finite or infinite, which may differ from the first guess of a naive observer.
But the world of mathematics, free from all physical costs and where objects only play conventional roles, can accept both interpretations.

Each generic theory is «closed», as it can see its model (the ranges of its variables) as a whole (that is a set in its set theoretical formulation): by its use of binders over types (or classes), it «reaches the end» of its model, and thus sees it as «closed». But any possible framework for it (one-model theory and/or set theory) escapes this whole.
As explained in 1.7, set theory has multiple possible models : from the study of a given universe of sets, we can switch to that of a larger one with more sets (that we called meta-sets), and new functions between the new sets.

As this can be repeated endlessly, we need an «open» theory integrating each universe described by a theory, as a part (past) of a later universe, forming an endless sequence of growing realities, with no view of any definite totality. This role of an open theory will be played by set theory itself, with the way its expressions only bind variables on sets (1.8).



Set theory and foundations of mathematics
Next : 2. Set theory (continued)