Set Theory and Foundations of Mathematics
2. Set theory

2.1. First axioms of set theory

The inclusion predicate

The binary predicate ⊂ of inclusion between sets is defined by : for all sets E and F,

E ⊂ F ⇔ ∀x∈E, x ∈ F

and read as "E is included in F", or "E is a subset of F", or "F includes E". Properties of inclusion between classes apply.
E ⊂ E is logically valid. Implications chains also appear as inclusion chains:

(E ⊂ F ⊂ G) ⇔ (E ⊂ F ∧ F ⊂ G) ⇒ E ⊂ G.

We may use the same notation E ⊂ F as abbreviation for the inclusion of a set E in a class F (and similarly for other formulas defined from this):

E ⊂ F ⇔ ∀x∈E, F(x)

Formulas vs statements

Most set theories (except mainly FST and some strong versions) will only accept bounded formulas as sub-formulas of terms (by the set-builder, and later by the conditional operator) and as possible definitions of predicates (what predicate symbols abbreviate). Open quantifiers will be only allowed in statements (declarared true as axioms or theorems).
In set theory, a statement is a ground formula which can combine the symbols of first-order logic with the regular ones of set theory as follows: it must be made of a chain of open quantifiers, usually all ∀ and often written in words ("for all"), followed by a bounded formula. Proofs will naturally use the deduction rules for open quantifiers (introduction and elimination) by common language articulations.
The above definition of ⊂, basically claiming the predicate symbol ⊂ to abbreviate the bounded formula (∀x∈E, x ∈ F), can also be seen as an example of statement with open quantifiers (∀_SetE,F, ) taken as an axiom.

The role of axioms

As explained in 1.9, the role of the axioms of a generic theory, is to express its intended range of models as a selection (class) from a wider notion of "all systems" with structures named by the same language. The logical framework, which holds (describes) this wider notion, can interpret the axioms in each such system, then exclude those where it finds some axiom to be false. Excluded systems remain possible models of different theories described by the same framework. Any statement true in all systems (a logically valid statement) is useless (redundant) as an axiom.

But expressing set theory in its special logical framework not used for other theories, leaves a priori undefined the distinction between its logically valid statements, and its other basic accepted truths which need to be declared as axioms due to their falsity in some "non-universe". Now this distinction can be given sense by converting set theory into a generic theory: let us call axioms of a set theory a list of its basic "true" statements whose translated versions in first-order logic are not logically valid, but form a proper list of axioms for that generic theory to be equivalent to the intended version of set theory. From there, provability in a set theory can be defined as the one given by first-order logic with these axioms.

Converting the binders

When converting set theoretical expressions into first-order logic, the only modified symbols are the binders, as their format of use differs between both frameworks. Let us describe the rules of this conversion.
The function definer (1.8) becomes an infinity of operator symbols: for each term t with one argument and any list of parameters, the whole term (E ∋ x ↦ t(x)) is seen as the big name of a distinct operator symbol, whose arguments are E and the parameters of t. (Those where every subexpression of t without any occurrence of x is the only occurrence of a parameter, would suffice to define others).
The same goes for the set-builder, which will come as a particular case in 2.2.
The conversion of quantifiers comes by expressing their domains as classes :

(∃x∈E, A(x)) → (∃x, x ∈ E ∧ A(x))
(∀x∈E, A(x)) → (∀x, x ∈ E ⇒ A(x))

Classification of axioms

Axioms of set theories (sometimes with other primitive components) can be classified as follows according to their roles, ordered from the more "primitive" (necessary) components, to the more optional and debatable ones (opening a diversity of acceptable set theories).

The technical axioms give to the primitive notions and symbols their correct roles :
- The notions of sets and functions, symbolized in 1.7, are axiomatized here in 2.1 : axioms for notions ; axiom of extensionality ; axioms for functions.
- Unique element (2.4) ;
Axioms from the set generation principle (2.2) ;
Strengthening axioms, introduced in 1.A ;
More optional technical axioms will come later:
- Axiom of choice (2.10) might be seen as a further specification for the powerset ;
- Axiom of foundation (evoked in 2.A and expressed in 5.3).

Axioms for notions

The formalization of primitive notions by class symbols following 1.7, needs to be completed by the following axioms.

∀x, ¬(Set(x) ∧ Fnc(x))	: sets are not functions (though it does not matter)
∀_Fnc f, Set(Dom f)	: the domain of every function is a set
(Fc) ∀(t,E), Fnc(E ∋ x ↦ t(x))	: any definite (E ∋ x ↦ t(x)) is a function

Here and in the below axioms for functions, ∀(t,E) is meant as declaring an axiom schema by second-order universal elimination over the variable functor t with E included in the definiteness class of the term defining t for the given values of parameters (this is the definiteness condition for (E ∋ x ↦ t(x))). Thus for each term defining t we have an axiom where ∀(t,E) is replaced by

∀parameters, ∀_SetE, (∀x∈E, dt(x, parameters)) ⇒

Axiom of Extensionality

This axiom lets sets be determined by their role (either as ranges for quantifiers, or as the classes they define by ∈), saying that any two sets playing the same role are equal :

∀_set E,F, E ⊂ F ⊂ E ⇒ E = F.

Indeed, E ⊂ F ⊂ E means that E and F have the same elements (∀x, x ∈ E ⇔ x ∈ F), and for any predicate R,

(∀x∈F, R(x)) ⇔ (∀x∈E, R(x))

and similarly for ∃. Informally, the elements of a set are given in bulk.

Axioms for functions

Functions are also made determined by their role, by the following axiom

(=_Fnc) : ∀_Fnc f,g, (Dom f = Dom g ∧ ∀x∈Dom f, f(x) = g(x)) ⇒ f = g.

The function definers are related with the function evaluator by an axiom which can be written in either way:

∀(t,E), ∀_Fnc f, f = (E ∋ x ↦ t(x)) ⇔ (Dom f = E ∧ ∀x∈E, f(x) = t(x))
∀(t,E), Dom(E ∋ x ↦ t(x)) = E ∧ ∀x∈E, (E ∋ y ↦ t(y))(x) = t(x)

Indeed (Fc) ⇒ (1.⇔(2.∧(=_Fnc))).

The proof can be taken as an exercise (click to show)

Assume (Fc) for 2. to be definite.
Abbreviate (Dom f = E ∧ ∀x∈E, f(x) = t(x)) as [f : E, t], to shorten the above as

∀(t,E), ∀_Fnc f, f = (E ∋ x ↦ t(x)) ⇔ [f : E, t]
∀(t,E), [E ∋ x ↦ t(x) : E, t]

1. ⇔ (2. ∧ 1b.) where

1b. ∀(t,E), ∀_Fncf, [f : E, t] ⇒ f = (E ∋ x ↦ t(x))

Proof of 1b. ⇒ (=_Fnc)

∀_Fnc g, applying 1b. to (g, E) where E = Dom g as ∀x∈Dom g, dg(x) :
∀_Fnc f, [f : E, g] ⇒ f = (E ∋ x ↦ g(x))
[g : E, g] ∴ g = (E ∋ x ↦ g(x))
∀_Fnc f, [f : E, g] ⇒ f = g

Proof of (2.∧(=_Fnc)) ⇒ 1b.

∀(t,E), ∀_Fnc f, ([f : E, t] ∧ [E ∋ x ↦ t(x) : E, t]) ⇒ f = (E ∋ x ↦ t(x)). ∎

Note: (Fc) ∧ 1. can also be written

1'. ∀(t,E), ∀f, f = (E ∋ x ↦ t(x)) ⇔ (Fnc f ∧ Dom f = E ∧ ∀x∈E, f(x) = t(x)).

2.2. Set generation principle

Formalizing diverse notions in set theory

Set theory will be extended by additional components to represent diverse kinds of meta-objects (roles played by symbolic systems) directly as objects (which can play them). This can only apply to meta-objects which are "concrete" i.e. object-like enough, in a sense to be clarified : as seen in 1.8 and 1.D, not all classes can be sets, while only the functors whose domains are sets are represented by functions.
Some of those extensions will consist in primitive "definer" symbols (with the relevant axioms) serving to convert into an existing kind of objects some meta-objects (roles) of the same kind :

Diverse "set-like" classes will be converted into sets (2.2) (thus giving the existence of sets coinciding with these classes);
Indirectly specified elements will become directly designated (2.4).

Other kinds of meta-objects beyond the normal roles of elements, sets and functions, will form new notions : those of operation, relation and tuple will be formalized in 2.3. The new symbols they need are both definers (converting meta-objects into the new objects) and evaluators (interpreting these objects back to their roles as meta-objects).
But, like for unary relations (1.8), these new notions with their role-giving symbols, need not be primitive, as they can be naturally constructed as (represented by) classes of existing objects (sets or functions), which can offer their tools (expressible features) to the new objects. So, these extensions of set theory can be obtained by giving definitions of their definer and evaluator symbols. As the notions of operation and relation can be represented in several useful ways by classes of old objects, which do not represent each new object by the same old object (2.3), functors will be used for translations between the classes playing the roles of these notions in the different ways (2.6, 2.8).

Sets as domains of bounded quantifiers

Bounded quantifiers give sets their fundamental role as ranges of bound variables, unknown by the predicate ∈ which only gives them their role of classes. Technically, the bounded quantifier (∃ ∈ , ) suffices to define the predicate ∈ as

x ∈ E ⇔ (∃y∈E, x = y)

but is not generally definable from it in return by any bounded formula, as its definition involves an open quantifier.
Philosophically speaking, the perception of a set as a class (ability to classify each object as belonging to it or not) does not suffice to provide its full perception as a set, that is the perspective over all its elements as coexisting. This intuitive view of a set as a coexistence of its elements can be seen as supported by the above definition of membership as a form of existence with respect to this set. This way, Russell's paradox means that mathematical objects cannot all coexist as a ultimate totality.

Statement of the principle

For any data of a class C and a quantifier Q defined by given bounded formulas with the same parameters, if the equivalence between Q and ∀_C, namely

∀A, (∀x, C(x) ⇒ A(x)) ⇔ (Qx, A(x))

is proven by second-order universal introduction (*) under a given condition on parameters, then C is a set that can be named by a new operator symbol K added to the language of set theory, with these parameters as arguments, and the following axiom which expresses K = C by double inclusion: for all values of parameters satisfying the condition,

Set(K) ∧ (∀x∈K, C(x)) ∧ (Qx, x ∈ K).

(*) we shall focus on cases whose proofs do not use any axiom obtained from a second order axiom by using A in the definition of its variable structure, though I am not sure if any such use is possible.
This principle will be justified in 2.B.
The equivalence condition between Q and ∀_C is equivalently expressible as the following list of 3 statements, where the quantifier Q* defined by (Q*x, A(x)) ⇎ (Qx,¬A(x)) will be equivalent to ∃_C:

(1) ∀x, C(x) ⇔ Q*y, x=y (in fact we just need ∀x, C(x) ⇒ Q*y, x=y)
(2) Qx, C(x)
(3) ∀A,B, (∀x, A(x) ⇒ B(x)) ⇒ ((Qx, A(x)) ⇒ (Qx, B(x))).

Proof of equivalence (click to show)

These 3 properties are already known consequences of «Q = ∀_C». Conversely,
((2) ∧ (3)) ⇒ ((∀_Cx, A(x)) ⇒ Qx,A(x))
((1) ∧ ∃_Cx, A(x)) ⇒ ∃y, (Q*x, x=y) ∧ (∀x, x=y ⇒ A(x)) ∴ ((3) ⇒ Q*x,A(x)) ∎

Main examples

Our first and main use cases of this principle are listed in the following table. To justify all of these, (1) comes by finding C to be the formula so defined from Q, while (3) is ensured by the absence of any "negative" occurrence of A in Q (inside a negation, an equivalence, or left of a ⇒). This leaves (2) as the easy remaining condition to check.
The first column recalls the above generic abbreviations ; each other column gives an effective example. The K line gives the notations for the set theoretical symbols so introduced.

K	{y∈E \| B(y)}	⋃E	Im f	∅	{y}	{y,z}
C(x)	x∈E ∧ B(x)	∃y∈E, x∈ y	∃y∈Dom f, f(y)=x	0	x=y	x=y ∨ x=z
dK	∀x∈E, dB(x)	Set(E)∧∀y∈E, Set(y)	Fnc(f)	1	1	1
Qx, A(x)	∀x∈E, B(x)⇒A(x)	∀y∈E, ∀x∈y, A(x)	∀x∈Dom f, A(f(x))	1	A(y)	A(y) ∧ A(z)
Q*x,A(x)	∃x∈E, B(x) ∧ A(x)	∃y∈E, ∃x∈y, A(x)	∃x∈Dom f, A(f(x))	0	A(y)	A(y) ∨ A(z)

The value of the set-builder K={x∈E | B(x)} first defined in 1.8 as a class, thus with an open quantifier (∀x, x∈K ⇔ (x∈E ∧ B(x))) is now re-defined by axioms without open quantifier beyond parameters:

Set(K) ∧ (∀x∈K, x∈E ∧ B(x)) ∧ (∀x∈E, B(x) ⇒ x ∈ K)

or more shortly Set(K) ∧ K ⊂ E ∧ (∀x∈E, x ∈ K ⇔ B(x)). Like the axioms for functions, this is an axiom schema by second-order universal elimination on ∀B, so with all unary predicates B defined by bounded formulas with parameters.
This way, the proof of Russell's paradox would be written
∃F, (F={x∈E | Set(x) ∧ x∉x} ∴ Set(F) ∴ (F∈F ⇎ (Set(F) ∧ F∉F))) ∴ F ∉ E

The functor ⋃ is the union symbol, and its axioms form the axiom of union.

The set Im f of values of f(x) when x ranges over Dom f, is called the image of f.
We define the predicate (f : E → F) that reads «f is a function from E to F », while E and F are sets (definiteness condition which may also be integrated in this definition), as

(f : E → F) ⇔ (Fnc(f) ∧ Dom f = E ∧ Im f ⊂ F)

A set F such that Im f ⊂ F (i.e. ∀x ∈ Dom f, f(x) ∈ F), is called a target set of f.
The more precise (Fnc(f) ∧ Dom f = E ∧ Im f = F) will be denoted f : E ↠ F (f is a surjection, or surjective function from E to F, or function from E onto F).

The empty set ∅ is the only set without element, and is included in any set (∀_SetE, ∅ ⊂ E).
Thus, (E=∅ ⇔ E ⊂ ∅ ⇔ ∀x∈E, 0), and thus (E ≠ ∅ ⇔ ∃x∈E, 1).
The presence of this constant symbol ∅ ensures the existence of a set; for any set E we also get ∅ = {x∈E | 0}. We have ⋃∅ =∅.
As (Dom f = ∅ ⇔ Im f = ∅) and (Dom f = Dom g = ∅ ⇒ f = g), the only function with domain ∅ is called the empty function ∅_↦.
That a variable may have empty range, is no obstacle for fixing it. In particular, developing an inconsistent theory means studying a fixed system whose range of possibilities is empty. We may actually need to do so in order to discover a contradiction there, which means to prove that no such system exists.

We can redefine ∃ from the above in two ways:

(∃x∈E, A(x)) ⇔ {x∈E | A(x)} ≠ ∅ ⇔ (1 ∈ Im(E∋x ↦ A(x))).

For all x, {x,x} = {x}. Such a set with only one element is called a singleton.

For all x and y we have {x, y} = {y, x}. If x ≠ y, the set {x, y} with 2 elements x and y is called a pair.

For any set E, the identity function on E is defined by

Id_E = (E ∋ x ↦ x).

Thus,

Dom Id_E = E = Im Id_E
∀x∈E, Id_E(x) = x.

2.3. Currying and tuples

Formalizing operations and relations

To formalize in set theory every notion of operation (resp. relation) with any fixed arity n>1, acting as n-ary structures between n sets, would involve the following symbols (generalizing the case n = 1 of functions which we took as primitive):

A list of n domain functors (sometimes not available but not needed since domains are usually fixed by the context);
Its evaluator is an (n+1)-ary operator (resp. predicate) written in filled form as f(x₀,…,x_n−1), with arguments one n-ary operation (resp. relation) f, and its n arguments x₀,…,x_n−1;
Its definer binds n variables to their respective ranges on a term (resp. a formula).

Let us give to the operation definers, the following notation which will be later (2.7) justified as reflecting their definitions: in the binary case (n = 2), operations defined by a binary operator A with domains E and F will be written

E×F ∋ (x,y) ↦ A(x,y)

After seeing what operations and relations are (which role they play) let us discuss how they are reducible to something else.
One aspect is that relations are reducible to operations, and vice-versa:

The notion of n-ary relation can be constructed as a class of n-ary operations, by taking the Boolean pair as target set, as we did in 1.4 (for predicates) and will formalize in 2.4.
Inversely, the notion of n-ary operation can be constructed as a class of (n+1)-ary relations (2.7).

These constructions can be useful despite their circularity (they are not decreasing the arity). But for foundational purposes, we need a construction which reduces to lower arity cases, down to sets and functions. There are actually 2 such constructions.

Currying

The simplest constructions of the notions of n-ary operation from our primitive tools for functions, are their curried representations, defined as classes of functions by expressing the operation definer as n successive uses of the function definer (one for each bound variable); and thus the evaluator as n uses of the function evaluator. For n=2, this means a binary operation in curried form b with domains E and F, is defined from a binary operator B, and evaluated back as B, by the formulas

b = (E ∋ x ↦ (F ∋ y ↦ B(x,y)))
Dom b = E ∧ ∀x∈E, Dom b(x) = F ∧ ∀y∈F, B(x,y) = b(x)(y)

whose equivalence, similar to the axiom 1. for functions, is deducible from it.
The first domain functor comes as Dom b = E, but the second one, given as common value of all Dom b(x), is only definite for E ≠ ∅. The intermediate function b(x) = (F ∋ y ↦ B(x,y)) represents the functor defined by B with parameter x (free) and argument y (bound).
This somehow makes each binary operation work as a function evaluator, by giving x the role of a function with argument y ; the curried function evaluator just gives back each function it takes : f ↦ ((Dom f) ∋ y ↦ f(y)).

Similarly, the notion of n-ary relation can be defined in curried form as a class, giving the role of definer to a succession of n−1 uses of the function definer and 1 use of the set-builder. The curried form r of a binary relation with domains E and F, is so defined from a binary predicate R, and evaluated back as R, by the equivalent formulas

r = (E ∋ x ↦ {y ∈ F | R(x,y)})
Dom r = E ∧ (∀x∈E, r(x) ⊂ F ∧ ∀y∈F, R(x,y) ⇔ y ∈ r(x))

This leaves no way to define second domain functor (F cannot be restored from r).
These constructions have the defect of breaking the symmetry of role between arguments. The next construction method, by tuples, will preserve this symmetry, and will be usually preferred for this reason.

Tuples

For any theory and any natural number n, a notion of n-tuple is a notion constructed for its variables to serve as abbreviations for a pack of n variables from previous notions. A 2-tuple is called an ordered pair, a 3-tuple is a triple, a 4-tuple is a quadruple and so on. So each tuple x (so constructed object) represents a meta-function of "interpretation" from this list (meta-set) of n variables, into the model.

The definer and evaluator symbols for tuples take a special form because, while tuples are meta-functions, the role they play in the theory differs from those of functors or functions.

An n-tuple definer is not a binder but an n-ary operator, packing its n variables into one by taking them as its arguments in a parenthesis and separated by commas: ( ,…, ).

Now, each structure (resp. operation, relation) can be re-formalized as a unary structure (resp. a function, a unary relation) with domain a type (resp. a class which is actually a set) of tuples, to be used with the relevant tuples definer. This can be implicitly done by re-interpreting the notation: S(x,y) can be read seeing S either as a binary structure (resp. operation, relation), or as a unary one taking as argument the ordered pair given by the term (x,y) formed by the ordered pair definer with arguments x and y.

The role of n-tuple evaluator is played by a list of n functors called projections which extract each variable from the given pack. This list plays the role of the function evaluator curried in the reverse way, by fixing its meta-argument whose range (the domain of tuples as meta-functions) is not seen by the theory as a set of objects but as a list of n symbols to be taken separately.

The formal details of this construction depend whether we take a generic theory or a set theory.

Tuples axiom

In generic theories, it must be a distinct constructed type for not only each n but also each choice of the type of each of these n variables (if starting from a theory with multiple types - this choice of a list of variables with their types amounts to the choice of a predicate symbol type), but this directly forms a legitimate construction (4.11).
For n=2 (other cases are similar), and a type P of ordered pairs (x,y) where x,y have types E and F, the projections π₀ and π₁ are related to the ordered pair definer, by the axiom

∀_Ex, ∀_Fy, ∀_Pz, (x = π₀(z) ∧ y = π₁(z)) ⇔ (x,y) = z

which implies

∀_Pz, (π₀(z), π₁(z)) = z
∀_Ex, ∀_Fy, π₀(x,y) = x
∀_Ex, ∀_Fy, π₁(x,y) = y
∀_Ex,x', ∀_Fy,y', (x,y) = (x',y') ⇔ (x = x' ∧ y = y')

Tuples in set theory

Set theory having no types only needs one notion of n-tuple for each n. But it faces one difficulty : in order to rigorously justify that all tools of set theory remain applicable to tuples in the same way as to the rest of elements, tuples need to be integrated among elements. Namely, set theory needs to develop its notions of tuples by defining them as classes of already existing elements, instead of simply accepting them by the above constructions, which would leave them as separate types.

For this, since tuples are meta-functions, they are naturally represented by functions. This involves copying their meta-domain, which is a list of symbols, into a set (of elements). Namely, let us represent the domain of considered n-tuples by the set V_n of n numbers from 0 to n−1 all named by constant symbols we shall simply conceive as digits. To justify this as a development of set theory, these digits can be defined by any ground terms with distinct values, such as ∅, {∅}, {∅, {∅}} and endlessly many more sets and functions expressed from there. The notion of n-tuple is then defined as the class of all functions with domain V_n.

Then, the projections are defined by the function evaluator applied to the respective digits, which can be read as disguised in their notations : for all n-tuples x,

∀i∈V_n, π_i(x) = x(i).

This index notation, making a pun of the equivalence between reading the index i as an argument or a meta-argument that means to take the i-th symbol from a list (here π_i is the i-th projection), is also commonly used for tuples : x_i can be either read as the i-th variable in the tuple x = (x₀,…,x_n−1), or as a third notation for the same evaluator after x(i) and π_i(x).

More tools need to be introduced before defining in set theory the tuple definers (2.4) and the operation definers working with tuples (2.7).

2.4. Uniqueness quantifiers and conditional operator

Uniqueness quantifiers

For all sets F ⊂ E, all unary predicate A definite on E, and all x ∈ E,

x ∈ F	⇔ {x} ⊂ F ⇔ (∃y∈E, x=y ∧ y∈F) ⇔ (∀y∈E, x=y ⇒ y∈F)
x ∈ F	⇒ ((∀y∈F, A(y)) ⇒ A(x) ⇒ ∃y∈F, A(y))
F ⊂ {x}	⇔ (∀y∈F, x=y) ⇒ ((∃y∈F, A(y)) ⇒ A(x) ⇒ (∀y∈F, A(y)))
F={x}	⇔ (x∈F ∧ ∀y∈F, x=y) ⇔ (∀y∈E, y∈F ⇔ x=y)

Here are 3 new quantifiers: ∃^≥2 (plurality), ! (uniqueness), and ∃! (existence and uniqueness), whose results when applied to A in E only depend on F={x∈E | A(x)} (like ∃ and unlike ∀) :

(∃x∈E, A(x)) ⇔ (F ≠ ∅)	⇔ (∃x∈F, 1) ⇔ (∃x∈E, {x} ⊂ F)
(∃^≥2x∈E, A(x)) ⇔ (∃^≥2: F)	⇔ (∃x,y∈F, x ≠ y) ⇔ (∃x≠y∈E, A(x) ∧ A(y))
(!x∈E, A(x)) ⇔ (!:F)	⇔ ¬(∃^≥2: F) ⇔ (∀x,y∈F, x=y) ⇔ ∀x∈F, F ⊂ {x}
(∃!x∈E, A(x)) ⇔ (∃!:F)	⇔ (∃x∈F, F⊂{x}) ⇔ (∃x∈E, F={x})
F ⊂ {x}	⇒ (∀y∈F, F⊂{y}) ⇔ (!:F)
(∃!:F)	⇔ (F≠∅ ∧ !: F)
F≠∅	⇒ ((∀y∈F, B(y)) ⇒ (∃y∈F, B(y)))
( !: F)	⇒ ((∃y∈F, B(y)) ⇒ (∀y∈F, B(y)))
F={x}	⇒ ((∃y∈F, B(y)) ⇔ B(x) ⇔ ∀y∈F, B(y))

Single element axiom

Let us introduce a new primitive symbol ℩ to serve as inverse of the singleton functor { } : only definite on the class of singletons d℩E ⇔ (Set(E) ∧ ∃!:E), it gives their unique element, as defined by the single element axiom which can be equivalently written

∀x, ℩{x} = x
∀_SetE, ∃!:E ⇒ ℩E ∈ E

Then for every unary predicate A and every singleton E, A(℩E) ⇔ (∃x∈E, A(x)) ⇔ (∀x∈E, A(x)).

Conditional connective

Following conventions from computer science, let us denote the ternary connective «If A then B otherwise C» as

(A ? B : C) ⇔ (¬C⇒A⇒B) ⇔ ((A⇒B)∧(¬A⇒C)) ⇔ (¬A ? C : B)
⇔ ((A∧B)∨(¬A∧C)) ⇔((C⇒A)⇒(A∧B)) ⇎ (A ? ¬B : ¬C)

Any n+1-ary connective K amounts to two n-ary ones: K(A) ⇔ (A ? K(1): K(0)). Thus,

¬A ⇔ (A ? 0 : 1)
(A⇒B) ⇔ (A ? B : 1)
(A∧B) ⇔ (A ? B : 0)
(A∨B) ⇔ (A ? 1: B)
(A ⇔ B) ⇔ (A ? B : ¬B).

Conditional operator

Like the conditional connective, it chooses between two objects x,y depending on the boolean B:

(B ? x : y) = ℩{z∈{x,y}| B ? z=x : z=y} = ℩{z∈{x,y}| z=x ⇔ B}

so that for any unary predicate A,

A(B ? x : y) ⇔ (B ? A(x) : A(y)).

All para-operators other than connectives but with at least a Boolean argument, are naturally expressed as composed of the conditional operator with operators, or the conditional connective with predicates, which is why they did not need an explicit presence in the language of a theory.

Relations as operations

For a given n, once given the notion of n-ary operation with its definer and its evaluator denoted as in 2.3, the notion of n-ary relation can be defined as a role of the class of n-ary operations with values in V₂ = {0,1} (serving as Boolean type), as follows. Booleans B are converted to and from objects b∈V₂, by b = (B ? 1 : 0) and B ⇔ b = 1.
So with n=2 (other cases are similar), such a binary operation r with domains E and F gets this role of binary relation, being defined from any binary predicate R, and evaluated back as R, by the formulas

r = (E×F ∋ (x,y) ↦ (R(x,y) ? 1 : 0))
∀x∈E, ∀y∈F, R(x,y) ⇔ r(x, y) = 1

Defining the tuples definers

The ordered pair definer can be defined as a curried view on the conditional operator:

(x,y) = (V₂ ∋ a ↦ (a = 1 ? y : x))

Confusing Booleans with objects, (B ? x : y) = π_B(y,x).
Other tuples can be similarly defined in diverse ways, such as

(x,y,z) = (V₃ ∋ a ↦ (a = 2 ? z : (a = 1 ? y : x)))

Given the notion of ordered pairs, the roles of notions of tuples of any higher arity n > 2 can also be played by other classes, with other definitions of their definers and projections, satisfying the same axioms. For example, triples t = (x,y,z) can be defined as t = ((x,y),z)) and evaluated by

x = π₀(π₀(t))
y = π₁(π₀(t))
z = π₁(t)

2.5. Families and Boolean operators on sets

Families

The notion of family is formally synonymous with that of function, but meant to play roles intuitively similar to those of tuples : while the domains of tuples are finite and made of symbols or numbers, those of families can be more general but still seen as intuitively "simple" sets, fixed in each context of use, and outside the "main studied system" containing the target set. A «family of...» is a family whose image is a «set of... » (included in the said notion).

Families use the formalism of functions disguised in the style of the symbols for tuples (themselves inapplicable due to their finiteness for infinite domains). The evaluator for a family u at i, instead of u(i), is written π_i(u) or u_i (looking like a meta-variable symbol of variable). The family definer on a term t is written (t(i))_i∈I or sometimes with ellipsis as (t(0),…,t(n−1)) abbreviating the n-tuple definer, instead of I ∋ i ↦ t(i). The argument i is called index, and the family is said indexed by its domain I. We may write u = (u_i)_i∈I as a shorthand for Fnc u ∧ Dom u = I.

When formalizing model theory in set theory, the "lists" of components forming the contents of theories are basically families. More precisely:

The list of abstract types is a set of urelements, serving as indexing set (i.e. domain) for the family of interpreted types
Both lists of predicate symbols and operator symbols, are families of tuples of abstract types
Of the list of axioms, only its image matters, which is a subset of the set of all statements.

Structures and binders

Seeing tuples as particular functions, explains that structures (unary structures on classes of tuples) are particular binders over terms (unary structures on classes of functions defined by these terms: 1.8).
Similarly, logical connectives are particular quantifiers. In particular, ∀ and ∃ respectively generalize the chains of conjunctions and of disjunctions:

(B₀∧…∧B_n−1) ⇔ (∀i∈V_n, B_i)
(B₀∨…∨B_n−1) ⇔ (∃i∈V_n, B_i)

The distributivity of connectives (1.6) also works for quantifiers as follows. For any unary predicate R definite on E and any Boolean C,

(C ∧ ∃x∈E, R(x)) ⇔ (∃x∈E, C ∧ R(x))
(C ∨ ∀x∈E, R(x)) ⇔ (∀x∈E, C ∨ R(x))
(C ⇒ ∀x∈E, R(x)) ⇔ (∀x∈E, C ⇒ R(x))
((∃x∈E, R(x)) ⇒ C) ⇔ (∀x∈E, R(x) ⇒ C)
(∃x∈E, C) ⇔ (C ∧ E≠∅) ⇒ C ⇒ (C ∨ E=∅) ⇔ (∀x∈E, C)

Extensional definitions of sets

The operators of empty set ∅, singleton {a} and pairing {a,b} (2.2) are the first cases (those of arities 0,1,2) of operator symbols of exensional definition of sets (listing their elements) which amount to applying the Im functor to tuples :

{a,b,…} = Im(a,b,…).

The definition of V_n from 2.3 can be written {0,…,n−1}. Images of tuples are finite sets.
Similar notations are used for the binder also given by Im:

{T(x)| x∈E} = {T(x)}_x∈E = Im(E ∋ x ↦ T(x))

As this notation looks similar to the set builder, both may be combined into a third notation :

{T(x)| x∈E ∧ R(x)} = {T(x)| x∈{y∈E|R(y)}}

A function f is said constant when !: Im f. The constancy of a tuple is the chain of equalities:

x=y=z ⇔ !:{x,y,z} ⇔ ((x=y)∧(y=z)) ⇒ x=z

Union of a family of sets

The binder of union of any family of sets (F = (F_i)_i∈I ∧ ∀i∈I, Set F_i), is defined from the previous functor of union of a set of sets (2.2) :

⋃ F = ⋃
i∈I F_i = ⋃{F_i}_i∈I = ⋃ Im F
∀x, x ∈ ⋃
i∈I F_i ⇔ (∃X∈ Im F, x∈X) ⇔ ∃i∈I, x∈F_i

X ∪ Y = ⋃{X, Y}
∀x, x ∈ X∪Y ⇔ (x∈X ∨ x∈Y)
X ∪ Y ∪ Z = ⋃{X, Y, Z} = (X ∪ Y) ∪ Z

∀_fncf, Im f = ⋃_{x∈Dom f} {f(x)}
(∀x∈⋃
i∈I F_i, A(x)) ⇔ ∀i∈I, ∀x∈F_i, A(x)
(∃x∈⋃
i∈I F_i, A(x)) ⇔ ∃i∈I, ∃x∈F_i, A(x)
∀_SetE, ⋃
i∈I F_i ⊂ E ⇔ ∀i∈I, F_i ⊂ E

In return, the union of a set of sets is expressible from that of a family : ⋃ E = ⋃ Id_E because Im Id_E = E.

Other Boolean operators on sets

A binary predicate with domains a set I and a class C, can be seen in either curried way, as a meta-family (A_i)_i∈I of unary predicates definite in C, or as a family of Boolean variables (A_i(x))_i∈I depending on a common parameter x with range C. This way, any quantifier Q with range I (thus, for finite I, any connective) defines a meta-binder (resp. a meta-operation) on this meta-family (resp. this meta-tuple) of unary predicates (sub-classes of C), with result a unary predicate R, defined as

∀_Cx, R(x) ⇔ Qi, A_i(x).

When C is a set E, this defines a genuine binder (resp. operation) acting in the class of subsets of E. However, the result depends a priori not only on the given family of sets but also on the chosen set E. This parameter may be kept implicit as fixed in some contexts, but in others it needs to be explicitly added as argument.

The functor so defined by the negation connective (¬) is the complement ∁_E (with set parameter E) : ∁_E F is called the complement of F in E. It is synonymous with the following binary operator \ of difference between any two sets, except that ∁_E F is considered only definite if F⊂E :

∀_SetE,F, E\F = {x∈E | x ∉ F}
∀_SetE,F, F⊂E ⇒ ∁_E F = E\F ⊂ E ∴ (∀x∈E, x ∈ ∁_E F ⇔ ¬x∈F)

Let us analyze the issue of the dependence on E in the general case. Any family of sets F = (F_i)_i∈I is a family of subsets of its union U = ⋃_i∈I F_i and of any larger set E ⊃ U.

In any of these, the existential quantifier (Q = ∃) defines the union binder:

(∀i∈I, F_i ⊂ E) ⇒ ⋃
i∈I F_i = {x∈E| ∃i∈I, x∈F_i}

The binder defined by any Q gives a set X = {x∈E| R(x)} where R(x) abbreviates (Qi, x ∈ F_i). For this to define a binder (resp. operation) on the class of all sets, this X must be independent of the chosen E ⊃ U, and then it can be written

X = {x∈U| R(x)}.

Noting that

∀x, x ∈ U ∨ (R(x) ⇔ Qi, 0)

the needed independence condition appears to be ¬Qi, 0 which also implies that X coincides with the class R.
The difference operator can be so defined by the connective (A∧¬B) which satisfies this condition ¬(0∧¬0):

∀_SetE,F, ∀x, x∈ E\F ⇔ (x∈E ∧ x∉F).

Intersection

The quantifier ∀ defines the intersection of any family of subsets F_i of a fixed set E:

⋂
i∈I F_i = {x∈E | ∀i∈I, x ∈ F_i} = ∁_E ⋃
i∈I ∁_E F_i

The condition of independence from E is that I ≠ ∅. As above defined, the intersection of the empty family gives E. The intersection of any non-empty family of sets (I ≠ ∅) can be written

∀j∈I, ⋂
i∈I F_i = {x∈F_j | ∀i∈I, x∈F_i}
∀x, x ∈ ⋂
i∈I F_i ⇔ ∀i∈I, x ∈ F_i
∀_SetG, G ⊂ ⋂
i∈I F_i ⇔ ∀i∈I, G ⊂ F_i
∀_SetE,F, E∩F = {x∈E | x∈F} = F∩E
∀x, x ∈ E∩F ⇔ (x∈E ∧ x∈F)
E∩F ⊂ E ⊂ E∪F
E = E∪F ⇔ F ⊂ E ⇔ F = E∩F

Union and intersection have the same associativity and distributivity properties as ∧ and ∨ :

E∩F∩G = (E∩F)∩G = E∩(F∩G) = ⋂(E,F,G)

E ∩	⋃ i∈I	F_i =	⋃ i∈I	E ∩ F_i	E ∪	⋂ i∈I	F_i =	⋂ i∈I	E ∪ F_i
E∩(F∪G) =			(E∩F)∪(E∩G)		E∪(F∩G) =			(E∪F)∩(E∪G)

Finally the connective ⇎ gives the symmetric difference: E ∆ F = (E∪F) \ (E∩F).

2.6. Graphs

A graph is a set of ordered pairs. Let us denote the class of graphs as gr:

gr R ⇔ (Set R ∧ ∀x∈R, Fnc x ∧ Dom x = V₂).

For any binder Q and any graph G, the formula Qz∈G, A(z₀,z₁) that binds the variable z = (z₀, z₁) on a binary structure A definite on G, can be seen as binding 2 variables z₀, z₁ on A(z₀, z₁), and thus be denoted with a pair of variables: Q(x,y)∈G, A(x,y).

The transpose of an ordered pair is

^t(x,y) = (y,x)

The transpose of a graph R is the image of transposition over it:

^tR = {(y,x)|(x,y) ∈ R}

We define the domain and the image of a graph as the respective images of π₀ and π₁ over it:

Dom R = {x|(x,y) ∈ R}
Im R = {y|(x,y) ∈ R} = Dom ^tR

Currying notation

Graphs R can be expressed in curried forms as functors R⃗ and R⃖:

∀x, R⃗(x) = {y | (z,y)∈R ∧ z=x}.
∀y, R⃖(y) = {x | (x,z)∈R ∧ z=y} = ^tR⃗ (y).
∀x,y, y ∈ R⃗ (x) ⇔ (x,y) ∈ R ⇔ x ∈ R⃖ (y)
∀x, x ∈ Dom R ⇔ R⃗ (x) ≠ ∅
Dom R ⊂ E ⇒ Im R = ⋃_x∈E R⃗(x) ∧ ∀y, R⃖(y) = {x∈E | (x,y) ∈ R}

They can also appear as functions

R⃗_E = (E∋x ↦ R⃗(x))
R⃖_F = (F∋y ↦ R⃖(y))
(R⃗) = R⃗_{Dom R}
(R⃖) = R⃖_{Im R}.

Functional graphs

The graph of a function f is defined by

Gr f = {(x,f(x)) | x ∈ Dom f}
∀x,y, (x,y) ∈ Gr f ⇔ (x ∈ Dom f ∧ y = f(x))
Dom f = Dom Gr f
Im f = Im Gr f

For any function f and any graph R,

Gr f ⊂ R	⇔ ∀x∈Dom f, f(x) ∈ R⃗(x)
R ⊂ Gr f	⇔ (Dom R ⊂ Dom f ∧ ∀(x,y)∈R, y = f(x))
R = Gr f	⇔ (Dom R ⊂ Dom f ∧ ∀x∈Dom f, R⃗(x) = {f(x)} )

A graph R is functional if it is the graph of a function. This condition is equivalently written in either way

∀x∈ Dom R, !: R⃗(x)
∀x,y∈R, x₀ = y₀ ⇒ x₁ = y₁.

It is then the graph of the unique function ℩R⃗ = ((Dom R) ∋ x ↦ ℩(R⃗(x))).

For any set E we shall denote 𝛿_E = Gr Id_E.

Indexed partitions

Two sets E and F are called disjoint when E∩F = ∅, or equivalently ∀x∈E, x∉F.
A family of sets (A_i)_i∈I is called pairwise disjoint when ∀i≠j∈I, A_i∩A_j = ∅
For any graph R with Im R ⊂ F, the family (R⃖(y))_y∈F is pairwise disjoint if and only if R is functional:

(∀y≠z∈F, ¬∃x∈R⃖(y), x ∈ R⃖(z)) ⇔ (∀(x,y)∈R, ∀z∈F, xRz ⇒ y = z)

For any function f and any y we define the fiber of y under f as

f_•(y) = {x∈Dom f | f(x) = y} = (Gr⃖ f)(y)

When f : E → F this defines a family f_•F = (f_•(y))_y∈F of pairwise disjoint subsets of E:

∀y,z∈F, f_•(y) ∩ f_•(z) ≠ ∅ ⇒ ∃x∈ f_•(y) ∩ f_•(z), y = f(x) = z
⋃_y∈F f_•(y) = E
Im f = {y∈F | f_•(y) ≠ ∅}.

An indexed partition of a set E is a family of nonempty, pairwise disjoint subsets of E, whose union is E.
In other words it is any family of the form f_• = f_{•Im f} for any function f with domain E.

Sum or disjoint union

The binder ∐ of sum of any family of sets (E_i)_i∈I, gives a graph ∐_i∈I E_i defined as

∐
i∈I E_i = ⋃
i∈I {(i,x) | x∈E_i}
∀i,x, (i,x) ∈ ∐
i∈I E_i ⇔ (i∈I ∧ x∈E_i)
(∀(i,x)∈∐
i∈I E_i, A(i,x)) ⇔ (∀i∈I, ∀x∈E_i, A(i,x))
(∀i∈I, E_i ⊂ E′_i) ⇔ ∐_i∈I E_i ⊂ ∐_i∈I E′_i
E₀⊔…⊔E_n−1 = ∐_{i∈V_n} E_i

This binder serves as inverse of currying : R = ∐_i∈I E_i ⇔ (Dom R ⊂ I ∧ ∀i∈I, R⃗(i) = E_i).
It is also called disjoint union as the family of copies {(i,x) | x∈E_i} of each E_i is pairwise disjoint, with the function from R to I given by π₀.

Direct and inverse images by a graph

The restriction of a graph R to a set A is defined as

R_|A = {(x,y)∈R | x∈A} = ∐_x∈A R⃗(x)

The direct image of a set A by a graph R is

R^⋆(A) = Im R_|A = ⋃_x∈A R⃗(x) = {y |(x,y)∈R∧x∈A} ⊂ Im R.
Dom R ⊂ A ⇔ R_|A = R ⇒ R^⋆(A) = Im R

R^⋆(⋃
i ∈ I A_i) = ⋃
i ∈ I R^⋆(A_i)
R^⋆(⋂
i ∈ I A_i) ⊂ ⋂
i ∈ I R^⋆(A_i)

A ⊂ B ⇒ R^⋆(A) ⊂ R^⋆(B)

Similarly, the inverse image or preimage of a set B by a graph R is

R_⋆(B) = ^tR^⋆(B) = ⋃_y∈B R⃖(y) = {x | (x,y)∈R∧ y ∈ B} ⊂ Dom R.

Direct image and preimage by a function

The direct image of a subset A ⊂ Dom f by a function f, denoted f[A] or f^⋆(A), is

f[A] = (Gr f)^⋆(A) = {f(x) | x∈A} ⊂ Im f

For any f : E → F and B ⊂ F, the preimage of B by f, written f_⋆(B), is defined by

f_⋆(B) = (Gr f)_⋆(B) = {x∈E | f(x) ∈ B} = ⋃
y∈B f_•(y)
f_•(y) = f_⋆({y})

f_⋆(∁_F B) = ∁_Ef_⋆(B)

For any family (B_i)_{i ∈ I} of subsets of F, f_⋆(⋂_i∈IB_i) = ⋂_i∈I f_⋆(B_i) where intersections are respectively interpreted as subsets of F and E.

2.7. Products and powerset

Cartesian product of two sets

For any two sets E and F, the (cartesian) product E×F is the set of (x,y) where x∈E and y∈F.

E×F = ∐_x∈E F

∐
i∈IE_i = ⋃
i∈I {i}×E_i ⊂ I×⋃
i∈I

E_i
(E⊂E′ ∧ F⊂F′) ⇒ E×F ⊂ E′×F′
E×∅ = ∅ = ∅×E

For any graph R,

R ⊂ E×F ⇔ (Dom R ⊂ E ∧ Im R ⊂ F)
R_|E = R∩(E× Im R)

Quantifiers over a product have equivalent curried views:

(∃(x,y)∈E×F, A(x,y)) ⇔ (∃x∈E, ∃y∈F, A(x,y)) ⇔ (∃y∈F, ∃x∈E, A(x,y))
(∀(x,y)∈E×F, A(x,y)) ⇔ (∀x∈E, ∀y∈F, A(x,y)) ⇔ (∀y∈F, ∀x∈E, A(x,y))
(∃x∈E, A(x)∨B(x)) ⇔ ((∃x∈E, A(x)) ∨ (∃x∈E, B(x)))
(∀x∈E, A(x)∧B(x)) ⇔ ((∀x∈E, A(x)) ∧ (∀x∈E, B(x)))
(∃x∈E, C∨A(x)) ⇔ ((C ∧ E≠∅) ∨ ∃x∈E, A(x))
(∀x∈E, C∧A(x)) ⇔ ((C ∨ E=∅) ∧ ∀x∈E, A(x))

Quantifiers (Qx,y∈E, ) can be equivalently written (Q(x,y)∈E×E, ) while the notation Qx≠y∈E can be read as one quantifier with domain {(x,y)∈E×E | x≠y} = (E×E) \ 𝛿_E.

Finite products, operations and relations

Beyond the binary operator of cartesian product, there is an n-ary product operator between any number n>1 of sets : E₀×…×E_n−1 is the set of all n-tuples (x₀,…,x_n−1) such that ∀i∈V_n, x_i∈E_i.
For example, the ternary product is

E×F×G = ⋃_x∈E⋃_y∈F {(x,y,z) | z∈G} = {(x,y,z) | ((x,y),z))∈(E×F)×G}

where the latter notation abbreviates the use of the evaluators for this representation of triples. It can also be seen as directly justified by the set generation principle:

(∀(x,y,z)∈E×F×G, A(x,y,z)) ⇔ (∀x∈E, ∀y∈F, ∀z∈G, A(x,y,z))

This finally justifies the most natural formalization of n-ary operations, as functions with domain a product of n sets, with evaluator given in 2.3. It justifies our first notation of 2.3 for operation definers, as a use of the abbreviated notation of 2.6 for binders on graphs, extensible to any arity.

The notion of n-ary relation R will usually be formalized as that of a set of n-tuples, subset of the product of its domains. This way, a binary relation between sets E and F will be a graph R ⊂ E×F. The equality relation on a set E is so formalized as 𝛿_E, called the diagonal of E×E.
For a formalization which specifies the domains E and F of arguments, we can take the triple (E,F,R).

Translating operators into predicates

The use of any functor symbol K in a generic theory is equivalent to that of a binary predicate symbol R with the functionality axiom

∀x, ∃!y, xRy

It is meant as the "graph of K", i.e. defined from K by

xRy ⇔ y = K(x)

Terms of first-order logic using K cannot be re-expressed using R, but formulas can: each formula with each occurrence of K in it, can be written A(K(x)) where x abbreviates a term and A abbreviates a formula as a unary predicate; then equivalently replaced by either of

∃y, xRy ∧ A(y)
∀y, xRy ⇒ A(y)

The same goes for other arities : the role of any n-ary operator can be played by an (n+1)-ary predicate with the functionality axiom. This generalization can be explained as replacing the argument by a tuple.
This construction does not fit for the operators of our set theory, due to its use of open quantifiers. Yet it works when domains are sets: n-ary operations can be represented as functional (n+1)-ary graphs.

More primitive symbols

Let us strengthen set theory by 3 extensions that declare more kinds of classes to be sets : powersets, exponentiations and products. These extensions are equivalent, in the sense that accepting anyone of them as primitive suffices to get both others as developments from it. They are similar to those provided by the set generation principle, but no more fit its condition ; this lack of justification has deep connections with incompleteness aspects of mathematics.
To fit our way of formalizing set theory (unlike the ZF way introduced below), each such extension is made of

A symbol K (namely, two operators and one binder) naming as a set the given class C, with arguments the parameters of C here abbreviated as a family y ranging over the intended definiteness class of K;
The statement ∀_dKy, Set K(y) ∧ ∀x, x ∈ K(y) ⇔ C_y(x).

Powerset. ∀_Set E, Set ℘(E) ∧ ∀F, F∈℘(E) ⇔ (Set F ∧ F ⊂ E)
We shall abbreviate ∈℘ as ⊂ also in binders: for example (∀F⊂E,…) ⇔ (∀F∈℘(E),…).

Exponentiation. ∀_Set E,F, Set F^E ∧ ∀f, f ∈ F^E ⇔ f : E → F

Product of a family of sets. The binder ∏ generalizes the finite product operators to any family of sets (∀i∈I, Set E_i):

∀x, x ∈ ∏
i∈I E_i ⇔ (Fnc(x) ∧ Dom x = I ∧ ∀i∈I, x_i ∈ E_i).

Like with tuples, the function evaluator curried in the reverse way defines a family of projections π_i (keeping implicit their dependence on the given product which is usually fixed in context):

∀i∈I, π_i : ∏_j∈I E_j → E_i.

These symbols preserve inclusion as follows

F ⊂ E ⇒ ℘(F) ⊂ ℘(E)
F ⊂ E ⇒ F^G ⊂ E^G
(∀i∈I, F_i ⊂ E_i) ⇒ ∏_i∈I F_i ⊂ ∏_i∈I E_i

Their equivalence

From any of these 3 symbols, both others can be defined as follows.

∏
i∈I E_i = {x∈ (⋃
i∈I E_i)^I | ∀i∈I, x_i ∈ E_i} = {℩R⃗ | R ⊂ ∐
i∈I E_i ∧ ∀i∈I, ∃!: R⃗(i)}

F^E = ∏_x∈E F = {℩R⃗ | R ⊂ E×F ∧ ∀x∈E, ∃!: R⃗(x)}
℘(E) = {f_•(1) | f ∈ V₂^E}

Even some cases are expressible from previous tools:

F^{a}= {{a}∋x↦y | y ∈ F}
F^∅ = ∏∅_↦ = {∅_↦}
℘({a}) = {∅,{a}}
∏
i∈I∪J E_i = {(i∈I ? x_i : y_i)_i∈I∪J | (x,y) ∈ ∏
i∈IE_i × ∏
i∈JE_i}
(∃i∈I, E_i = ∅) ⇒ ∏
i∈IE_i = ∅
(∀i∈I, ∃!:E_i) ⇒ ∏
i∈IE_i = {(℩E_i)_i∈I}

Cantor's Theorem

This famous theorem is simply expressible as

∀_Fnc f, ℘(Dom f) ⊄ Im f.

Proof for the case when all f(x) are sets (to which the general case is reducible):
(E = Dom f ∧ F = {x∈E| x ∉ f(x)}) ⇒ (∀x∈E, x ∈ F ⇎ x ∈ f(x)) ⇒ (∀x∈E, F ≠ f(x)) ⇒ F ∉ Im f.∎

Russell's paradox may be seen as the particular case of Cantor's theorem for f = Id_E.

The ZF approach

In the traditional ZF formalism for set theory, each such extension is formalized as first made of a mere statement without any special symbol, keeping ∈ as the only primitive structure:

∀_...y, ∃X, ∀x, x ∈ X ⇔ C_y(x).

Namely, the powerset axiom (existence of the powerset) is written ∀E, ∃X, ∀F, F∈X ⇔ F ⊂ E, from which the statements of existence of exponentiations and products come as theorems.
Then, the uses of each symbol K with argument y as above, naming these sets, can be justified as developed from there as abbreviations by the following rules.

∀x∈K means ∀x, C(x)⇒ ;
The equality (X=K) means (∀x, x ∈ X ⇔ C(x)) or equivalently (∀x∈X, C(x)) ∧ (∀_Cx, x ∈ X);
Any other formulas using K with y free, written as A(K), mean the formula so abbreviated as ∃X, (X=K)∧A(X).
To justify binding y on terms using K, involves the axiom schema of replacement.

But our framework does not even allow the first of these cases, because it involves an open quantifier, which cannot be replaced by a bounded formula because these extensions do not fit the condition of the set generation principle.
This impossibility to interpret the equality formula (X=K) means that even a given set identical to a class C, would not be recognizable as such. Hence the need to accept the symbols K as primitive, for giving effective sense to the statement that C coincides with a set.

2.8. Injections, bijections

Composition, restriction

The composite of any 2 functions f,g such that Im f ⊂ Dom g is defined by

g⚬f = ((Dom f) ∋ x ↦ g(f(x)))
(f : E → F ∧ g : F → G) ⇒ g⚬f : E → G

(This notation will be abusively kept if g is a functor, in which case this is a schema of definitions).
Similarly with more than 2 functions: adding a third one h : G →H to the above conditions, we define

h⚬g⚬f = (h⚬g)⚬f = h⚬(g⚬f) = (E ∋ x ↦ h(g(f(x))))

The restriction of a function f to a set A ⊂ Dom f is

f_|A = (A ∋ x ↦ f(x)) = f⚬Id_A.
Gr(f_|A) = (Gr f)_|A
f[A] = Im(f_|A)
∀_Fnc f,g, Gr g ⊂ Gr f ⇔ (Dom g ⊂ Dom f ∧ g = f_{|Dom g})

When g is a restriction of f, we also say that f is an extension of g.

Injections, bijections, inverse

A function f is called injective (or : an injection ), if ^tGr f is a functional graph:

Inj f ⇔	(∀x≠x′∈Dom f, f(x) ≠ f(x′))
⇔	(∀x,x′∈Dom f, f(x) = f(x′) ⇒ x=x′)
(f : E ↪ F) ⇔	(f : E → F ∧ Inj f)
Im f ⊂ F ⇒	(Inj f ⇔ ∀y∈F, !:f_•(y))

The inverse of any injection f, is the function f^-1 with graph ^tGr f:

f^-1 = ℩f_• = (Im f ∋ y ↦ ℩f_•(y))
Gr(f^-1) = ^tGr f
(f^-1)^-1 = f.

A function f : E → F is said bijective (or a bijection) from E onto F, if it is both injective and surjective :

f : E ↔ F ⇔	(f : E → F ∧ ∀y∈F, ∃!: f_•(y))
⇔	(f : E ↪ F ∧ Im f = F)
⇔	(f : E ↠ F ∧ Inj f)
⇒	f^-1: F ↔ E.

In particular Id_E : E ↔ E.

Diverse properties

Proposition. Let f : E → F and g : F → G. Then

(Inj f ∧ Inj g) ⇒ Inj(g⚬f)
Inj(g⚬f) ⇒ Inj f
Im(g⚬f) = g[Im f] ⊂ Im g.

Proofs:

∀x,y∈E, g(f(x)) = g(f(y)) ⇒ f(x) = f(y) ⇒ x = y.
∀x,y ∈ E, f(x) = f(y) ⇒ g(f(x)) = (g(f(y)) ⇒ x = y.
∀z∈G, z ∈ Im(g⚬f) ⇔ (∃x∈E, g(f(x))=z) ⇔ (∃y∈ Imf, g(y)=z) ⇔ z ∈ g[Imf]. ∎

From 3. we can obviously deduce

Im(g⚬f) = G ⇒ Im g = G
Im f = F ⇒ Im(g⚬f) = Im g
(f:E ↠ F) ⇒ (g:F ↠ G ⇔ g⚬f:E ↠ G)

Let f : E → F, and a function g with Dom g = F. Then, easily,

g⚬f = Id_E	⇔ (∀x∈E, ∀y∈F, f(x) = y ⇒ g(y) = x)
	⇔ Gr f ⊂ ^tGr g ⇔ (∀y∈F, f_•(y)⊂{g(y)})
	⇔ (∀x∈E, f(x)∈g_•(x)) ⇔ (Inj f ∧ g_{\|Im f} = f^-1)
	⇒ (Im f = F ⇔ g = f^-1 ⇔ (Inj g ∧ Im g ⊂ E))

Proof of the non-obvious step :

(g⚬f = Id_E ∧ Inj g ∧ Im g ⊂ E) ⇒ (∀y∈F, g(f(g(y))) = g(y) ∴ y = f(g(y)) ∈ Im f)

Proposition. (f : E → F ∧ g : F → E ∧ g⚬f = Id_E) ⇒ (Im f = F ⇔ Inj g ⇔ f⚬g = Id_F ⇔ g = f^-1).

Proof : (Inj g ∧ g⚬f⚬g = g⚬Id_F) ⇒ f⚬g = Id_F. ∎

Proposition. 1) If f,g are injective and Im f = Dom g, then (g⚬f)^-1 = f^-1⚬g^-1.
2) If f,h : E→F and g : F→E then (g⚬f = Id_E ∧ h⚬g = Id_F) ⇔ ((g : F ↔ E) ∧ f = h = g⁻¹).

Proofs:
1) ∀x∈Dom f, ∀y∈ Im g, (g⚬f)(x) = y ⇔ f(x) = g^-1(y) ⇔ x = (f^-1⚬g^-1)(y).
Other method: (g⚬f)^-1 = (g⚬f)^-1⚬g⚬g^-1 = (g⚬f)^-1⚬g⚬f⚬f^-1⚬g^-1 = f^-1⚬g^-1.
2) We deduce f = h from f = h⚬g⚬f = h, or from Gr f ⊂ ^tGr g ⊂ Gr h. The rest is obvious.∎

Canonical functions

For any objects x,y we shall say that x determines y if there is an invariant functor T (known but kept implicit) such that T(x) = y. Then the role of y as free variable can be played by the term T(x), thus by the use of x.
Similarly, a function f : E → F will be said canonical if it is defined as E ∋ x ↦ T(x) where T is some invariant functor.
Examples of important canonical injections :

E ⊂ F ⇒ Id_E : E ↪ F
(x ↦ {x}) : E ↪ ℘(E)
Gr : F^E ↪ ℘(E×F).

A bijection f will be said bicanonical if both f and f⁻¹ are canonical.
When a bijection f : E ↔ F is canonical (resp. bicanonical), we shall write f : E ⥬ F (resp. f : E ⇌ F); or, with its defining functor φ as φ : E ⥬ F which means that (E ∋ x ↦ φ(x)) is injective with image F. We shall write E ⥬ F (resp. E ⇌ F) to mean the existence of a canonical (resp. bicanonical) bijection, that is kept implicit. These will often look like equality properties on numbers, as the existence of a bijection between finite sets implies the equality of their numbers of elements.
Canonical bijections can fail to be bicanonical especially when their defining functor is not injective:

V₂^E ⥬ ℘(E)
E ≠ ∅ ⇒ {x}^E ⥬ {x}
E×{x} ⥬ E
E×F ⇌ F×E
E×{x} ⇌ {x}×E ⇌ E^{x}
E×{0} ⇌ E.

This meta-relation between sets is preserved by constructions: if E ⥬ E′ and F ⥬ F′ then ℘(E) ⥬ ℘(E'), E×F ⥬ E′×F′ and F^E ⥬ F′^E′ (using the direct image of the graph), and the same for ⇌.
This is how transposition (E×F ⇌ F×E) extends to both graphs and operations:

℘(E×F) ⇌ ℘(F×E)
G^E×F ⇌ G^F×E
∀f∈G^E×F, ^tf = (F×E∋(x,y) ↦ f(y,x)) ∈ G^F×E.

Sums of functions

The binder ∐ of sum of sets defines canonical bijections (whose inverses, defined by R ↦ R⃗_I, depend on I):

(℘(E))^I ⥬ ℘(I×E)
∏_i∈I ℘(A_i) ⥬ ℘(∐_i∈I A_i)

Any sum of sets S = ∐_i∈I A_i has its family of natural injections j_i given by the curried ordered pair definer :

∀i∈I, j_i : A_i ↪ S.

The sum over I of functions f_i where ∀i∈I, A_i = Dom f_i is defined by

∐_i∈I f_i = (S ∋ (i,x) ↦ f_i(x))
g = ∐_i∈If_i ⇔ (Dom g = S ∧ ∀i∈I, f_i = g⚬j_i = g⃗(i))

It is the inverse of the currying of binary operations, which we shall denote like for graphs: for any sets E, F and any operation (function) B with domain E×F ≠ ∅ (from which E and F can be restored as E = Dom Dom B and F = Im Dom B, while if E×F = ∅ the notation can be abusively kept while taking E and F from context),

B⃗ = (E ∋ x ↦ (F ∋ y ↦ B(x,y)))
B⃖ = ^tB⃗ = (F ∋ y ↦ (E ∋ x ↦ B(x,y))

Hence the canonical bijections (bicanonical if I = Dom S, thus if E ≠ ∅ in the case I×E)

∀_SetF, ∏
i∈I F^A_i ⥬ F^S
∀_SetE,F, (F^E)^I ⥬ F^I×E
(∀F : S → Set) ∏
i∈I ∏
x∈A_i F_(i,x) ⥬ ∏
c∈SF_c
(E×F)×G ⇌ E×F×G
f ∈ (F^E)^I ⇒ ∐
i∈I Gr f_i ⊂ ℘(I×(E×F)) ⇌ ℘((I×E)×F) ⊃ Gr ∐
i∈I f_i

Product of functions or recurrying

Both curried forms R⃗ and R⃖ of graphs R ⊂ E×F are exchanged by a bijection ℘(F)^E ↔ ℘(E)^F defined with parameter F.
The similar bijection between both curried forms of operations with a given domain I×E, is defined by the binder ⊓ of product of functions all with domain E, and indexed by I :

(∀i∈I, Dom f_i = E) ⇒ ⊓
i∈I f_i = (E ∋ x ↦ (f_i(x))_i∈I)
∀g, g = ⊓
i∈I f_i ⇔ (g : E → ∏
i∈I Im f_i ∧ ∀i∈I, f_i = π_i⚬g)
(F : I → Set) ⇒ ⊓ : ∏
i∈I F_i^E ⥬ (∏
i∈I F_i)^E
Set F ⇒ ⊓ : (F^E)^I ⥬ (F^I)^E

These bijections are canonical for nonempty families (I ≠ ∅) as only these determine E, and are usually bicanonical as their inverse uses I which is definable when E ≠ ∅; anyway we shall abusively keep the notation ⥬ for the general case while E remains fixed by the context.
A particular case is the binary product (I = V₂):

∀_Fncf,g, Dom f = Dom g = E ⇒ f⊓g = (E ∋ x ↦ (f(x),g(x)))
I^E×F^E ⇌ (I×F)^E
(∐
i∈I F_i)^E ⇌ ∐
ϕ∈I^E ∏
x∈E F_ϕ(x)
∀_Fncf, Gr f = Im(Id_{Dom f} ⊓ f)

2.9. Binary relations on a set

A binary relation on a set E is a relation with both domains equal to E, thus formalized by a graph R ⊂ E×E. Let us abbreviate E×E as E² and (x,y) ∈ R as x R y

Preimages and products

For any function f : E → F and any binary relation R on F, the preimage of R by f is the binary relation f_⋆(R) on E defined as

{(x,y)∈E² | f(x) R f(y)}

For any family of a binary relations (R_i)_i∈I on respective sets F_i, their product is the binary relation on P = ∏_i∈I F_i defined as

{(x,y)∈P² | ∀i∈I, x_i R_i y_i} = ⋂_i∈I π_i_⋆(R_i).

(It is actually in canonical bijection with the product of sets ∏_i∈I R_i)
In particular with a constant family : any binary relation on a set F defines one on any F^I.

Both operations can be combined in one step: for any family of functions f_i : E → F_i, the relation

{(x,y)∈E² | ∀i∈I, f_i(x) R_i f_i(y)} = ⋂_i∈I f_i_⋆(R_i)

is the preimage of the product relation of the R_i on P, by the function ⊓_i∈I f_i : E → P.

These concepts will be later generalized to other first-order structures.

Basic properties

A binary relation R on a set E will be said

reflexive ⇔	∀x∈E, xRx ⇔ 𝛿_E ⊂ R ⇒ Dom R = Im R = E
irreflexive ⇔	∀x∈E, ¬(xRx)
symmetric ⇔	(∀x,y∈E, xRy ⇒ yRx) ⇔ R ⊂ ^tR ⇔ R = ^tR ⇔ (R⃗) = (R⃖)
antisymmetric ⇔	∀x,y∈E, (xRy ∧ yRx) ⇒ x = y
transitive ⇔	∀x,y,z∈E, (xRy ∧ yRz) ⇒ xRz

These properties are preserved by transposition, preimages and products, except that the preimage of an antisymmetric relation by a non-injective function may not be antisymmetric, and the empty product (I=∅) is not irreflexive.
In particular for F ⊂ E, the restriction of a relation R ⊂ E×E to F, namely R∩(F×F), is its preimage by Id_F : F ↪ E, thus preserves all properties.

Example. Let A ⊂ E^E and R = ⋃_f∈A Gr f. Then

(Id_E ∈ A) ⇒	R is reflexive
(∀f,g∈A, g⚬f ∈ A) ⇒	R is transitive
(∀f∈A, f : E↔E ∧ f^-1∈ A) ⇒	R is symmetric

For any transitive binary relation R we denote x R y R z ⇔ (x R y ∧ y R z) ⇒ x R z.

Preorders and orders

A preorder is a reflexive and transitive binary relation. An order is an antisymmetric preorder.
A preordered set is (an ordered pair of) a set with a chosen preorder on it.
An ordered set is a set with a chosen order, usually written as ≤ or ≤_E. The formula x ≤ y can be read «x is less than y», or «y is greater than x».
Two elements x, y of an ordered set are said comparable when (x≤y ∨ y≤x).

On the class of sets, the meta-relation ⥬ (existence of a canonical bijection) is a meta-preorder.
The Boolean pair V₂ is naturally ordered by ⇒ which coincides there with the usual order between numbers.
The resulting product order on V₂^E ⥬ ℘(E) (and thus any set of sets) is that of inclusion (⊂).

Any relation R ⊂ E×F defines a preorder on E as the preimage by R⃗ of the inclusion order:

{(x,y)∈E² | R⃗(x) ⊂ R⃗(y)}.

Conversely, any preorder R on a set E is so definable from itself in both ways (exchanged by transposition):

∀x,y∈E, R⃗(y) ⊂ R⃗(x) ⇔ x R y ⇔ R⃖(x) ⊂ R⃖(y)

Proof. Transitivity is equivalent to (∀x,y∈E, x R y ⇒ R⃗(y) ⊂ R⃗(x)) and to (∀x,y∈E, x R y ⇒ R⃖(x) ⊂ R⃖(y)).
Reflexivity is equivalent to the converses:

∀x∈E, (∀y∈E, R⃗(x) ⊂ R⃗(y) ⇒ y R x) ⇔ x R x ⇔ (∀y∈E, R⃖(x) ⊂ R⃖(y) ⇒ x R y). ∎

Thus R is the preimage of ⊂ by (R⃖). Moreover,

∀x,y∈E, R⃖(x) = R⃖(y) ⇔ (R⃖(x) ⊂ R⃖(y) ∧ R⃖(y) ⊂ R⃖(x)) ⇔ (xRy ∧ yRx)

so that (R⃖) is injective if and only if R is an order.
The idea of considering only one order written ≤ on each set E, may be formalized by replacing the ordered set (E,R) by (Im(R⃖), ⊂), though this does not exactly fit the following convention : any subset of an ordered set will be implicitly also ordered by restriction.

Strict and total orders

A strict order is a binary relation both transitive and irreflexive; and thus also antisymmetric.

Strict orders < bijectively correspond to orders ≤ by

x < y ⇔ (x ≤ y ∧ x ≠ y).
x ≤ y ⇔ (x < y ∨ x = y).

A total order on a set E is an order where all elements are comparable (∀x,y∈E, x ≤ y ∨ y ≤ x), i.e. (≤ ∪^t≤) = E×E.
Equivalent definitions:

an order ≤ related with its strict order < by ∀x,y∈E, x < y ⇎ y ≤ x.
a transitive relation ≤ such that ∀x,y∈E, x ≤ y ⇔ (y ≤ x ⇒ x=y).

A strict total order on a set E is a strict order associated with a total order, i.e. any transitive relation < such that, equivalently

∀x,y∈E, x < y ⇎ (y < x ∨ x = y)
∀x,y∈E, x = y ⇎ (y < x ∨ x < y)

Equivalence relations

An equivalence relation is a symmetric preorder.
On any set E, the equality relation 𝛿_E is an equivalence relation (as already expressed by the axioms of equality), and the only relation altogether symmetric, antisymmetric and reflexive.
On any product of sets, the product of equality relations is also equality.
The preimage of equality by any f : E → F is an equivalence relation on E we shall write ∼_f :

∼_f = {(x,y)∈E | f(x) = f(y)} = ∐(f_•⚬f)
Inj f ⇔ ∼_f = 𝛿_E

Conversely, any equivalence relation R is so definable from itself : R = ∼_R⃗. Indeed

∀x,y∈E, x R y ⇔ R⃗(y) ⊂ R⃗(x) ⇔ R⃗(x) ⊂ R⃗(y) ⇔ R⃗(x) = R⃗(y)

Quotient functions

Proposition. ∀_SetE,F,G, ∀f, (f : E ↠ F) ⇒ (G^F ∋ g ↦ g⚬f) : G^F ↔ {h∈G^E | ∼_f ⊂ ∼_h}.

Proof : ∀...∀f∈F^E, ∀h∈G^E, H = Im(f×h) ⇒

∼_f ⊂ ∼_h ⇔ (∀x,x'∈E, f(x) = f(x′) ⇒ h(x) = h(x′)) ⇔ (∀(y,z), (y′,z′) ∈ H, y=y′ ⇒ z=z′)
Dom H = Im f ∧ ∀g∈G^F, h = g⚬f ⇔ H ⊂ Gr g. ∎

For any functions f,h such that Dom f = Dom h ∧ ∼_f ⊂ ∼_h, this unique g : Im f ↠ Im h such that h = g⚬f will be called the quotient of h by f:

h/f = (Im f ∋y↦ ℩h[f_•(y)])
Gr (h/f) = Im (f×h)
Inj f ⇒ (f^-1 = Id_{Dom f}/f ∧ h/f = h⚬f^-1)
∼_f = ∼_h ⇔ Inj(h/f) ⇒ (h/f)^-1 = f/h

In the more general case Im f ⊂ Dom g (without surjectivity of f),

h = g⚬f ⇒ (∼_f ⊂ ∼_h ∧ h/f = g_{|Im f})
⇒ (∼_f = ∼_h ⇒ f = (g_{|Im f})⁻¹⚬h)

The formula that ∼_f is an equivalence relation,

∀x,y ∈ Dom f, x ∼_f y ⇔ ∼⃗_f(x) = ∼⃗_f(y)

where (∼⃗_f) = f_•⚬f, reflects the injectivity of (f_•) which can be directly seen as

∀y∈Im f, ∀z, f_•(y) = f_•(z) ⇒ f_•(y) ∩ f_•(z) = f_•(y) ≠ ∅ ⇒ y = z.

Partitions

A partition of a set E is a set of nonempty, pairwise disjoint sets, whose union is E.
In other words, it is a set P of subsets of E such that Id_P is an indexed partition of E.
The image of any indexed partition is a partition of the same set: for any function f with image E, Im(f_•) = f_•[E] = Im(f_•⚬f) is a partition of E.
∀R ⊂ E×E, if P = R⃗[E] then

(∀x,y∈E, x∈R⃗(y) ⇔ R⃗(x) = R⃗(y)) ⇔ (∀x∈E, ∀A∈P, x∈A ⇔ R⃗(x) = A) ⇔ Id_P = (R⃗_E_•)

Thus if R is an equivalence relation then P is a partition.
Inversely, any partition P of E defines a function g and then a graph R by

Dom g = E ∧ (g_•) = Id_P ∴ P = Dom (g_•) = Im g
R = ∐g ⊂ E×E ∴ g = R⃗_E ∴ (P = R⃗[E] ∧ Id_P = (R⃗_E_•))

thus R is an equivalence relation on E and

∀x,y∈E, xRy ⇔ y∈ ℩{A∈P | x∈A} ⇔ (∃A∈P, x∈A ∧ y∈A) ⇔ (∀A∈P, x∈A ⇒ y∈A).

For any equivalence relation R on E, the partition R⃗[E] = Im (R⃗) is called the quotient of E by R, and written E/R, so that

(R⃗) : E ↠ E/R.

For any x ∈ E, the set R⃗(x) = ℩{A∈E/R | x∈A} is called the class of x by R.
For any function f such that Dom f = E ∧ R ⊂ ∼_f, let us abbreviate f/(R⃗) as f/R:

f/R : E/R ↠ Im f
Inj f/R ⇔ R = ∼_f

So when R = ∼_f the role of E/R may be played by Im f.
In particular, any preorder R on a set E gives an equivalence relation R∩^tR, and the role of E/(R∩^tR) may be played by Im (R⃖).

2.10. Axiom of choice

Properties of curried composition

Proposition. ∀_Set E,F,G, ∀f ∈ F^E,

Im f = F ⇒ Inj(G^F ∋ g ↦ g⚬f)
(Inj(G^F ∋ g ↦ g⚬f) ∧ ∃^≥2:G) ⇒ Im f = F
(Inj f ∧ G ≠ ∅) ⇒ {g⚬f | g∈G^F} = G^E
({g⚬f | g∈G^F} = G^E ∧ ∃^≥2:G) ⇒ Inj f
(E ⊂ F ∧ G ≠ ∅) ⇒ {g_|E | g∈G^F} = G^E
(Inj f ∧ E ≠ ∅) ⇒ ∃g∈E^F, g⚬f = Id_E

Proofs :
1. done in 2.9 ; simpler proof: ∀g,h∈G^F, (∀x∈E, g(f(x)) = h(f(x))) ⇔ (∀y∈F, g(y) = h(y)) ⇔ g=h
2. ( ) ⇒ ∃z≠z′∈G, (F∋y ↦ z)⚬f = (F∋y ↦ (y ∈ Im f ? z : z′))⚬f ∴ (∀y∈F, y ∈ Im f ∨ z = z′) ∴ Im f = F
3. Inj f ⇒ ∀z∈G, ∀h∈G^E, h = (F ∋ y ↦ (y ∈ Im f ? h(f^-1(y)) : z))⚬f
4. ∀z≠z′∈G, ∀x∈E, ∀g∈G^F, g⚬f = (E∋y↦ (y = x ? z : z′)) ⇒ ∀y∈E, f(y) = f(x) ⇒ g(f(y)) = g(f(x)) = z ⇒ y = x.
5. and 6. are particular cases of 3.∎

The case G = V₂ gives the following properties of h = (℘(F)∋B↦ f_⋆(B)) :

Im f = F ⇔ Inj h
Inj f ⇔ Im h = ℘(E)

Moreover {f[A]|A ⊂ E} = ℘(Im f) because ∀B ⊂ Im f, f[f_⋆(B)] = B.

Proposition. For any sets E, F and any function g with domain F,

Inj g ⇒ Inj(F^E ∋ f ↦ g⚬f)
(Inj(F^E ∋ f ↦ g⚬f) ∧ E ≠ ∅) ⇒ Inj g
∀_SetG, ({g⚬f | f∈F^E} = G^E ∧ E ≠ ∅) ⇒ Im g = G.

Proofs:
1. ∀f,f'∈F^E, (Inj g ∧ ∀x∈E, g(f(x)) = g(f'(x))) ⇒ (∀x∈E, f(x) = f'(x)).
2. ∀y,z∈F, g(y) = g(z) ⇒ g⚬(E ∋ x ↦ y) = g⚬(E ∋ x ↦ z) ⇒ (∀x∈E, y = z) ⇒ (E ≠ ∅ ⇒ y = z).
3. (left as an exercise to the reader).∎

Axiom of choice over a set (AC_X)

The axiom of choice (AC) is a statement expressible in any set theory with powerset, which «feels true» and is convenient to accept as an axiom; but we will actually not need it in the rest of this work.

It has several equivalent formulations (in clear, it is the common name of these equivalent statements). The main ones fit the common format (∀_SetX, AC_X) where AC_X means any of the following equivalent formulas with the free variable X in the class of sets:

(AC1_X) ∀_SetE, ∀R⊂X×E, (∀x∈X, ∃y∈E, xRy) ⇒ (∃f∈ E^X, ∀x∈X, xRf(x)).
(AC2_X) ∀A : X → Set, (∀x∈X, A_x ≠ ∅) ⇒ ∏A ≠ ∅
(The product of any family of nonempty sets indexed by X is nonempty)
(AC3_X) ∀_Fnc g, X ⊂ Im g ⇒ ∃f ∈ (Dom g)^X, g⚬f = Id_X
(AC4_X) ∀_grR, X ⊂ Dom R ⇒ ∃f ∈ (Im R)^X, Gr f ⊂ R.

(AC3_X) and (AC4_X) are more often used in cases of equality (respectively X = Im g and X = Dom R) which are equivalent to the general cases, as the proofs of equivalence work the same :

(AC1_X)⇒(AC2_X) by R = ∐ A ∧ E = Im R ;
(AC2_X)⇒(AC3_X) by A = (g_•(x))_x∈X ;
(AC3_X)⇒(AC4_X) : X ⊂ π₀[R] ⇒ ∃(h×f) ∈ R^X, h = Id_X ∴ Gr f = Im (h×f) ⊂ R ;
(AC4_X)⇒(AC1_X) obvious.∎

Other easy deductions are

(AC2_X)⇒(AC1_X) by A_x = R⃗(x) ;
(AC4_X)⇒(AC3_X) by R = ^tGr g.

Dependencies between diverse AC_X

A big achievement of mathematical logic was to prove the independence of AC from the rest of set theory: even the very strong ZF system, if consistent, can neither prove nor refute it, as each universe where AC is true includes another one where it is false, and vice versa. This will be further commented in 5.5 (without the details of the proofs which are much too difficult).
The diverse ways in which AC may fail in different universes, can be partly sorted out by specifying for which X does AC_X hold. The different AC_X depend on each other as follows:

∀_SetX,Y, (AC_X ∧ ∃f, f : Y ↪ X) ⇒ AC_Y
∀_SetX,Y, (AC_X ∧ AC_Y) ⇒ (AC_X∪Y ∧ AC_X×Y)
(AC_I ∧ ∀i∈I, AC_{A_i}) ⇒ AC_{∐_i∈I A_i}

The proofs of these implications are easy and left as an exercise to the reader.

Finite choice theorem. If X is finite then AC_X.

This will be easily deducible from the case of singletons AC_{x} (which is trivial) and the preservation by binary union (AC_X ∧ AC_Y) ⇒ AC_X∪Y once finiteness will be formalized (4.6).
Now let us just suggest a schema of proofs, namely one for each number of elements of X. For example the case of X with 3 elements comes down to (AC2_V₃) whose proof can be written

∀_SetX,Y,Z, (X ≠ ∅ ∧ Y ≠ ∅ ∧ Z ≠ ∅) ⇒ (∃x∈X, ∃y∈Y, ∃z∈Z, (x,y,z) ∈ X×Y×Z) ⇒ X×Y×Z ≠ ∅.∎

More statements simply equivalent to AC

Theorem. The following statements are equivalent to the axiom of choice:
(AC5) Any set E of nonempty sets has a choice function: ∅ ∉ E ⊂ Set ⇒ ∏ Id_E ≠ ∅.
(AC6) For any partition P of a set E, ∃K⊂E, ∀A∈P, ∃!: K∩A.
(AC7) For any sets E,F,G and any g: F ↠ G, {g⚬f | f ∈ F^E} = G^E.

Proofs:
(AC2)⇒(AC5) is obvious ;
(AC5)⇒(AC6) : (x∈ ∏ Id_P ∧ K = Im x) ⇒ (K⊂E ∧ ∀A∈P, x_A ∈ K∩A ∧ ∀B∈P, x_B ∈ A ⇒ A∩B ≠ ∅ ⇒ A = B)
(AC6)⇒(AC3) : ∀_Fnc g, X ⊂ Im g ⇒ ∃K⊂Dom g, (∀A∈ Im(g_•), ∃! : K∩A) ∴ ∃f ∈ (Dom g)^X, (∀x∈X, {f(x)} = K∩g_•(x)) ∴ g⚬f = Id_X.
(AC1_E)⇒(AC7) : ∀h∈G^E, (∀x∈E, ∃y∈F, g(y) = h(x)) ⇒ (∃f∈F^E, ∀x∈E, g(f(x)) = h(x))
(AC3_G)⇒(AC7) : ∃j∈F^G, g⚬j = Id_G ∴ ∀h∈G^E, j⚬h ∈ F^E ∧ g⚬j⚬h = h.
(AC7)⇒(AC3) : ∀_SetX, ∀_Fncg, X = Im g ⇒ Id_X ∈ X^X = {g⚬f | f ∈ (Dom g)^X}. ∎

Note: (AC5)⇒(AC2) is also easy : ∅ ∉ Im A ⊂ Set ⇒ ∃f ∈ ∏ Id_{Im A}, f⚬A ∈ ∏ A.

There are many more statements equivalent to AC, which will not be of any use in this work, and for which the proofs of equivalence are much harder. Interested readers can find about them elsewhere.

2.A. Time in set theory

Standard universes

Given two universes U₁, U₂, with U₁ ⊂ U₂, we shall define 3 distinct versions for the concept of U₁ being standard in U₂, also called a sub-universe of U₂. We first informally introduced such a concept in 1.7 as the preservation of the roles of sets and functions; then in 1.B with the other meaning (STF) of standardness of ℕ, i.e. the interpretation of finiteness (applicable to any universe as mentioned in 1.D).

The general idea is that the structures of U₁ (the considered interpretation of symbols of set theory in U₁ to make it a universe) must be the restrictions of those used for U₂. But it depends which structures we mean, and how we define their restrictions. From the property of preservation by restriction from U₂ to U₁ we shall specify for each version, more preservation properties on other symbols implicitly follow, thanks to the axioms of set theory in U₂. Any structure defined using only symbols with preserved meaning, is also preserved. The structures not preserved are anyway determined in U₁ as defined from those of U₂ with parameter U₁ (as a set or a unary predicate). This is why U₁ only needs to be specified as a meta-subset or class in U₂, keeping its structures implicit.

Let us focus on specifying what each version of standardness requires on sets (to which the diversity of versions is reducible, as the conditions on functions follow in the way essentially given by the fate of their graphs as sets, after naturally defining standardness for ordered pairs). The first structure for them is their class Set. The condition for this is that Set₁ = Set₂ ∩ U₁. Yet, arguing that Set is not really a structure but only a source of definiteness classes for effective structures, this condition may be weakened as Set₁ ⊂ Set₂ ∩ U₁ without essentially affecting the main 3 degrees of standardness described below.

The weak version (coming when seeing the role of sets as given by ∈) is

(ST) : ∈₁ = ∈₂ ∩ (U₁× Set₁).

Actually, this condition will suffice to imply (STF) once adopted the axiom of foundation for U₂ and further reasonable conditions which will be detailed later. Otherwise we need to add it as another requirement in the concept of standardness, in the form ℕ₁ = ℕ₂ (if the axiom of infinity is adopted in both) or anyway in terms of the class Fin of finite sets as Fin₁ ⊂ Fin₂ which is equivalent to Fin₁ = Fin₂ ∩ Set₁ (for reasons a bit subtle).

The intermediate version (mentioned in 1.7: every set coincides with the meta-set of the same elements) is

(ST') : ∈₁ = ∈₂ ∩ (U₂× Set₁) ⊂ U₁× Set₁

i.e. every set in U₁ is interpreted by U₂ as included in U₁. This implies the preservation of bounded quantifiers (if the sub-formula keeps its values, namely does not involve ℘) and of any operator giving a set determined by the data of its elements, namely the operators given by the set generation principle. Special constructions exist (such as those of internal set theory) of sub-universes not fitting this condition and still preserving bounded quantifiers on formulas with parameters in U₁; but the equivalence holds with the faithfulness of bounded quantifiers on formulas with parameters in U₂ (as we shall see when justifying the set generation principle in 2.B).

The strongest version of standardness also requires the preservation of the powerset symbol ℘ (2.7):

(STP) : ℘₁ = ℘_2|Set₁

This additional condition can also be written without explicit use of ℘ as

(ST") : (ST) ∧ ∀E∈Set₁, ∀F∈Set₂, F ⊂₂ E ⇒ F ∈ Set₁

These conditions are related by (ST") ⇔ ((ST') ∧ (STP)) ⇒ (STF).
Indeed, (STF) comes by the fact finiteness is definable from ℘ (independently of any additional assumption which was needed with the mere (ST)) ; this can also be seen considering that arithmetic is specified as a second-order theory, which can be faithfully interpreted by set theory with ℘.
The independence of the axiom of choice (further commented in 5.5) and other statements involving ℘, can be roughly explained by the possibility to construct universes which do not preserve the essential meaning of ℘. Following convenience, one such construction happens to fit (ST'); others may fit (ST) ∧ (STP) but not (ST') while (ST')-standard copies of their results also exist.

Ideally standard universes

For any 3 universes U₁ ⊂ U₂ ⊂ U₃ where U₂ is standard in U₃,

(U₁ is standard in U₂) ⇔ (U₁ is standard in U₃).

Thus the idea to consider the standardness of U₁ as an absolute property, independent of the external universe in which it is described... provided that this external universe is itself standard. This does not define standardness as an absolute concept (which cannot be defined anyway), but suggests that such a concept may make ideal sense.

A sub-universe U₁ of U₂ will be qualified as small in U₂ if it is known as a set there. This can be written U₁ ∈ Set₂ if U₂ is also itself (ST')-standard, otherwise ∃X∈Set₂, U₁ = ∈⃖₂(X).

To complete the concept of ideally standard universe, let us postulate that every two standard universes are small sub-universes of a third one.

This ideal is quite reasonable for (ST) and (ST'); but for (ST") it is somewhat more daring. As a first reason, it can be philosophically argued that any universe (standard or not) can be taken as a small sub-universe of another in the sense (ST'), but not always in the sense (ST") because seeing it as a set may provide means to define more subsets of given sets. In particular, any (ST")-standard universe being a small (ST") sub-universe of another, implies its fulfillment of the specification schema.

The realistic view of set theory

From now on we shall assume a fixed version of "standardness" meant as either (ST') with a set theory without powerset, or (ST") with a set theory with powerset. In either case, all bounded expressions keep the same values from a standard universe to another, thus called their standard values.

The above concepts provide a realistic meaning for set theory :

Generic theories, seen axiomatically, are meant to be interpreted in a fixed model containing all values of variables and expressions;
But set theory aims to "locally" interpret in the standard way any expression with fixed values of its free variables, independently of the choice of a standard universe, containing these values, where this interpretation may be processed.

In the next section (2.B) the meaningfulness of the (ST") ideal will appear philosophically equivalent to the realism of admitting ℘ in set theory.

Let us formalize some aspects of the plurality of standard universes which set theory aims to describe in its realistic view.

Standard multiverses (expansion of the set theoretical universe)

Let us call quasi-standard multiverse (resp. standard multiverse) any collection (range) of universes (resp. standard universes), where any 2 of them are small sub-universes of a third one. Let us require it to also contain all sub-universes of its members, a quality easy to fulfill, taking a multiverse where it did not hold and adding all these sub-universes to it. We shall say that the set theoretical universe expands when it ranges over a standard multiverse.

For any quasi-standard multiverse M, its union U = ⋃M forms another universe containing all members of M as small sub-universes. Then, for any expression with any values of its free variables in U there exists some U∈M containing these values and thus able to interpret the expression; this interpretation being independent of the chosen U, defines that of U. Therefore,

(M is a standard multiverse) ⇔ (the universe U is standard).

But U cannot belong to M, because M has no greatest element. So, no single standard multiverse can ever contain all possible standard universes.
For a (ST")-quasi-standard multiverse, keeping in U the same powerset of a given set as in member universes, may lead to the failure of the specification schema in U. Here comes another side of the postulate attached to the ideal (ST")-standardness : that with (ST")-standard multiverses, this failure never happens.

Can a set contain itself ?

Let us call a set reflexive if it contains itself.
The proof of Russell's paradox [1.8], shows that the class of non-reflexive sets (Set(x) ∧ x ∉ x) cannot be a set (as it is not included in any set). So, many sets are non-reflexive ; but can reflexive sets exist ? this is undecidable; here is why.

From any universe U with a nonempty class X of reflexive sets, another universe containing none can be made by 2 different methods

Take U\X as class of objects, and turn into urelements all sets E which contained a reflexive set (∃x∈E, x∈x), and all functions whose domain or image was such an E. (These sets cannot be rebuilt from the data of their elements, since each one has an element removed from the universe, namely itself ; any class of urelements can also be eliminated in the same way, turning other objects into urelements).
Progressively rebuild a universe from scratch: this avoids all reflexive sets. Namely, from a given universe, first accept its urelements, then also accept, progressively along an endless "time", every set only made of previously accepted objects, and every function only involving such objects. Each set so accepted at some time, is formed as a collection of previously accepted objects. At the first time a set is so accepted as a collection of previously available objects, it could not be available yet as a possible element, thus it cannot be reflexive.
As the reflexivity of a set is independent of context, a union of universes each devoid of reflexive sets, cannot contain one either.

On the other hand, universes with reflexive sets can be produced as follows:

Riddle. What is the difference between

a universe with an urelement x and a non-reflexive set y such that x ∈ y,
and a universe with a reflexive set x and an urelement y such that y ∉ x ?

Answer (click to show) :

The role of the set containing x but not y, played by y in the former universe, is played by x in the latter.

The axiom of foundation, introduced in 5.3 (well-foundedness), will imply the absence of reflexive sets as a particular case ; its undecidability can be proven in ways essentially similar to the above arguments. But for most purposes, this axiom is as useless as the sets it excludes.

2.B. Interpretation of classes

The relative sense of open quantifiers

As the set theoretical universe expands, the values of decidable statements (under the given axioms) must of course stay constant. However, any undecidable statement, as well as any other formula containing open quantifiers and given with fixed values of its free variables (let us call these "open formulas"), may change value from one universe to another.
But if any value of an open formula can only be "the current one" relative to the choice of this universe, the question whether it is constant or variable, that is how things go «elsewhere», is relative to a choice of multiverse. But the realistic view of set theory means to neither assume a fixed universe nor a fixed multiverse. Thus, the natural way for it is to abstain from giving any value to open formulas (beyond decidable statements with known proof or refutation). Indeed, tries to do better would be rather hopeless due to how messy the situation is. Let us review its diverse aspects.

In a multiverse of «all universes» (standard or not) fitting a given consistent axiomatic set theory, the situation was explained in Part 1 :

The variability of value of a statement there, is equivalent to its undecidability (completeness theorem);
However, this undecidability, when true, cannot anyway be proven by the same axiomatic set theory (incompleteness theorem).

The realistic view refering to standard universes, leads to another messy situation, first by lack of formal definition for the ideal concept of standardness.
Trying to materialize the concept of "real indefiniteness" of statements as their variability in a standard multiverse M, the behavior of this M is determined by its union U, as M is the range of all small sub-universes of U; this allows to express relevant questions as statements in U.

For example, the variability in M of an existential statement (∃x, A(x) where A is a bounded formula) means there exists in M a universe U where it is false (∀x∈U, ¬A(x)), and another universe U' where it is true (∃x∈U', A(x)). But then it is also true in any standard universe containing both U and U', among which some other members of M, and U itself. Such a statement may be called «ultimately true».

But since some statements may vary between acceptable standard universes U, some of these may still vary between the U of acceptable standard multiverses. Indeed, U is just another universe which can be axiomatically described, but only incompletely, for both formal reasons (the incompleteness theorem) and realistic ones (there is no ultimate standard multiverse). Just more statements can be made decidable for U than for U by describing U with a stronger theory T than the theory T given to describe U. Such a stronger T can already be obtained by formalizing this description of U as the union of a quasi-standard multiverse whose members satisfy T (while the standardness of U escapes formalization).
Behind the undecidability of a statement (existential or other), multiple situations may occur. It might (I guess) be always true in one quasi-standard multiverse, and always false in another, while, depending on the statement, either of them (but not both) may be standard. It may vary or not vary between standard universes, but this variability, itself expressible as a statement in the union of a standard multiverse, may be itself undecidable, and eventually vary between standard multiverses.

The undecidability of some statements such as AC, usually reflects the variability, which may also happen among (ST')-standard universes, of the values of the open quantifiers used in these statements when re-written without ℘. In particular, a quantifier ∀x∈℘(E) appears as an open quantifier on the class of subsets of E, while other uses of ℘ either also involve this once translated, or cannot even be translated when the powerset axiom does not hold.

The indefiniteness of classes

Unlike objects which can be compared in formulas by the = symbol, the meta-relation of equality between classes is as indefinite as the open ∀ since both concepts are definable from each other :

The equality of classes A = B would be defined by ∀x, A(x) ⇔ B(x);
The statement (∀x, A(x)) means A = 1 (the universe, class of all objects).

This indefiniteness can be understood by remembering that, depite being usable as ranges, classes are only meta-objects, basically given as predicates, i.e. synctactically (as a formula with parameters). Like with open quantifiers, this leaves us with both concepts of provable equality (or proven equality), and provable inequality, according to the status (provable or refutable) of this statement of "equality" of classes (∀x, A(x) ⇔ B(x)).

Each universe U interprets each class C as a meta-set of objects P = {x∈U | C(x)}, and sees it as a set when P ∈ U. This condition is expressible by set theory in U as a statement S(C), written equivalently as

∃P, ∀x, C(x) ⇔ x ∈ P
∃E, ∀x, C(x) ⇒ x ∈ E

The equivalence is because P can be defined from such an E by P = {x∈E | C(x)}.
Having two open quantifiers, it is "more indefinite" than the equality between classes.

Classes and sets in expanding universes

Aside this formulation S(C) in a fixed universe, let us analyze this distinction of sets among classes (and thus what makes other classes indefinite), from the perspective of a quasi-standard multiverse. There, this concept has 3 possible versions, ordered only approximately by implications which "often work" but have exceptions.

The "weak" version is to claim S(C) true in all members of this multiverse. That means C always coincides with some set P, but this may be a different P from one universe to another.
The "strong" version requires P to stay constant. This can be seen as the meaning behind the formal choice of naming P by a symbol of set theory, as was discussed with the powerset (2.7). But just the constancy of P = {x∈U | C(x)} when U varies, needs not imply the above weak version, as there can exist some universe U (among the smallest of this multiverse) such that P ∉ U. This means P is arising to existence with U, and will anyway exist as a set in all the "future" (larger) universes which see U as a set. So, a class C is not considered a set in this "strong" sense, if it remains able to contain «unknown» or «not yet existing» objects (in some future universe), that would only belong to some larger future value of P, which can thus vary with the growth of U.
The middle version (implied by the strong one) is to interpret S(C) in the union U of the multiverse. Deciphered there, it says that, while U expands, «there exists a time from which P stays constant» (and thus after which it will exist as a set). So, this just differs from the strong version by ignoring any "past" variations (which could occur during some "beginning" of the expansion), to focus on the ultimate behavior (among the latest, i.e. largest universes, somehow "closest" to the U where S(C) is interpreted).

But the ultimate realistic meaning of set theory is to refer to the range of "all standard universes", which differs from the concept of standard multiverse in that this range is not a set but a class. So, the concept of union cannot apply to this range, and yet its impossible result would form the ideal interpretation of set theory. This ideal can be well approximated by interpretations of set theory by either fixed universes or quasi-standard multiverses (whose existence is ensured by the completeness theorem). Both perspectives, in terms of a fixed vs. a variable universe, alternatively transcend each other endlessly along its expansion.

This explains how any intended set theory can be formalized by mere first-order axioms: if any intended property of the universe was only expressible by a second-order statement, or if anyhow its expression involved external objects (regarding this universe as a set), then it could be re-expressed by moving the framework, as the stronger first-order axiom of existence of a sub-universe of this kind, and why not also endlessly many of them, forming a standard multiverse (stating that every object is contained in such a sub-universe).

Justifying the set generation principle

Let B be a quantifier defined by a bounded formula whose parameters take values in a sub-universe U of any other universe we might work in. Let E be the range of all values taken by the argument y of A when interpreting (By, A(y)). It is independent of A and included in U insofar as A is absent from any expression of y in the defining formula. Typical exceptions would be when B involves a formula of the form A(t({x∈F|A(x)})) for some term t, in which case defining E as the set of possible values of y would require to assume ℘ and the (ST")-standardness of U. Without these assumptions, such exceptional formulas need to be excluded from the definitions of B for the justifications to work. Anyway they are excluded by understanding, as explained in 2.A-2.B, the set-builder as the notation for the operator symbols from an infinite list, one for each formula using the language of set theory to which A does not belong. (The set-builder is needed to put a formula inside a term, unless the conditional operator is accepted as primitive, which is acceptable, but then, function definers using A by the conditional operator are excluded for the same reason)

Let C(x) defined as (By, y = x). By the hypothesis of the set generation principle for the B which was written Q*, we have a proof of (B ⇔ ∃_C). This implies ¬(By, 0), and, coming by second-order universal introduction, remains valid in any universe where it is interpreted.

For any x, the value C(x) of B on the predicate (y ↦ (y = x)), can only differ (be true) from its (false) value on (y ↦ 0), if both predicates differ inside E, i.e. if x belongs to E :

C(x) ⇔ ((By, y = x) ⇎ (By, 0)) ⇒ (∃y∈E, y = x ⇎ 0) ⇔ x ∈ E

This inclusion of C in E shows it is essentially a set:

C is a set in the above "strong" sense (C ⊂ U) ;
Moreover E is a set in the sense of first-order logic (1.D): the formula of B only has definite, fixed means (variables bound to given sets, fixed parameters) to provide its elements. ∎

For any class satisfying the condition of the set generation principle (being indirectly as usable as a set in the role of domain of quantifiers), is it also indirectly as usable as a set in the role of domains of functions (before using this principle) ? Namely, is there for each such class a fixed formalization (bounded formulas with limited complexity) playing the roles of the definer and the evaluator for functions having this class as domains ? The answer would be yes, but we shall not develop the justifications here.

Concrete examples

A set: Is there any dodo left on Mauritius ? As this island is well known and regularly visited since their supposed disappearance, no surviving dodos could still have gone unnoticed, wherever they may hide. Having not found any, we can conclude there are none. This question, expressed by a bounded quantifier, has an effective sense and an observable answer.

A set resembling a class: Bertrand Russell raised this argument about theology: «If I were to suggest that between the Earth and Mars there is a china teapot revolving about the sun..., nobody would be able to disprove my assertion [as] the teapot is too small to be revealed even by our most powerful telescopes. But if I were to go on to say that, since my assertion cannot be disproved, it is intolerable presumption on the part of human reason to doubt it, I should rightly be thought to be talking nonsense.» The question is clear, but on a too large space, making the answer practically inaccessible. (An 8m telescope has a resolution power of 0.1 arcsec, that is 200m on the moon surface)

A class: the extended statement, «there is a teapot orbiting some star in the universe» loses all meaning: not only the size of the universe is unknown, but Relativity theory sees the remote events from which we did not receive light yet, as not having really happened for us yet either.

A meta-object: how could God «exist», if He is a meta-object, while «existence» can only qualify objects? Did apologists properly conceive their own thesis on God's «existence» ? But what are the objects of their faith and their worship ? Each monotheism rightly accuses each other of only worshiping objects (sin of idolatry): books, stories, beliefs, teachings, ideas, attitudes, feelings, places, events, miracles, healings, errors, sufferings, diseases, accidents, natural disasters (declared God's Will), hardly more subtle than old statues, not seriously checking (by fear of God) any hints of their supposed divinity.

A universal event: the redemptive sacrifice of the Son of God. Whether it would have been theologically equivalent for it to have taken place not on Earth but in another galaxy or in God's plans for the Earth in year 3,456, remains unclear.

Another set reduced to a class... The class F of girls remains incompletely represented by sets: the set of those present at that place and day, those using this dating site and whose parameters meet such and such criteria, etc. Consider the predicates B of beauty in my taste, and C of suitability of a relationship with me. When I try to explain that «I can hardly find any pretty girl in my taste (and they are often unavailable anyway)», i.e.

(∀_F x, C(x) ⇒ B(x)) ∧ {x∈ F | B(x)}≈Ø,

the common reaction is: «Do you think that beauty is the only thing that matters ?», i.e.

What,(∀x ∈ F, C(x) ⇔ B(x)) ????

then «If you find a pretty girl but stupid or with bad character, what will you do ?». Formally : (∃x ∈ F, B(x) ⇏ C(x) !!!). And to conclude with a claim of pure goodness: «I am sure you will find !», that is (∃ plenty of x ∈ F, C(x)). Not forgetting the necessary condition to achieve this: «You must change your way of thinking».
... by the absence of God...: F would have immediately turned into a set by the existence of anybody on Earth able to receive a message from God, as He would obviously have used this chance to make him email me the address my future wife (or the other way round).
... and of any substitute: a free, open and efficient online dating system, as would be included in my project trust-forum.net, could give the same result. But this requires finding programmers willing to implement it. But the class of programmers is not a set either, especially as the purpose of the project would conflict with the religious moral priority of saving God's job from competitors so as to preserve His salary of praise.

2.C. Concepts of truth in mathematics

Let us recall and expand on the different concepts of «truth» we saw for mathematical statements, from the simplest to the most subtle.

The most basic was the relative truth (further analyzed in 1.B), that is the Boolean value of any statement expressed in any given theory, depending on the system (model of the theory) where it is interpreted; this system is supposedly fixed like an implicit free variable. More generally, any formula takes a Boolean value for fixed values of its free variables in that system.

Then comes provability in any given first-order axiomatic theory ; this coincides with the quality of being relatively true in all its models.

Finally, come the truths in two realistic theories : arithmetic and set theory. Let us review them in more details.

Arithmetic truths

As explained in 1.C, the concept of provability in any explicit first-order theory, and more generally any effective "provability" concept (predicate) over the class of statements of any conceivable theory, must be a FOT-existential predicate, equivalent (through some encoding) with some existential predicate of arithmetic. But we need to refer to the realistic truth of arithmetic to interpret these. This is an ideal but clear concept (independent of any ontological hypothesis about infinities), meaning the class of properties of "the standard model ℕ of arithmetic" (to say in short, as all standard models of arithmetic are identical copies of each other).
Indeed all standard objects of a FOT being finite can be accepted as real (no matter how big), while their distinction with non-standard objects (pseudo-finite ones) is also real despite being not formalizable.
The meaningfulness of statements can be understood first for bounded formulas, then those with 1 open quantifier, such as provability or consistency statements, whose objects are mere finite systems (proofs). And so on for each additional open quantifier over a previously accepted formula, even if their meaningfulness becomes more and more subtle.

Yet the incompleteness theorem showed that this realistic truth predicate of arithmetic is not itself an existential predicate. More precisely, even the negation of the provability predicate of any explicit theory able to express arithmetic, cannot be itself an existential predicate.

Still we need partial solutions: existential classes of statements of arithmetic with both qualities

FOT-sound (included in the class of realistic truths);
Large (compared by inclusion with other such existential classes).

The natural way to progress in the endless and non-algorithmic quest for larger and larger such classes without breaking FOT-soundness, consists in searching for stronger and stronger axiomatic set theories. Let us comment this further.

More concepts of strength

Any axiomatic set theory (or other foundational theory) T is reflected by 2 classes of arithmetical statements :

The class of its arithmetical theorems ;
The statement of its consistency.

Both should not be confused :

1. cannot contain 2. (unless 2. is false), though 2. must be true if 1. is valid. This is because 1. is only made of the theorems of T with a standard proof, while 2. means to forbid the current universe from containing even a non-standard proof (with pseudo-finite length) from T that would lead to contradiction.
So, 2. means the existence, in the current universe, of some model of T but with no requirement of standardness ; while 1. means the current universe agrees with T on arithmetic, and thus can only be philosophically justified by the idea that T is valid in the sense of having a FOT-standard model.

They lead to two other conceptions (definitions) of a "strength" preorder between foundational theories, most often equivalent to our first one of 1.A (possible cases of non-equivalence will not be considered here) :

The inclusion order between their classes of arithmetical theorems ;
The deducibility (provable implication) between consistency statements: T' is stronger than T if the consistency of T is deducible from the consistency of T'; then T' is "strictly stronger" than T if the consistency of T is a theorem of T'.

(1. would be related to the implication order between statements of FOT-soundness, except that the expression of such statements requires a framework strong enough to express arithmetical truth, such as MT).

If T' can prove the existence of a standard model of T then T' is strictly stronger than T in both ways.

Set theory from realism to axiomatization

For an axiomatic set theory to give a class of arithmetical theorems both FOT-sound and large, it needs to be "not wrong and yet very good" :

Sound = keeps accepting some standard universes (for FOT-soundness);
Strong = only accepts "large" universes (truly large in the standard case, otherwise anyway looking large by the description of their hierarchy of sub-universes), to contribute to a large class of theorems.

Among sound, well-described set theories (or other foundational theories) there can be no strongest one : from any of them T we can get stronger ones, at least arithmetically, in the following ways roughly ordered by increasing power (where T₀ is some possibly weaker theory but which fits the below "open" quality):

T + the consistency of T;
T₀ + the existence of a standard universe of T ;
T₀ + the existence of infinitely many standard universes of T forming a standard multiverse (actually only ensured to be quasi-standard).

And these are only the first of an open-ended range of possible methods with growing efficiency in their strengthening effect with respect to the added complexity of description. It actually turns out that the Replacement schema, used in ZF set theory, amounts to the use of a much more powerful strengthening method than these.

Arguments to justify any strong axiomatic set theory like this as a valid foundation of mathematics, must remain somewhat philosophical, and thus assessed in intuitive, not completely formalizable ways, since their point is to do better than any formal proof or other predefined algorithm: no fixed formal method can always witness for any strong theory when it is consistent, even less when it is FOT-sound (by truth undefinability), or sound (the idea that some given axiom which excludes one size of standard universe still accepts some larger ones).

The usefulness of strong axiomatic set theories for proving large classes of arithmetical truths (insofar as philosophical disputes on the ontological status of the ideals motivating these theories do not undermine the role of these ideals as reasons for the confidence in the FOT-soundness), can then be read as an indispensability argument for the reality of the universes so described, beyond the infinity of ℕ.
Indeed, it makes these theories "valuable", while their consistency together with the mere existence of ℕ, ensure the existence of models. Only non-standard models are so ensured to exist, but these work similarly to the standard ones which were ideally meant anyway. Hence our last concept of truth, which is the truth of set theoretical statements.

Axioms compatibility condition

Given several axiomatic set theories with both qualities (sound and strong), consider the theory obtained as their union (the union of their classes of axioms if they have the same symbols, for example if all only use ∈, like ZF). It will be stronger than each, but is it still sound ? Let us qualify a collection of set theories (or their axioms) as compatible if their union (conjunction) remains sound.
So, beyond the soundness and strength of axiomatic set theories, we need an additional quality requirement for their axioms, so that, without limiting the strength of sound set theories which can be so written, all such theories will be compatible. Actually this problem has a natural solution. This needed additional quality is another ideal concept, not completely formalizable, so let us first express it intuitively :

An axiom is open if satisfied by any open universe, i.e. where eternity is a very long time especially towards the end.

This is meant as more specific than the mere "eternity is a very long time" which rephrases the quality "large" for universes, and as deepening our previous concept of being open. Let us explain it in several steps.

First, no axiom should put any limit on the size of the universe, for compatibility with set theories stronger than this size (for example, a sound theory T accepting only one standard universe would be incompatible with the statement of existence of a standard universe of T).

Then, the remaining risk for several statements to be incompatible, is if the values of at least 2 of them endlessly vary (alternating truth and falsity) as the universe expands, in such a way that no standard universe "larger than some size" can fit all (their conjunction would limit the size of the universe). This is resolved by only accepting "open" axioms, which will all agree on "open" standard universes. Let us explain how.

Consider such a variable statement : written in prenex form it must be using both kinds of open quantifiers. Let us analyze the case of statements with only 2 open quantifiers: ∃x, ∀y, A(x,y), among which the statements S(C) that some class is a set. (From this, the case ∀x, ∃y is deduced by negation, while cases with more open quantifiers would be trickier and will not be discussed here). For any standard multiverse where it indeed endlessly varies, such a statement turns out to be false in its union. So, taking such a statement as an axiom would be inappropriate for this multiverse. Yet, what really matters is the behavior of the class of all standard universes (there is no systematic way to determine this, but it is our intended ideal). Now the point of a statement being "open", is that it reflects the truth in the union of a multiverse which behaves like the intended class of all standard universes.

Such "open" universes, with the same properties as the union of a standard multiverse behaving like the class of "all standard universes", can be intuitively described as "much larger than any smaller one" : there must be no limit to how bigger the universe is from any sub-universe (described by a weaker theory). This also avoids the inefficiency that could be involved in the opposite case, of an axiomatic system likely to be significantly more complex but only slightly stronger than another one.

The quest for axiomatic set theories approaching all 3 ideally conceived qualities (strong and open but still sound) aiming to select universes with the corresponding 3 qualities (large and open but still existing among standard ones), is endless. Fortunately, rather simple set theories such as ZF, already satisfy these qualities to a high degree, describing much larger realities than usually needed. This is how a Platonic view of set theory (seeing the universe of all mathematical objects as a fixed, exhaustive reality) can work as a good approximation, though it cannot be an exact, absolute fact.

Alternative logical frameworks

The description we made of the foundations of mathematics (first-order logic and set theory), is essentially just an equivalent clarified expression of (or other way to introduce) the same version of mathematics as widely accepted. Other logical frameworks already mentioned, to be developed later, are still in the "same family" of "classical mathematics". But other, more radically different frameworks (concepts of logic and/or sets), called non-classical logic, might be considered. Examples:

Some logicians developed the «intuitionistic logic», which lets formulas keep a possible indefiniteness as we mentioned for open quantifiers, but treated as a modification of the pure Boolean logic (the rejection of the excluded middle, where ¬(¬A) does not imply A), without any special mention of quantifiers as the source of this uncertainty. Or it might be seen as a formal confusion between truth and provability. In this framework, {0}∪ ]0,1] ⊂ [0,1], without equality. I could not personally find any interest in this formalism but only heard that theoretical computer scientists found it useful.
When studying measure theory (which mathematically defines probabilities in infinite sets), I was inspired to interpret its results as simpler statements on another concept of set, with the following intuitive property. Let x be a variable randomly taken in [0,1], by successively flipping a coin for each of its (infinity of) binary digits. Let E be the domain of x, set of all random numbers in [0,1]. It is nonempty because such random numbers can be produced. Now another similarly random number, with the same range (y ∈ E) but produced independently of x, does not have any chance to be equal to x. Thus, ∀x∈E, ∀y∈E, x ≠ y. This way, x ∈ E is no more always equivalent to ∃y∈E, x = y.

We will ignore such alternatives in the rest of this work.

Set Theory and Foundations of Mathematics 2. Set theory