Replica Method for the Boolean Perceptron with Continuous Weights
A brief derivation of the asymptotic generalization error of a Boolean perceptron with continuous weights, using the replica method. This problem was also the final assignment in the ‘Statistical Mechanics of Neural Networks’ course at SYSU, Fall 2024.
1. Question
Consider the asymptotic form of the generalization error of a Boolean perceptron with continuous weights in the large-$\alpha$ limit, where $\alpha = P/N$ is the ratio of the number of training examples $P$ to the input dimension $N$.
2. Setup
We consider the perceptron model

$$\sigma(\mathbf{x}) = \operatorname{sgn}\!\left(\frac{\mathbf{w}\cdot\mathbf{x}}{\sqrt{N}}\right),$$

where the weight vector $\mathbf{w}\in\mathbb{R}^{N}$ is continuous and normalized on the sphere, $\mathbf{w}\cdot\mathbf{w} = N$.
The training set is

$$\mathcal{D} = \{(\mathbf{x}^{\mu}, y^{\mu})\}_{\mu=1}^{P}, \qquad y^{\mu} = \operatorname{sgn}\!\left(\frac{\mathbf{w}_{0}\cdot\mathbf{x}^{\mu}}{\sqrt{N}}\right),$$

where the teacher weight $\mathbf{w}_{0}$ satisfies the same spherical constraint $\mathbf{w}_{0}\cdot\mathbf{w}_{0} = N$, and the input components are i.i.d. with zero mean and unit variance.
The loss function is the number of training errors,

$$E(\mathbf{w}) = \sum_{\mu=1}^{P}\theta\!\left(-\,y^{\mu}\,\frac{\mathbf{w}\cdot\mathbf{x}^{\mu}}{\sqrt{N}}\right),$$

where $\theta(\cdot)$ is the Heaviside step function.
The generalization error is the probability that student and teacher disagree on a fresh random input,

$$\epsilon_{g} = \Pr\big[\sigma(\mathbf{x}) \neq \sigma_{0}(\mathbf{x})\big] = \frac{1}{\pi}\arccos R, \qquad R \equiv \frac{\mathbf{w}\cdot\mathbf{w}_{0}}{N}.$$
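As a quick consistency check (not part of the derivation), the geometric formula $\epsilon_{g} = \frac{1}{\pi}\arccos R$ can be verified by Monte Carlo for a synthetic teacher–student pair constructed with a prescribed overlap:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100            # input dimension (kept small for speed)
R_target = 0.6     # desired teacher-student overlap

# Teacher w0 and student w with overlap R, both on the sphere |w|^2 = N.
w0 = rng.standard_normal(N)
w0 *= np.sqrt(N) / np.linalg.norm(w0)
u = rng.standard_normal(N)
u -= (u @ w0) / (w0 @ w0) * w0          # project out the teacher direction
u *= np.sqrt(N) / np.linalg.norm(u)
w = R_target * w0 + np.sqrt(1 - R_target**2) * u

# Monte Carlo estimate of the disagreement probability on fresh inputs.
X = rng.standard_normal((50_000, N))
eps_mc = np.mean(np.sign(X @ w) != np.sign(X @ w0))
eps_th = np.arccos(R_target) / np.pi
print(eps_mc, eps_th)   # should agree to a few times 1e-3
```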
3. Solution
Define $\alpha = P/N$ and let $R(\alpha)$ denote the typical teacher–student overlap at the replica-symmetric saddle point,

$$\epsilon_{g} = \frac{1}{\pi}\arccos R(\alpha),$$

where $R(\alpha)$ solves the saddle-point equation derived below. As $\alpha \to \infty$, the overlap approaches one and the generalization error decays as

$$\epsilon_{g} \simeq \frac{0.625}{\alpha}.$$
4. Derivation
4.1. Statistical Mechanics Formulation
The partition function is

$$Z = \int d\mu(\mathbf{w})\, e^{-\beta E(\mathbf{w})},$$

where $d\mu(\mathbf{w})$ is the uniform measure on the sphere $\mathbf{w}\cdot\mathbf{w} = N$. Noting that in the zero-temperature limit $\beta\to\infty$ the Boltzmann factor reduces to an indicator of zero training error,

$$e^{-\beta E(\mathbf{w})} \longrightarrow \prod_{\mu=1}^{P}\theta\!\left(y^{\mu}\,\frac{\mathbf{w}\cdot\mathbf{x}^{\mu}}{\sqrt{N}}\right),$$

which corresponds to the noiseless-output setting: $Z$ becomes the volume of the version space.
Define the free energy through the quenched average over the training data, $-\beta f = \lim_{N\to\infty}\frac{1}{N}\langle\ln Z\rangle_{\mathcal{D}}$. Using the replica identity

$$\langle \ln Z\rangle = \lim_{n\to 0}\frac{\langle Z^{n}\rangle - 1}{n},$$

the free-energy density becomes

$$-\beta f = \lim_{N\to\infty}\lim_{n\to 0}\frac{\langle Z^{n}\rangle - 1}{nN}.$$
Using $n$ replicas $\mathbf{w}^{1},\dots,\mathbf{w}^{n}$ of the system,

$$\langle Z^{n}\rangle = \left\langle \int \prod_{a=1}^{n} d\mu(\mathbf{w}^{a}) \prod_{\mu=1}^{P}\prod_{a=1}^{n}\theta\!\left(y^{\mu}\,\frac{\mathbf{w}^{a}\cdot\mathbf{x}^{\mu}}{\sqrt{N}}\right)\right\rangle_{\mathcal{D}}.$$

Then one can rewrite the free-energy density as

$$-\beta f = \lim_{N\to\infty}\lim_{n\to 0}\frac{\ln\langle Z^{n}\rangle}{nN},$$

since $\langle Z^{n}\rangle \to 1$ as $n\to 0$. The fields

$$u_{a}^{\mu} = \frac{\mathbf{w}^{a}\cdot\mathbf{x}^{\mu}}{\sqrt{N}}, \qquad v^{\mu} = \frac{\mathbf{w}_{0}\cdot\mathbf{x}^{\mu}}{\sqrt{N}}$$

become, by the central limit theorem, zero-mean joint Gaussians with covariances

$$\langle u_{a}u_{b}\rangle = \frac{\mathbf{w}^{a}\cdot\mathbf{w}^{b}}{N}, \qquad \langle u_{a}v\rangle = \frac{\mathbf{w}^{a}\cdot\mathbf{w}_{0}}{N}, \qquad \langle v^{2}\rangle = 1.$$
Introducing order parameters

$$q_{ab} = \frac{\mathbf{w}^{a}\cdot\mathbf{w}^{b}}{N}, \qquad R_{a} = \frac{\mathbf{w}^{a}\cdot\mathbf{w}_{0}}{N}$$

(together with conjugate variables enforcing these definitions) into the free energy yields

$$\langle Z^{n}\rangle = \int \prod_{a<b} dq_{ab}\, \prod_{a} dR_{a}\; e^{N\left[G_{S}(\{q_{ab}\},\{R_{a}\}) + \alpha\, G_{E}(\{q_{ab}\},\{R_{a}\})\right]},$$

where $G_{S}$ is the entropic contribution (the volume of weight configurations with the prescribed overlaps) and $G_{E}$ is the energetic contribution (the Gaussian average of a single pattern constraint).
In the large-$N$ limit this integral is dominated by its saddle point, so the order parameters are determined by extremizing the exponent.
4.2. Replica Symmetric Ansatz
Assume replica symmetry:

$$q_{ab} = q \quad (a \neq b), \qquad R_{a} = R \quad \text{for all } a.$$
Then the energetic contribution per pattern becomes

$$e^{G_{E}} = 2\,\Big\langle \theta(v)\prod_{a=1}^{n}\theta(u_{a})\Big\rangle,$$

where the average is over the correlated Gaussian fields above (the factor $2$ follows from the $v\to -v$, $u_{a}\to -u_{a}$ symmetry, which removes the teacher label).
Using the Hubbard–Stratonovich transformation

$$e^{\lambda^{2}/2} = \int Dt\, e^{\lambda t},$$

one can decouple the replicas by representing the correlated fields as

$$u_{a} = \sqrt{1-q}\,\tau_{a} + \sqrt{q-R^{2}}\,z + R\,v,$$

with $\tau_{1},\dots,\tau_{n}, z, v$ independent standard Gaussians, which reproduces the covariances $\langle u_{a}u_{b}\rangle = q$, $\langle u_{a}^{2}\rangle = 1$, $\langle u_{a}v\rangle = R$,
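The Gaussian linearization identity underlying this step, $\int Dt\, e^{\lambda t} = e^{\lambda^{2}/2}$ with $Dt = \frac{dt}{\sqrt{2\pi}}e^{-t^{2}/2}$, can be checked numerically (a standalone sanity check, not part of the derivation):

```python
import numpy as np
from scipy.integrate import quad

def gaussian_moment(lam):
    """Compute \\int Dt e^{lam * t} with Dt the standard Gaussian measure."""
    f = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi) * np.exp(lam * t)
    val, _ = quad(f, -40, 40)
    return val

for lam in (0.0, 0.5, 2.0):
    print(lam, gaussian_moment(lam), np.exp(lam**2 / 2))  # the two columns agree
```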
and further carrying out the Gaussian integral over each $\tau_{a}$ yields

$$e^{G_{E}} = 2\int Dv\,\theta(v)\int Dz\,\left[H\!\left(-\frac{\sqrt{q-R^{2}}\,z + R v}{\sqrt{1-q}}\right)\right]^{n}.$$
Using the combined field $t = (\sqrt{q-R^{2}}\,z + Rv)/\sqrt{q}$, which is a standard Gaussian with $\langle vt\rangle = R/\sqrt{q}$, and integrating out $v$ at fixed $t$, the energetic term becomes

$$e^{G_{E}} = 2\int Dt\, H\!\left(-\frac{Rt}{\sqrt{q-R^{2}}}\right)\left[H\!\left(-\frac{\sqrt{q}\,t}{\sqrt{1-q}}\right)\right]^{n},$$
where the following definitions were used:

$$H(x) \equiv \int_{x}^{\infty} Dt = \frac{1}{2}\operatorname{erfc}\!\left(\frac{x}{\sqrt{2}}\right), \qquad Dt \equiv \frac{dt}{\sqrt{2\pi}}\, e^{-t^{2}/2}.$$
Therefore all three terms under the replica-symmetric ansatz are

$$\frac{G_{S}}{n} \to \frac{1}{2}\ln(1-q) + \frac{q-R^{2}}{2(1-q)}, \qquad \frac{G_{E}}{n} \to 2\int Dt\, H\!\left(-\frac{Rt}{\sqrt{q-R^{2}}}\right)\ln H\!\left(-\frac{\sqrt{q}\,t}{\sqrt{1-q}}\right)$$

(using $[H]^{n} \simeq 1 + n\ln H$ as $n \to 0$), and the disorder-averaged free-energy density becomes

$$-\beta f = \underset{q,R}{\operatorname{extr}}\left\{\frac{1}{2}\ln(1-q) + \frac{q-R^{2}}{2(1-q)} + 2\alpha\int Dt\, H\!\left(-\frac{Rt}{\sqrt{q-R^{2}}}\right)\ln H\!\left(-\frac{\sqrt{q}\,t}{\sqrt{1-q}}\right)\right\}.$$
4.3. Saddle-point Equations
Letting $q = R$, which follows from the symmetry between the teacher and a typical student drawn from the version space (Gibbs learning), and substituting into the free-energy density yields the one-parameter entropy

$$s(R) = \frac{1}{2}\ln(1-R) + \frac{R}{2} + 2\alpha\int Dt\, H(\lambda t)\ln H(\lambda t), \qquad \lambda \equiv \sqrt{\frac{R}{1-R}}.$$
Setting $\partial s/\partial R = 0$, and simplifying the resulting Gaussian integrals (the term proportional to $\int Dt\, t\, e^{-\lambda^{2}t^{2}/2}$ vanishes by parity, and the remaining term can be integrated by parts), the solution satisfies

$$\frac{R}{\sqrt{1-R}} = \frac{\alpha}{\pi}\int Dt\, \frac{e^{-Rt^{2}/2}}{H(\sqrt{R}\,t)}.$$
Rearranging for $R$, one can rewrite the saddle-point condition as a fixed-point equation

$$R = 1 - \left[\frac{\alpha}{\pi R}\int Dt\, \frac{e^{-Rt^{2}/2}}{H(\sqrt{R}\,t)}\right]^{-2}.$$
Iterating this equation to convergence gives the value of $R(\alpha)$, and hence the generalization error $\epsilon_{g}(\alpha) = \frac{1}{\pi}\arccos R(\alpha)$ at any $\alpha$.
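A minimal numerical sketch of this iteration, assuming the saddle-point equation in the fixed-point form $R = 1 - [\alpha I(R)/(\pi R)]^{-2}$ with $I(R) = \int Dt\, e^{-Rt^{2}/2}/H(\sqrt{R}\,t)$ (`scipy` is used for the Gaussian integral):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import erfc

def H(x):
    """Gaussian tail function H(x) = \\int_x^inf Dt."""
    return 0.5 * erfc(x / np.sqrt(2))

def rhs_integral(R):
    """I(R) = \\int Dt e^{-R t^2/2} / H(sqrt(R) t)."""
    f = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi) \
                  * np.exp(-R * t**2 / 2) / H(np.sqrt(R) * t)
    val, _ = quad(f, -8, 8)
    return val

def solve_R(alpha, R0=0.5, tol=1e-10, max_iter=10_000):
    """Iterate R = 1 - [alpha I(R) / (pi R)]^{-2} to convergence."""
    R = R0
    for _ in range(max_iter):
        R_new = 1 - (np.pi * R / (alpha * rhs_integral(R)))**2
        if abs(R_new - R) < tol:
            return R_new
        R = 0.5 * (R + R_new)   # damping stabilizes the oscillatory iteration
    return R

for alpha in (2.0, 5.0, 10.0, 20.0):
    R = solve_R(alpha)
    eps = np.arccos(R) / np.pi
    print(f"alpha={alpha:5.1f}  R={R:.4f}  eps={eps:.4f}  0.625/alpha={0.625/alpha:.4f}")
```

At large $\alpha$ the last two columns should approach each other, illustrating the asymptotic result derived in the next subsection.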
4.4. Generalization Error
Using the re-parameterization $R = \cos(\pi\epsilon_{g})$ and the identity $1 - \cos\theta = 2\sin^{2}(\theta/2)$, one can rewrite the generalization error as

$$\epsilon_{g} = \frac{1}{\pi}\arccos R \simeq \frac{\sqrt{2(1-R)}}{\pi} \qquad (R \to 1).$$
Considering when $\alpha \to \infty$, so that $R \to 1$, the saddle-point equation reduces to

$$\frac{1}{\sqrt{1-R}} \simeq \frac{\alpha}{\pi}\, C, \qquad C \equiv \int Dt\, \frac{e^{-t^{2}/2}}{H(t)} \approx 2.26,$$

i.e. $\sqrt{1-R} \simeq \pi/(\alpha C)$. Using this in $\epsilon_{g} \simeq \sqrt{2(1-R)}/\pi$ finally gives

$$\epsilon_{g} \simeq \frac{\sqrt{2}}{C}\,\frac{1}{\alpha} \approx \frac{0.625}{\alpha}.$$
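The numerical prefactor can be checked directly (a sketch; `C` here denotes the limiting integral $\int Dt\, e^{-t^{2}/2}/H(t)$ from the asymptotic analysis):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import erfc

H = lambda x: 0.5 * erfc(x / np.sqrt(2))   # Gaussian tail function

# C = \int Dt e^{-t^2/2} / H(t); the asymptotic prefactor is sqrt(2)/C.
f = lambda t: np.exp(-t**2) / (np.sqrt(2 * np.pi) * H(t))
C, _ = quad(f, -10, 10)
print(np.sqrt(2) / C)   # ≈ 0.625
```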
Footnotes
1. This problem was first solved under the replica-symmetric ansatz by G. Györgyi and N. Tishby (1990). It was also discussed in the perceptron work of H. S. Seung, H. Sompolinsky, and N. Tishby (1992), and in Chapter 8 of H. Nishimori’s book (2001), which is a highly valuable reference. ↩