Overfitting in Synthesis



Saswat Padhi$ \,^1 $

Todd Millstein$ ^1 $    Aditya Nori$ \,^{2\,\texttt{UK}} $     Rahul Sharma$ ^{2\,\texttt{IN}} $

$ ^1 $ University of California, Los Angeles, USA

$ ^2 $ Microsoft Research   $ ^\texttt{UK}\, $Cambridge, UK   $ ^\texttt{IN}\, $Bengaluru, India

Synthesis

[Figure: the CEGIS loop. The learner proposes a likely implementation from the grammar; the verifier either certifies it against the specification (yielding a verified implementation) or returns a counterexample. SyGuS framework [Alur et al, FMCAD'13]; CEGIS framework [Solar-Lezama, STTT'13]]
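A minimal sketch of this CEGIS interaction (assuming hypothetical `learner` and `verifier` callables, not the interface of any particular SyGuS tool):

```python
def cegis(learner, verifier, max_rounds=1000):
    """A minimal CEGIS loop: `learner(examples)` proposes a candidate
    consistent with the examples seen so far, and `verifier(candidate)`
    returns None on success or a counterexample (input, output) pair."""
    examples = []
    for _ in range(max_rounds):
        candidate = learner(examples)      # likely implementation
        cex = verifier(candidate)
        if cex is None:
            return candidate               # verified implementation
        examples.append(cex)               # refine; one more round
    raise RuntimeError("no verified implementation within the budget")
```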

How does the choice of the grammar
affect the performance of synthesis tools?

Theoretical results & experiments
that demonstrate overfitting in CEGIS

Practical mitigation strategies
inspired by ML techniques

State of the Art

  • Grammars : $ 6 $ commonly used arithmetic grammars
  • Benchmarks : $ 180 $ invariant-synthesis tasks over integers
  • Tools : $ 5 $ participants from SyGuS-Comp'18
    LoopInvGen [PLDI'16]   SketchAC [Jeon et al, CAV'15]   EUSolver [Alur et al, TACAS'17]   CVC4 [Reynolds et al, CAV'15]   Stoch [Alur et al, FMCAD'13]

Equalities $ \boldsymbol{(x = y)} $ $ \subset $ Inequalities $ \boldsymbol{(x \leq y)} $ $ \subset $ Octagons $ \boldsymbol{(x > y + 2)} $ $ \subset $ Polyhedra $ \boldsymbol{(x \geq 3 * y - 5)} $ $ \subset $ Polynomials $ \boldsymbol{(x \leq 7 * y * z + 11)} $ $ \subset $ Peano $ \boldsymbol{(x = y \,\%\, z - 13)} $

(Timeout = $ 30 $ mins per benchmark per tool per grammar)

With more expressive power, every tool fails on many benchmarks it could previously solve!

But is the performance degradation
simply due to larger search spaces? …

Overfitting

[Figure: the CEGIS loop, as above.]   Rounds: the number of times the learner queries the verifier

ML notion:   A function does not correctly generalize beyond the training data

On increasing expressiveness:                                       Increase      No Change
Increase in # Rounds $ \ \Rightarrow\ $ ________ in Synth. Time     $ 79 \,\% $   $ 6 \,\% $
Increase in Synth. Time $ \ \Rightarrow\ $ ________ in # Rounds     $ 27 \,\% $   $ 67 \,\% $

(For LoopInvGen on all $ 180 $ benchmarks and all $ 6 $ grammars with $ 30 $ mins timeout per benchmark per grammar)

 Synthesizers not only spend more time

  • searching for a function within a larger space,
  • but also collecting more examples from the verifier

Contributions

 Theoretical Insights

  • Formal notions of learnability and overfitting for SyGuS
  • No free lunch — overfitting is inevitable at high expressiveness

 Practical Solutions

  • PLearn, a black-box technique inspired by ensemble learning — explores multiple grammars in parallel
  • Hybrid enumeration (HE) — emulates PLearn by interleaving exploration of multiple grammars in a single thread
  • When combined with HE, the winning solver from the Inv track of SyGuS-Comp'18 is $ 5\times $ faster and solves $2$ more benchmarks

$ \boldsymbol{m} $-Learnability

 (Learning from $ m $ observations / examples)

Machine Learning

  • Learned functions only need to be approximately correct
  • Typically require learning from any set of $ m $ i.i.d. samples

CEGIS-Based SyGuS

  • Learned functions must match the specification exactly
  • Learning from any set of $ m $ samples is too strong a requirement for the CEGIS setting

A specification $ \phi $ is $ m $-learnable by a learner $ \mathcal{L} $
if there exists a set of $ m $ examples for $ \phi $ with which
$ \mathcal{L} $ can learn a correct function for $ \phi $.

(a significantly weaker notion of learnability)
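As a toy illustration (ours, not from the poster), take $ X = Y = \{0, 1\} $ and:

$$ \phi(f) \;\equiv\; \forall x.\ f(x) = x \qquad\quad \mathcal{E} \;=\; \{\, \lambda x.\, x,\;\; \lambda x.\, 0 \,\} $$

The single example $ f(1) = 1 $ rules out $ \lambda x.\, 0 $, so $ \phi $ is $ 1 $-learnable by any learner that returns an expression from $ \mathcal{E} $ consistent with its examples.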

No Free Lunch

Explicit tradeoff between grammar expressiveness and the minimum number of rounds required

Let $ X $ and $ Y $ be arbitrary domains, $ m \in \mathbb{Z} $ be s.t. $ 0 \leq m < |X| $, and $ \mathcal{E} $ be an arbitrary grammar. Then, either:

  • $ \mathcal{E} $ contains at most $\texttt{bound}(m)$ distinct $X \to Y$ functions,
  • or, for every learner $ \mathcal{L} $, there exists a specification $ \phi $ that admits a solution in $ \mathcal{E} $, but is not $ m $-learnable by $ \mathcal{L} $.

More details in our paper — finite and infinite $ X $ and $ Y $, the precise $ \texttt{bound} $, etc.

No Free Lunch: Examples

Two extreme cases:

 ► A singleton grammar $ \mathcal{E} $:

  • Any specification that admits a solution in $ \mathcal{E} $
    is $ 0 $-learnable by any learner
  • Only one $ X \to Y $ function is expressible in $ \mathcal{E} $

 ► A fully expressive grammar $ \mathcal{E} $:

  • Every $ X \to Y $ function is expressible in $ \mathcal{E} $
  • For every learner, there exists a specification
    that is not $ m $-learnable for any $ m < \lvert X \rvert $

Overfitting

 (Why some specifications require more examples to be learnable)

ML notion: When a learned function does not correctly generalize beyond the training data

SyGuS notion: When a learned function is consistent with the observed examples, but does not satisfy the given specification

Potential for Overfitting = Number of such functions in the grammar

The potential for overfitting increases
with grammar expressiveness

More details in our paper — precise bounds on number of examples and expressiveness
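As a toy model (ours; a pointwise specification over a finite domain stands in for the paper's general definition), this count can be computed by brute force:

```python
def potential_for_overfitting(grammar, examples, domain, target):
    """Count functions in `grammar` that are consistent with the observed
    `examples` but do not satisfy the (pointwise) specification."""
    return sum(
        1
        for f in grammar
        if all(f(x) == y for x, y in examples)       # fits the examples
        and any(f(x) != target(x) for x in domain)   # yet violates the spec
    )

# Toy example: the target is the identity on {0, 1, 2}, but the learner
# has only observed f(0) = 0.
grammar = [lambda x: x, lambda x: 0, lambda x: x * x, lambda x: 2 * x]
print(potential_for_overfitting(grammar, [(0, 0)], [0, 1, 2], lambda x: x))
# -> 3: (λx. 0), (λx. x*x), and (λx. 2*x) fit the example but are wrong
```

Growing the grammar can only add to this count, which is the sense in which the potential for overfitting increases with expressiveness.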

Contributions

 Theoretical Insights

  • Formal notions of learnability and overfitting for SyGuS
  • No free lunch — overfitting is inevitable at high expressiveness

 Practical Solutions

  • PLearn, a black-box technique inspired by ensemble learning — explores multiple grammars in parallel
  • Hybrid enumeration (HE) — emulates PLearn by interleaving exploration of multiple grammars in a single thread
  • When combined with HE, the winning solver from the Inv track of SyGuS-Comp'18 is $ 5\times $ faster and solves $2$ more benchmarks

PLearn

A technique inspired by ensemble methods [Dietterich, MCS'00] — run several learners and aggregate their results

Given a SyGuS problem $ \langle \phi, \mathcal{E} \rangle $ and grammars $ \mathcal{E}_1, \ldots, \mathcal{E}_n $ s.t. $ \mathcal{E}_i \subseteq \mathcal{E} $, create problems $ \langle \phi, \mathcal{E}_i \rangle $ and solve each in parallel.

  • A thin wrapper that generates subproblems
  • Agnostic to the underlying SyGuS learner

PLearn Reduces Overfitting:   Every subproblem has a lower potential for overfitting than the original problem.
(on any set of examples for the specification)
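A minimal sketch of this wrapper, assuming a hypothetical `solve(spec, grammar)` callable that returns a verified implementation or `None` (standing in for any SyGuS solver, not the actual interface of the tools above):

```python
from concurrent.futures import FIRST_COMPLETED, ProcessPoolExecutor, wait

def plearn(spec, grammars, solve):
    """Launch one SyGuS subproblem per grammar, in parallel, and return
    the first verified implementation; agnostic to the underlying solver."""
    with ProcessPoolExecutor(max_workers=len(grammars)) as pool:
        pending = {pool.submit(solve, spec, g) for g in grammars}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                result = future.result()
                if result is not None:   # some subproblem succeeded
                    return result
    return None                          # every subproblem failed
```

A real implementation would also terminate the remaining subproblems once one succeeds; the sketch simply returns the first non-`None` result.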

PLearn: Evaluation

[Plots: failure counts across the six grammars, one panel per tool — LoopInvGen, CVC4, Stoch, SketchAC, EUSolver]

(✖) Original failure count
   (U-shaped trend)

(●) Failure count with PLearn
   (decreasing trend)

(Timeout = $ 30 $ mins per benchmark per tool per grammar)

$ \mathcal{E}_1 $ = Equalities $ \mathcal{E}_2 $ = Inequalities $ \mathcal{E}_3 $ = Octagons
$ \mathcal{E}_4 $ = Polyhedra $ \mathcal{E}_5 $ = Polynomials $ \mathcal{E}_6 $ = Peano

Limitations of PLearn

  • Extremely resource intensive
    runs multiple SyGuS instances in parallel

  • Many redundant computations
    functions common to different grammars are enumerated multiple times

Can we:

  • reuse expressions across different grammars
  • enumerate each expression at most once

in a single-threaded algorithm?

[Figure: a 2-D grid with grammars $ \mathcal{E}_1 \subseteq \mathcal{E}_2 \subseteq \mathcal{E}_3 \subseteq \mathcal{E}_4 \subseteq \mathcal{E}_5 \subseteq \mathcal{E}_6 $ on one axis and expression sizes $ 1, 2, 3, 4 $ on the other]

  • An ordering of all the points on this grid
  • Subexpressions should be enumerated before an expression

Expressions of size $ 4 $ that appear only in $ \mathcal{E}_6 $, but not in $ \mathcal{E}_1, \ldots, \mathcal{E}_5$

We call a relation $ \lhd $ on $ \{\mathcal{E}_1, \ldots, \mathcal{E}_n\} \times \mathbb{N} $ a well order
if $ \forall\, \mathcal{E}_p, \mathcal{E}_q, k_p, k_q : [\mathcal{E}_p \subseteq \mathcal{E}_q \wedge k_p < k_q] \Rightarrow (\mathcal{E}_p, k_p) \lhd (\mathcal{E}_q, k_q) $.

Hybrid Enumeration (HE)

An efficient implementation of this 2-D search for component-based grammars [Jha et al, ICSE'10]

Arguments to HE:

  • A SyGuS Problem: a specification $ \phi $, a grammar $ \mathcal{E} $
  • Component-based grammars: $ \mathcal{E}_1 \subset \cdots \subset \mathcal{E}_n \subseteq \mathcal{E} $
  • A well-ordering relation: $ \lhd $
  • A size bound: $ q $

Completeness:   HE can enumerate every expression in $ \mathcal{E}_n $ up to any size bound $ q $.

Efficiency:   HE enumerates each syntactically
distinct expression in $ \mathcal{E}_n $ at most once.
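The sketch below (ours; `well_ordered_cells` is an illustrative name, and the real algorithm additionally reuses cached subexpressions of component-based grammars) shows one traversal of the (grammar, size) grid that satisfies the well-order condition:

```python
def well_ordered_cells(n_grammars, max_size):
    """Visit the (grammar, size) grid so that (E_p, k_p) comes before
    (E_q, k_q) whenever E_p is a subgrammar of E_q and k_p < k_q --
    the well-order condition defined above."""
    for size in range(1, max_size + 1):      # smaller sizes first
        for g in range(1, n_grammars + 1):   # less expressive grammars first
            yield (g, size)

# Each cell (g, size) is charged with the expressions of that size whose
# components first appear in grammar E_g, so every syntactically distinct
# expression is enumerated at most once across the whole search.
print(list(well_ordered_cells(3, 2)))
# -> [(1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (3, 2)]
```

Size-major order trivially satisfies the condition, since every cell with a strictly smaller size is visited first.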

HE: Performance

[Plots: per-grammar results (Inequalities, Octagons, Polyhedra, Polynomials, Peano); series: [L] LoopInvGen, [H] hybrid enumeration, [P] PLearn]
Grammar $ {\small \textbf{median}}\Big[\frac{\tau(\textbf{P})}{\tau(\textbf{H})}\Big] $ $ {\small \textbf{median}}\Big[\frac{\tau(\textbf{H})}{\tau(\textbf{L})}\Big] $
Equalities $ 1.00 $ $ 1.00 $
Inequalities $ 1.91 $ $ 1.04 $
Octagons $ 2.84 $ $ 1.03 $
Polyhedra $ 3.72 $ $ 1.01 $
Polynomials $ 4.62 $ $ 1.00 $
Peano $ 5.49 $ $ 0.97 $

(On all $ 180 $ benchmarks with $ 30 $ mins timeout per benchmark per tool per grammar)

  • Significantly more resilient against increasing expressiveness
  • Negligible impact on total time $ \tau = $ wall-clock time $ \times $ # threads
  • When combined with HE, the winning solver from the Inv track of SyGuS-Comp'18 is $ 5\times $ faster and solves $2$ more benchmarks

Related Work

► Bias-variance tradeoffs in program analysis [Sharma et al, POPL'14]

  • Observe overfitting empirically
    • We formally define it
  • Propose cross validation for hyperparameter tuning
    • We incrementally increase expressiveness

► A theory of formal synthesis [Jha and Seshia, Acta Informatica'17]

  • Synthesis with different kinds of counterexamples
  • And under various resource (memory) constraints
    • But do not consider variations in grammar

Conclusion

  • No free lunch in SyGuS — a fundamental tradeoff between grammar expressiveness and the number of counterexamples required for learning
  • This is due to overfitting — the potential for overfitting increases with expressiveness
  • PLearn is a black-box technique to combat overfitting by exploring multiple grammars in parallel
  • Hybrid enumeration (HE) emulates the behavior of PLearn, but with negligible performance overhead