Saswat Padhi$ \,^1 $
Todd Millstein$ ^1 $ Aditya Nori$ \,^{2\,\texttt{GB}} $ Rahul Sharma$ ^{2\,\texttt{IN}} $
$ ^1 $ University of California, Los Angeles, USA
$ ^2 $ Microsoft Research $ ^\texttt{GB}\, $Cambridge, UK $ ^\texttt{IN}\, $Bengaluru, India
How does the choice of the grammar
affect the performance of synthesis tools?
Theoretical results & experiments
that demonstrate overfitting in CEGIS
Practical mitigation strategies
inspired by ML techniques
(Timeout = $ 30 $ mins per benchmark per tool per grammar)
With more expressive power, every tool fails on many benchmarks it could previously solve!
But is the performance degradation
simply due to larger search spaces? …
ML notion: A function does not correctly generalize beyond the training data
On increasing expressiveness: | Increase | No Change |
---|---|---|
Increase in # Rounds $ \ \Rightarrow\ $ ________ in Synth. Time | $ 79 \,\% $ | $ 6 \,\% $ |
Increase in Synth. Time $ \ \Rightarrow\ $ ________ in # Rounds | $ 27 \,\% $ | $ 67 \,\% $ |
(For LoopInvGen on all $ 180 $ benchmarks and all $ 6 $ grammars with $ 30 $ mins timeout per benchmark per grammar)
Synthesizers not only spend more time
Theoretical Insights
Practical Solutions
(Learning from $ m $ observations / examples)
A specification $ \phi $ is $ m $-learnable by a learner $ \mathcal{L} $
if there exist a set of $ m $ examples for $ \phi $ with which
$ \mathcal{L} $ can learn a correct function for $ \phi $.
(a significantly weaker notion of learnability)
Explicit tradeoff between grammar expressiveness and the minimum number of rounds required
Let $ X $ and $ Y $ be arbitrary domains, $ m \in \mathbb{Z} $ be s.t. $ 0 \leq m < |X| $, and $ \mathcal{E} $ be an arbitrary grammar. Then, either:
More details in our paper — finite and infinite $ X $ and $ Y $, the precise $ \texttt{bound} $, etc.
Two extreme cases:
► A singleton grammar $ \mathcal{E} $:
► A fully expressive grammar $ \mathcal{E} $:
(Why some specifications require more examples to be learnable)
ML notion: When a learned function does not correctly generalize beyond the training data
SyGuS notion: When a learned function is consistent with the observed examples, but does not satisfy the given specification
Potential for Overfitting = Number of such functions in the grammar
The potential for overfitting increases
with grammar expressiveness
More details in our paper — precise bounds on number of examples and expressiveness
Theoretical Insights
Practical Solutions
A technique inspired by ensemble methods [Dietterich, MCS’00] — run several learners and aggregate their results
Given a SyGuS problem $ \langle \phi, \mathcal{E} \rangle $ and grammars $ \mathcal{E}_1, \ldots, \mathcal{E}_n $ s.t. $ \mathcal{E}_i \subseteq \mathcal{E} $, create problems $ \langle \phi, \mathcal{E}_i \rangle $ and solve each in parallel.
PLearn Reduces Overfitting:
Every subproblem has a lower potential for overfitting than the original problem.
(on any set of examples for the specification)
(✖) Original failure count
(U-shaped trend)
(●) Failure count with PLearn
(decreasing trend)
(Timeout = $ 30 $ mins per benchmark per tool per grammar)
$ \mathcal{E}_1 $ = Equalities | $ \mathcal{E}_2 $ = Inequalities | $ \mathcal{E}_3 $ = Octagons |
$ \mathcal{E}_4 $ = Polyhedra | $ \mathcal{E}_5 $ = Polynomials | $ \mathcal{E}_6 $ = Peano |
Extremely resource intensive —
runs multiple SyGuS instances in parallel
Many redundant computations —
functions common to different grammars are enumerated multiple times
Can we:
in a single-threaded algorithm?
Expressions of size $ 4 $ that appear only in $ \mathcal{E}_6 $, but not in $ \mathcal{E}_1, \ldots, \mathcal{E}_5$
We call a relation $ \lhd $ on $ \mathcal{E}_{1,\ldots,n} \times \mathbb{N} $ a well order
if
$ \forall \mathcal{E}_1, \mathcal{E}_2, k_1, k_2 : [\mathcal{E}_1 \subseteq \mathcal{E}_2 \ \wedge \ k_1 < k_2] \Rightarrow (\mathcal{E}_1,k_1) \lhd (\mathcal{E}_2,k_2) $.
An efficient implementation of this 2-D search for component-based grammars [Jha et al, ICSE'10]
Arguments to HE:
Completeness: HE can enumerate every expression in $ \mathcal{E}_n $ up to any size bound $ q $.
Efficiency:
HE enumerates each syntactically
distinct expression in $ \mathcal{E}_n $ at most once.
Grammar | $ {\small \textbf{median}}\Big[\frac{\tau(\textbf{P})}{\tau(\textbf{H})}\Big] $ | $ {\small \textbf{median}}\Big[\frac{\tau(\textbf{H})}{\tau(\textbf{L})}\Big] $ |
---|---|---|
Equalities | $ 1.00 $ | $ 1.00 $ |
Inequalities | $ 1.91 $ | $ 1.04 $ |
Octagons | $ 2.84 $ | $ 1.03 $ |
Polyhedra | $ 3.72 $ | $ 1.01 $ |
Polynomials | $ 4.62 $ | $ 1.00 $ |
Peano | $ 5.49 $ | $ 0.97 $ |
(On all $ 180 $ benchmarks with $ 30 $ mins timeout per benchmark per tool per grammar)
► Bias-variance tradeoffs in program analysis [Sharma et al, POPL'14]
► A theory of formal synthesis [Jha and Seshia, Acta Informatica'17]