5. Analysis
5.1 Key Findings
RQ1 (Correctness): Constrained generation achieves functional correctness comparable to or better than the baseline. This suggests that constraint enforcement does not degrade the model's ability to generate functionally correct code.
RQ2 (Quality): Constrained generation shows a consistent improvement in quality metrics over the baseline, with an average delta of +14.5 points (σ=3.4). The improvement is driven primarily by constraint adherence (100% vs. 50% for the baseline), indicating that explicit constraint enforcement effectively guides the model toward the desired code patterns.
RQ3 (Efficiency): When the model is warm, constrained generation shows an 81% speedup over the baseline. This is consistent with findings from JSONSchemaBench that constrained decoding can improve throughput by reducing the token sampling space.
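The summary statistics reported for RQ2 and RQ3 can be illustrated with a minimal sketch. It rests on several assumptions not stated in this section: that quality is scored per task for both conditions, that σ is the sample standard deviation of the per-task deltas, and that "speedup" means the relative throughput gain (baseline time divided by constrained time, minus one). The function names `quality_delta` and `speedup_percent` are hypothetical.

```python
from statistics import mean, stdev


def quality_delta(baseline: list[float], constrained: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of the per-task quality deltas.

    Assumption: both lists hold one score per task, in the same task order.
    """
    if len(baseline) != len(constrained):
        raise ValueError("both conditions must cover the same tasks")
    if len(baseline) < 2:
        raise ValueError("need at least two tasks to compute a standard deviation")
    deltas = [c - b for b, c in zip(baseline, constrained)]
    return mean(deltas), stdev(deltas)


def speedup_percent(baseline_seconds: float, constrained_seconds: float) -> float:
    """Relative throughput gain of the constrained run, in percent.

    Assumption: speedup = baseline_time / constrained_time - 1.
    """
    if baseline_seconds <= 0 or constrained_seconds <= 0:
        raise ValueError("timings must be positive")
    return (baseline_seconds / constrained_seconds - 1.0) * 100.0
```

Under this definition, an 81% speedup means the baseline takes 1.81 times as long as the constrained run. If speedup were instead defined as the fraction of time saved, the same figure would mean the constrained run takes 19% of the baseline time, so the definition is worth stating alongside the number.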
5.2 Constraint Adherence Analysis
The largest improvement from constrained generation is in constraint adherence (100% vs. 50% for the baseline). Adherence includes:
- Export statements: Constrained generation always produces properly exported functions
- Type annotations: TypeScript type signatures are consistently correct
- Naming conventions: Function and variable names match requirements
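As an illustration of how the three adherence criteria above might be checked automatically, the sketch below applies simple regular expressions to generated TypeScript source. This is not the evaluation harness used in this study; the functions `check_adherence` and `adherence_rate` are hypothetical, and the regex-based checks are a deliberate simplification (for example, they do not handle parameters whose types contain parentheses, and they only test for the presence of a return type annotation, not its correctness).

```python
import re


def check_adherence(source: str, required_name: str) -> dict[str, bool]:
    """Run three illustrative checks on a TypeScript snippet.

    Assumption: each task requires a single exported function or constant
    with a known name.
    """
    name = re.escape(required_name)
    # Naming: the required identifier is declared as a function or const.
    declared = re.search(rf"\b(?:function|const)\s+{name}\b", source)
    # Export: the declaration is preceded by the `export` keyword.
    exported = re.search(
        rf"\bexport\s+(?:async\s+)?(?:function|const)\s+{name}\b", source
    )
    # Type annotation: the function signature carries a return type.
    typed = re.search(rf"\bfunction\s+{name}\s*\([^)]*\)\s*:\s*\S", source)
    return {
        "export": exported is not None,
        "type_annotation": typed is not None,
        "naming": declared is not None,
    }


def adherence_rate(results: list[dict[str, bool]]) -> float:
    """Percentage of individual checks that passed across all samples."""
    checks = [ok for result in results for ok in result.values()]
    if not checks:
        raise ValueError("no checks to aggregate")
    return 100.0 * sum(checks) / len(checks)
```

A production checker would more plausibly parse the output with the TypeScript compiler and inspect the resulting syntax tree; the regex version is kept here only because it is short and self-contained.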
5.3 Limitations
- Sample size: Evaluation covers 15 tasks; larger benchmarks would increase statistical power
- Single model: Results are specific to Qwen2.5-Coder-7B; other models, including larger ones, may show different patterns
- pass@1 only: We measure single-sample correctness; pass@k with k>1 would provide additional insight
- TypeScript only: Evaluation focuses on TypeScript; multi-language evaluation is future work
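For the pass@k extension mentioned above, a minimal sketch of the widely used unbiased estimator is given below. It assumes that n samples are drawn per task and that c of them pass the tests; the estimate for one task is 1 - C(n-c, k) / C(n, k), and the benchmark score is the mean over tasks. The function name `pass_at_k` is illustrative, and nothing here reflects how this study's pass@1 numbers were computed.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single task.

    n: number of samples generated for the task
    c: number of those samples that pass all tests
    k: number of samples the user is assumed to draw
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # If fewer than k samples fail, every size-k draw contains a passing sample.
    if n - c < k:
        return 1.0
    # Otherwise subtract the probability that all k drawn samples fail.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With n = k = 1 the estimator reduces to the pass@1 measure used in this evaluation: the result is 1.0 if the single sample passes and 0.0 otherwise.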