Multi-task visual scene understanding aims to leverage the relationships among a set of correlated tasks, which are solved simultaneously by embedding them within a unified network. However, most existing methods suffer from two primary limitations from a task-level perspective: (1) the lack of task-independent correspondences for distinct tasks, and (2) the neglect of explicit task-consensual dependencies among the tasks. To address these issues, we propose a novel synergy embedding model (SEM), which advances multi-task dense prediction through two innovative designs: an intra-task hierarchy-adaptive module and an inter-task EM-interactive module. Specifically, the intra-task module incorporates hierarchy-adaptive keys from multiple stages, enabling efficient learning of specialized visual patterns with an optimal trade-off. In addition, the inter-task module learns cross-task interactions from a compact set of mutual bases shared among the tasks, estimated via the expectation-maximization (EM) algorithm. Extensive experiments on two public benchmarks, NYUD-v2 and PASCAL-Context, demonstrate that SEM consistently outperforms state-of-the-art approaches across a range of metrics.
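
To illustrate the inter-task idea sketched above, the following is a minimal, hedged example (not the authors' implementation) of how an EM-style iteration can estimate a compact set of mutual bases from pooled task features and reconstruct each task's features from those shared bases; the function name, tensor shapes, basis count, and iteration count are all illustrative assumptions.

```python
# Illustrative sketch only: EM-style estimation of shared (mutual) bases across
# task features, in the spirit of EM attention. Names and hyperparameters
# (num_bases, em_iters) are assumptions, not the paper's actual settings.
import torch
import torch.nn.functional as F

def em_mutual_bases(task_feats, num_bases=64, em_iters=3):
    """task_feats: list of (B, N_t, C) tensors, one per task.
    Returns per-task features reconstructed from shared bases, and the bases (B, K, C)."""
    x = torch.cat(task_feats, dim=1)                      # (B, N, C): pool tokens from all tasks
    B, N, C = x.shape
    mu = F.normalize(torch.randn(B, num_bases, C, device=x.device), dim=-1)  # initialize bases

    for _ in range(em_iters):
        # E-step: soft assignment of every token to each basis
        z = torch.softmax(x @ mu.transpose(1, 2), dim=-1)  # (B, N, K)
        # M-step: update bases as responsibility-weighted means of the tokens
        mu = z.transpose(1, 2) @ x                          # (B, K, C)
        mu = mu / (z.sum(dim=1).unsqueeze(-1) + 1e-6)
        mu = F.normalize(mu, dim=-1)

    # Reconstruct all task features from the compact set of mutual bases
    x_hat = torch.softmax(x @ mu.transpose(1, 2), dim=-1) @ mu  # (B, N, C)
    splits = [f.shape[1] for f in task_feats]
    return list(torch.split(x_hat, splits, dim=1)), mu
```

Because every task's tokens are assigned to the same small basis set, the reconstruction acts as a low-rank exchange of information across tasks, which is the kind of explicit task-consensual dependency the inter-task module is designed to capture.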