🔬 Deep Dives
Python / Microsoft
2026-04-05 · 254 pts · 160 comments
💬 Major HN discussion (160 comments) · ⭐ 1.2k GitHub stars today
Open Source AI Platform - an AI chat with advanced features that works with every LLM
2026-04-02 · 101 pts · cs.LG · Zhengxi Lu, Zhiyuan Yao…
📄 New in cs.LG
However, inference-time skill augmentation is fundamentally limited: retrieval noise introduces irrelevant guidance, injected skill content imposes substantial token overhead, and the model never truly acquires the knowledge it merely follows. We ask whether skills can instead be internalized into model parameters, enabling zero-shot autonomous behavior without any runtime skill retrieval. SKILL0 introduces a training-time curriculum that begins with full skill context and progressively withdraws it.
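The progressive-withdrawal idea can be pictured as a purely data-side schedule. Here is a minimal sketch assuming a linear withdrawal probability and a simple prompt template; the function names and the schedule are illustrative, not SKILL0's actual implementation:

```python
import random

def withdrawal_prob(step: int, total_steps: int) -> float:
    """Linearly ramp the chance of dropping the skill context from 0 to 1."""
    return min(1.0, step / max(1, total_steps))

def build_prompt(task: str, skill_context: str, step: int, total_steps: int) -> str:
    # Early in training the full skill document is prepended; later it is
    # withdrawn, so the model must reproduce the behavior from its weights.
    if random.random() < withdrawal_prob(step, total_steps):
        return f"Task: {task}\n"                          # zero-shot form
    return f"Skill:\n{skill_context}\n\nTask: {task}\n"   # skill-in-context form
```

Early steps mostly see the skill-in-context form; by the end of training nearly every example is zero-shot, so the scaffolding disappears while the target behavior is pushed into the parameters.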
Object / Video
2026-04-02 · 470 pts · cs.CV · cs.AI · Saman Motamed, William Harvey…
📄 New in cs.CV, cs.AI
During inference, a vision-language model identifies regions of the scene affected by the removed object. These regions are then used to guide a video diffusion model that generates physically consistent counterfactual outcomes. Experiments on both synthetic and real data show that our approach better preserves consistent scene dynamics after object removal compared to prior video object removal methods.
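As a mental model, the two-stage inference pipeline might look like the sketch below; `locate_effects` and `generate` are placeholder names for hypothetical interfaces, not the authors' API:

```python
def counterfactual_removal(video, object_name, vlm, diffusion):
    """Remove an object and synthesize its physically consistent aftermath."""
    # 1) Ask a vision-language model which regions the object influences
    #    (shadows, reflections, contact points), not just its silhouette.
    affected_masks = vlm.locate_effects(          # hypothetical VLM call
        video, query=f"regions affected by the {object_name}")
    # 2) Condition a video diffusion model on those masks so it regenerates
    #    the affected regions with consistent scene dynamics.
    return diffusion.generate(video, masks=affected_masks)  # hypothetical API
```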
Text / Perception
2026-04-02 · 261 pts · cs.CV · Zheng-Hui Huang, Zhixiang Wang…
📄 New in cs.CV
Scaling generative inverse and forward rendering to real-world scenarios is bottlenecked by the limited realism and temporal coherence of existing synthetic datasets. Furthermore, to evaluate the real-world performance of inverse rendering without ground truth, we propose a novel VLM-based assessment protocol that measures semantic, spatial, and temporal consistency. Combined with our toolkit, our forward renderer lets users edit the visual style of AAA games from G-buffers using text prompts.
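A ground-truth-free assessment of this kind could be sketched as a VLM-as-judge loop over the three consistency axes named above; the prompts and the yes/no voting scheme below are assumptions, not the paper's protocol:

```python
# Hypothetical VLM-as-judge sketch; criteria mirror the abstract's three axes.
CRITERIA = {
    "semantic": "Does the prediction preserve the scene's semantics?",
    "spatial":  "Are material and geometry boundaries aligned with the input?",
    "temporal": "Is the prediction stable across consecutive frames?",
}

def assess(frames, predictions, vlm_judge):
    """Score inverse-rendering outputs per criterion in [0, 1], no ground truth."""
    scores = {}
    for name, question in CRITERIA.items():
        votes = [vlm_judge.yes_no(frame, pred, question)  # hypothetical judge API
                 for frame, pred in zip(frames, predictions)]
        scores[name] = sum(votes) / len(votes)
    return scores
```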
2026-04-02 · 57 pts · 8 comments · cs.CV · cs.RO · Yongkang Li, Lijun Zhou…
📄 New in cs.CV, cs.RO
Vision-Language-Action (VLA) models have recently emerged in autonomous driving, with the promise of leveraging rich world knowledge to improve the cognitive capabilities of driving systems. However, adapting such models for driving tasks currently faces a critical dilemma between spatial perception and semantic reasoning. To overcome this, we propose UniDriveVLA, a Unified Driving Vision-Language-Action model based on Mixture-of-Transformers that addresses the perception-reasoning conflict via expert decoupling.
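Expert decoupling in a Mixture-of-Transformers commonly means shared attention with modality-specific feed-forward experts, so vision and language tokens can exchange information without competing for the same FFN parameters. The toy PyTorch block below illustrates that general pattern; it is a sketch of the idea, not UniDriveVLA's architecture:

```python
import torch
import torch.nn as nn

class DecoupledBlock(nn.Module):
    """Shared self-attention with per-modality feed-forward experts."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.percept_ffn = nn.Sequential(      # expert for spatial perception
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.reason_ffn = nn.Sequential(       # expert for semantic reasoning
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor, is_vision: torch.Tensor) -> torch.Tensor:
        # Attention is shared, so the two token streams still interact.
        x = x + self.attn(x, x, x)[0]
        # Route each token to the feed-forward expert for its modality.
        expert_out = torch.where(is_vision.unsqueeze(-1),
                                 self.percept_ffn(x), self.reason_ffn(x))
        return x + expert_out
```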
⚡ Quick Signals
Python / Microsoft
GitHub
HKUDS / LightRAG
1 pts · ⭐ 263 GitHub stars today · ⭐ 32.2k total stars
GitHub
lyogavin / airllm
2 pts · ⭐ 64 GitHub stars today · ⭐ 14.9k total stars