🔬 Deep Dives
2026-04-07 · 254 pts · 160 comments · ⭐ 639 today
💬 Major HN discussion (160 comments) · ⭐ 639 GitHub stars today
Open Source AI Platform - AI Chat with advanced features that works with every LLM
2026-04-06 · 45 pts · cs.CV · Yicheng Xiao, Wenhu Zhang…
📄 New in cs.CV
Image spatial editing performs geometry-driven transformations, allowing precise control over object layout and camera viewpoints. (ii) To address the data bottleneck for scalable training, we construct SpatialEdit-500k, a synthetic dataset generated with a controllable Blender pipeline that renders objects across diverse backgrounds and systematic camera trajectories, providing precise ground-truth transformations for both object- and camera-centric operations. (iii) Building on this data, we develop SpatialEdit-16B, a baseline model for fine-grained spatial editing.
2026-04-07 · 4 pts · 3 comments · ⭐ 1.6k today
⭐ 1.6k GitHub stars today · ⭐ 28.9k total stars
The agent that grows with you
2026-04-06 · 9 pts · cs.CL · cs.CV · Weian Mao, Xi Lin…
📄 New in cs.CL, cs.CV
Leading KV cache compression methods estimate KV importance using attention scores from recent post-RoPE queries. We show that this concentration causes queries to preferentially attend to keys at specific distances (e.g., nearest keys), with the centers determining which distances are preferred via a trigonometric series. Based on this, we propose TriAttention to estimate key importance by leveraging these centers.
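For context, the baseline scoring that the abstract attributes to leading KV cache compression methods can be sketched roughly as follows. This is a minimal illustration of attention-score-based key selection, not TriAttention itself; the function names (`key_importance`, `select_keys`) and the `budget` parameter are hypothetical, and the sketch assumes the queries and keys passed in already have RoPE applied.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def key_importance(recent_queries, keys):
    """Average attention weight each cached key receives from the recent queries.

    Assumption: queries and keys are post-RoPE vectors of equal dimension.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    scores = [0.0] * len(keys)
    for q in recent_queries:
        logits = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        for i, weight in enumerate(softmax(logits)):
            scores[i] += weight / len(recent_queries)
    return scores


def select_keys(recent_queries, keys, budget):
    """Indices of the `budget` highest-scoring keys, returned in cache order."""
    scores = key_importance(recent_queries, keys)
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])
```

Calling `select_keys(recent_queries, keys, budget)` returns the positions that would be kept in the cache; everything else would be evicted under this baseline.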
2026-04-06 · 9 pts · cs.CV · Haoxuan Han, Weijie Wang…
📄 New in cs.CV
In this paper, we propose Degradation-Driven Prompting (DDP), a novel framework that improves VQA performance by strategically reducing image fidelity to force models to focus on essential structural information. Physical attributes targets images prone to human misjudgment, where DDP employs a combination of 80p downsampling, structural visual aids (white background masks and orthometric lines), and In-Context Learning (ICL) to calibrate the model's focus. Our experimental results demonstrate that less is more: by intentionally degrading visual inputs and providing targeted structural prompts, DDP enables VLMs to bypass distracting textures and achieve superior reasoning accuracy on challenging visual benchmarks.
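The two image-side steps the abstract mentions, lowering resolution and overlaying structural lines, could look roughly like the sketch below. It is an illustration only and not the authors' pipeline: `downsample` and `add_grid_lines` are hypothetical helpers, the image is assumed to be a grayscale list of rows, the downsampling factor is a free parameter rather than the abstract's "80p" setting, and an evenly spaced grid stands in for the "orthometric lines".

```python
def downsample(image, factor):
    """Block-average a grayscale image (list of rows) by an integer factor."""
    height = len(image) // factor
    width = len(image[0]) // factor
    out = []
    for r in range(height):
        row = []
        for c in range(width):
            block = [
                image[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def add_grid_lines(image, step, value=0):
    """Overlay horizontal and vertical guide lines every `step` pixels."""
    return [
        [value if (r % step == 0 or c % step == 0) else px
         for c, px in enumerate(row)]
        for r, row in enumerate(image)
    ]
```

A degraded input would then be produced by chaining the two, for example `add_grid_lines(downsample(image, 4), 8)`, before the image is passed to the model alongside the in-context examples.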
⚡ Quick Signals
Python / Agent
Airllm / Lyogavin
GitHub
lyogavin / airllm
2 pts · ⭐ 102 GitHub stars today · ⭐ 15.1k total stars
Unusable / Feb
Ghost / Pepper
Peptides / Begin
Gaussian / Point