Deep Dive in Research
Podcast

Discussion about interesting research papers

OpenEvolve: Open Source AlphaEvolve Implementation
This article introduces OpenEvolve, an open-source implementation of Google DeepMind's AlphaEvolve, a system that leverages Large Language Models (LLMs) in an evolutionary framework to generate and optimize code. OpenEvolve allows users to evolve entire codebases by iteratively creating modifications with LLMs, evaluating them with automated metrics, and selecting promising solutions through an evolutionary process. The article details OpenEvolve's architecture, highlighting key components such as the Prompt Sampler and the LLM Ensemble, and provides examples showing it can match AlphaEvolve's results on complex problems such as circle packing and function minimization, tracing the evolution from simple algorithms to more sophisticated solutions. It also discusses why LLM performance and diversity matter for successful evolution and explains how to install and use the software for developing and improving algorithms.
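The loop described above (an LLM proposes a modification, an automated metric scores it, promising candidates survive) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not OpenEvolve's actual API: `llm_mutate` stands in for a real LLM call and merely perturbs a toy "program" of numeric coefficients, and `evaluate` is a hypothetical automated metric.

```python
import random

def llm_mutate(program, rng):
    # Stand-in for an LLM-proposed code modification: perturb one
    # coefficient of a toy "program" (a list of numbers).
    child = list(program)
    i = rng.randrange(len(child))
    child[i] += rng.uniform(-0.5, 0.5)
    return child

def evaluate(program):
    # Automated metric, higher is better: negative squared distance
    # of the coefficients to the target vector (1, 2, 3).
    target = (1.0, 2.0, 3.0)
    return -sum((p - t) ** 2 for p, t in zip(program, target))

def evolve(pop_size=8, generations=200, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-5, 5) for _ in range(3)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        parent = max(population, key=evaluate)   # select a promising program
        child = llm_mutate(parent, rng)          # "LLM" proposes a change
        worst = min(range(pop_size), key=lambda i: evaluate(population[i]))
        if evaluate(child) > evaluate(population[worst]):
            population[worst] = child            # keep improvements
    return max(population, key=evaluate)

best = evolve()
```

In the real system the mutation step is an LLM rewriting code from a sampled prompt and the metric is a domain-specific evaluator, but the select/mutate/evaluate skeleton is the same.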
Internet & technology · 1 week ago
24:37
PTS: Pivotal Token Search
This paper introduces Pivotal Token Search (PTS), a novel method for improving the performance of large language models by focusing on critical decision points in their output sequences. Unlike traditional methods that treat all generated tokens equally, PTS identifies "pivotal tokens" that significantly influence the probability of a successful generation. By using a binary search algorithm to pinpoint these key tokens, PTS generates preference pairs specifically centered on these critical decisions, leading to a more efficient learning signal during training. The release includes an open-source implementation, datasets of pivotal tokens and preference pairs, and fine-tuned models demonstrating the technique's effectiveness. This approach has potential applications in improving reasoning abilities, agent trajectories, and model interpretability.
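A simplified version of the binary-search idea can be made concrete: given an oracle estimating the probability of a successful completion from a prefix, recursively split any span whose endpoints differ by more than a threshold until single tokens are isolated. The names `find_pivotal_tokens` and `toy_prob` are illustrative, not the paper's API; the oracle here is a toy stand-in for sampling completions from an LLM, and this subdivision can miss shifts that cancel out within a span.

```python
def find_pivotal_tokens(tokens, success_prob, threshold=0.2):
    """Return indices of tokens whose inclusion shifts the estimated
    success probability by at least `threshold` (simplified sketch)."""
    pivotal = []

    def search(lo, hi):
        # Probability shift across the span tokens[lo:hi].
        gap = success_prob(tokens[:hi]) - success_prob(tokens[:lo])
        if abs(gap) < threshold:
            return                   # no pivotal token detected inside
        if hi - lo == 1:
            pivotal.append(lo)       # single token causing the shift
            return
        mid = (lo + hi) // 2         # binary search: split and recurse
        search(lo, mid)
        search(mid, hi)

    search(0, len(tokens))
    return pivotal

def toy_prob(prefix):
    # Toy oracle: committing to the token "7" makes success far more likely.
    return 0.9 if "7" in prefix else 0.3

pivots = find_pivotal_tokens(["the", "answer", "is", "7"], toy_prob)
```

Here only the final token flips the success probability, so `pivots` contains just its index; spans whose endpoints agree are skipped without examining their interior, which is what makes the search cheap.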
Internet & technology · 1 week ago
11:21
CameraBench: Understanding Video Motion
This episode introduces CameraBench, a large-scale dataset and benchmark designed to improve camera motion understanding in videos. It details a taxonomy of camera motion primitives developed with cinematographers, highlighting how motions can relate to scene content like tracking subjects. The authors describe a rigorous annotation framework and human study demonstrating how domain expertise and training enhance annotation accuracy. Using CameraBench, they evaluate both Structure-from-Motion (SfM) and Video-Language Models (VLMs), finding that SfM struggles with semantic primitives while VLMs struggle with precise geometric motions. Finally, they show that fine-tuning a generative VLM on CameraBench significantly improves performance on tasks like motion-augmented captioning and video question answering.
Internet & technology · 1 month ago
15:22
Step1X-Edit: General Image Editing Framework
This episode introduces Step1X-Edit, an open-source image editing model designed to close the performance gap with proprietary models like GPT-4o. The developers created a large-scale, high-quality dataset and a new benchmark (GEdit-Bench) reflecting real-world editing instructions to train and evaluate the model. Step1X-Edit integrates a Multimedia Large Language Model (MLLM) with a diffusion-based image decoder to perform diverse edits from natural language instructions. Experimental results indicate that Step1X-Edit outperforms existing open-source models and achieves performance comparable to leading closed-source systems.
Internet & technology · 1 month ago
21:13
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
Visual reasoning is a core component of human intelligence and a critical capability for advanced multimodal models. Yet current reasoning evaluations of multimodal large language models (MLLMs) often rely on text descriptions and allow language-based reasoning shortcuts, failing to measure genuine vision-centric reasoning. To address this, we introduce VisuLogic: a benchmark of 1,000 human-verified problems across six categories (e.g., quantitative shifts, spatial relations, attribute comparisons). These question types assess the visual reasoning capabilities of MLLMs from multiple perspectives. We evaluate leading MLLMs on this benchmark and analyze their results to identify common failure modes. Most models score below 30% accuracy, only slightly above the 25% random baseline and far below the 51.4% achieved by humans, revealing significant gaps in visual reasoning. Furthermore, we provide a supplementary training dataset and a reinforcement-learning baseline to facilitate further progress. Code, data, and baselines are available at https://visulogic-benchmark.github.io/VisuLogic.
Internet & technology · 1 month ago
18:57
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Reinforcement Learning with Verifiable Rewards (RLVR) has recently demonstrated notable success in enhancing the reasoning capabilities of LLMs, particularly in mathematics and programming tasks. It is widely believed that RLVR enables LLMs to continuously self-improve, thus acquiring novel reasoning abilities that exceed the corresponding base models' capacity. In this study, however, we critically re-examine this assumption by measuring the pass@k metric with large values of k to explore the reasoning capability boundary of the models across a wide range of model families and benchmarks. Surprisingly, RL does not, in fact, elicit fundamentally new reasoning patterns. While RL-trained models outperform their base models at small values of k (e.g., k=1), base models can achieve a comparable or even higher pass@k score than their RL counterparts at large k values. The reasoning paths generated by RL-trained models are already included in the base models' sampling distribution, suggesting that most reasoning abilities manifested in RL-trained models were already obtained by the base models. Further analysis shows that RL training boosts performance by biasing the model's output distribution toward paths that are more likely to yield rewards, thereby sampling correct responses more efficiently. But this also results in a narrower reasoning capability boundary than that of the base models. Similar results are observed in visual reasoning tasks trained with RLVR. Moreover, we find that distillation, unlike RLVR, can genuinely introduce new knowledge into the model. These findings underscore a critical limitation of RLVR in advancing LLM reasoning abilities, which requires us to fundamentally rethink the impact of RL training on reasoning LLMs and the need for a better paradigm. Project Page: https://limit-of-RLVR.github.io
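The pass@k metric at the heart of this analysis is the probability that at least one of k sampled generations solves the problem. It is commonly computed with the unbiased estimator from the Codex evaluation literature; the exact evaluation setup in this paper may differ, so treat this as a generic sketch:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate: the probability that at least one of k
    samples, drawn without replacement from n generations of which c are
    correct, is correct. Equals 1 - C(n-c, k)/C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

p1 = pass_at_k(10, 3, 1)    # ~0.3: one draw succeeds about 30% of the time
p10 = pass_at_k(10, 3, 10)  # 1.0: drawing all samples guarantees a hit
```

Comparing this quantity at small k (where RL-trained models win) and large k (where base models catch up) is exactly the boundary probe the study describes.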
Internet & technology · 1 month ago
12:33
Learning to Reason under Off-Policy Guidance
Recent advances in large reasoning models (LRMs) demonstrate that sophisticated behaviors such as multi-step reasoning and self-reflection can emerge via reinforcement learning (RL) with simple rule-based rewards. However, existing zero-RL approaches are inherently "on-policy", limiting learning to a model's own outputs and failing to acquire reasoning abilities beyond its initial capabilities. We introduce LUFFY (Learning to reason Under oFF-policY guidance), a framework that augments zero-RL with off-policy reasoning traces. LUFFY dynamically balances imitation and exploration by combining off-policy demonstrations with on-policy rollouts during training. Notably, we propose policy shaping via regularized importance sampling to avoid superficial and rigid imitation during mixed-policy training. Remarkably, LUFFY achieves an average gain of over +7.0 points across six math benchmarks and an advantage of over +6.2 points on out-of-distribution tasks. It also substantially surpasses imitation-based supervised fine-tuning (SFT), particularly in generalization. Analysis shows LUFFY not only imitates effectively but also explores beyond demonstrations, offering a scalable path to training generalizable reasoning models with off-policy guidance.
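One way to picture "policy shaping via regularized importance sampling" is as a transform of the importance weight applied to off-policy tokens. The sketch below is an assumption-laden toy reading, not LUFFY's exact formulation: it maps the on-policy probability p of a demonstrated token through p/(p + γ), so that low-probability (hard-to-imitate but potentially useful) tokens do not vanish from the gradient; `shaped_weight` and the smoothing constant `gamma` are hypothetical names.

```python
def shaped_weight(p_on, gamma=0.1):
    """Regularized weight for an off-policy (demonstration) token whose
    probability under the current policy is p_on. Plain importance
    weighting would use p_on directly, letting rare tokens contribute
    almost nothing; p/(p + gamma) flattens the curve so they still carry
    a learning signal. Toy sketch; gamma is a hypothetical constant."""
    return p_on / (p_on + gamma)

# Relative to plain weighting, rare tokens are boosted the most:
ratios = [shaped_weight(p) / p for p in (0.01, 0.1, 0.9)]
```

The monotone decrease of those amplification ratios is the point: imitation pressure is spread across the demonstration instead of concentrating on tokens the model already finds likely, which is how superficial, rigid imitation is avoided.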
Internet & technology · 1 month ago
12:46
AI's Potential to Transform the World
This episode explores a hopeful vision of the future with powerful AI, focusing on how AI could revolutionize five key areas: biology and health, neuroscience and mind, economic development and poverty, peace and governance, and work and meaning. Join us as we examine the potential of AI to solve humanity's biggest challenges and unlock a future of abundance and well-being for everyone.
Internet & technology · 7 months ago
23:27
On the Nature of Time
This text explores the nature of time from a computational perspective. It argues that time is not a fundamental coordinate but rather a consequence of the universe's computational processes. The author proposes that time is "the progressive doing of computation by the universe," and that our perception of time arises from our own computational limitations as observers. The text further suggests that the universe's computational irreducibility, the idea that there is no shortcut to understanding a system's evolution, contributes to the robustness of time as a unidirectional flow. The author also examines the concepts of multiple threads of time, the ruliad (the totality of all possible computational processes), and the role of computational boundedness in shaping our perception of time and physical laws.
Internet & technology · 7 months ago
11:21
MovieGen: A Detailed Review of Meta's Text-to-Video Generation System
This research paper describes the development and capabilities of "Movie Gen," a new suite of generative AI models that produce high-quality, realistic videos and audio. The paper highlights key advancements in text-to-video and video-to-audio synthesis, video editing, and video personalization. The authors detail their models' architecture, training procedures, and evaluation metrics, demonstrating superior performance compared to existing commercial and open-source solutions. This research aims to advance the field of media generation and enable new creative possibilities.
Internet & technology · 7 months ago
12:51