NEW RESEARCH
Generative Kaleidoscopic Networks
Neural networks are deep learning models that can be trained to learn complex patterns and relationships within data. In a recent paper: Generative Kaleidoscopic Networks, researchers from Microsoft detail how they discovered an “over-generalization” phenomenon, which indicates that the neural networks tend to learn many-to-one mappings. They then use this phenomenon to introduce a new paradigm of generative modeling by creating a dataset kaleidoscope, dubbed ‘Generative Kaleidoscopic Networks.’ The researchers are exploring theoretical explanations, experiments on multimodal data, and conditional generation using the Generative Kaleidoscopic Networks.
Spotlight: Microsoft research newsletter
NEW RESEARCH
Text Diffusion with Reinforced Conditioning
Diffusion models are a type of machine learning model that have shown exceptional ability to generate high-quality images, videos, and audio. Due to their adaptiveness in iterative refinement, they offer potential for achieving better non-autoregressive sequence generation—which simultaneously predicts all elements of a sequence, rather than predicting the next element in a sequence.
However, existing text diffusion models have yet to fulfill this potential, due to challenges in handling the discreteness of language. In a recent paper: Text Diffusion with Reinforced Conditioning, researchers from Microsoft and external colleagues uncover two significant limitations in text diffusion models: degradation of self-conditioning during training and misalignment between training and sampling. In response, the researchers propose a novel model called TREC, which empowers text diffusion models with reinforced conditioning, mitigating the degradation by directly motivating quality improvements from self-conditions with reward signals. In the paper, which was presented at the 2024 Association for the Advancement of Artificial Intelligence conference (AAAI), they further propose time-aware variance scaling to address the misalignment issue.
Extensive experiments demonstrate the competitiveness of TREC against autoregressive, non-autoregressive, and diffusion baselines. Moreover, qualitative analysis shows its advanced ability to fully utilize the diffusion process in refining samples.
NEW RESEARCH
PRISE: Learning Temporal Action Abstractions as a Sequence Compression Problem
Temporal action abstractions, along with belief state representations, are powerful knowledge sharing mechanisms for sequential decision making. In a recent paper, PRISE: Learning Temporal Action Abstractions as a Sequence Compression Problem, researchers from Microsoft and University of Maryland propose a novel connection between the seemingly distant realms of training large language models (LLMs) and inducing temporal action abstractions for continuous control domains such as robotics. The researchers introduce an approach called Primitive Sequence Encoding (PRISE) that combines continuous action quantization with a subtle but critical component of LLM training pipelines — input tokenization via byte pair encoding (BPE) – to learn powerful variable-timespan action abstractions. They empirically show that high-level skills discovered by PRISE from a multitask set of robotic manipulation demonstrations significantly boost the performance of both multitask imitation learning and few-shot imitation learning on unseen tasks.