Text-guided semantic manipulation refers to semantically editing an image
generated from a source prompt to match a target prompt, enabling the desired
semantic changes (e.g., addition, removal, and style transfer) while preserving
irrelevant content. Leveraging the powerful generative capabilities of diffusion
models, this task has shown the potential to produce high-fidelity visual
content. Nevertheless, existing methods typically require time-consuming
fine-tuning (inefficient), fail to accomplish multiple semantic manipulations
(poorly extensible), and/or lack support for tasks in different modalities
(limited generalizability). Upon further investigation, we find that the
geometric properties of noises in diffusion models are strongly correlated with
the semantic changes. Motivated by this, we propose $\textit{GTF}$, a novel
method for
text-guided semantic manipulation, which has the following attractive
capabilities: 1) $\textbf{Generalized}$: our $\textit{GTF}$ supports multiple
semantic manipulations (e.g., addition, removal, and style transfer) and can be
seamlessly integrated into all diffusion-based methods (i.e., plug-and-play)
across different modalities (i.e., modality-agnostic); and 2)
$\textbf{Training-free}$: $\textit{GTF}$ produces high-fidelity results by
simply controlling the geometric relationship between noises, without any tuning or
optimization. Our extensive experiments demonstrate the efficacy of our
approach, highlighting its potential to advance the state of the art in
semantic manipulation.