ConIS: controllable text-driven image stylization with semantic intensity
Article
Yang, Gaoming, Li, Changgeng and Zhang, Ji. 2024. "ConIS: controllable text-driven image stylization with semantic intensity." Multimedia Systems. 30 (4). https://doi.org/10.1007/s00530-024-01381-1
Article Title | ConIS: controllable text-driven image stylization with semantic intensity |
---|---|
ERA Journal ID | 18082 |
Article Category | Article |
Authors | Yang, Gaoming, Li, Changgeng and Zhang, Ji |
Journal Title | Multimedia Systems |
Journal Citation | 30 (4) |
Article Number | 174 |
Number of Pages | 15 |
Year | 2024 |
Publisher | Springer |
Place of Publication | Germany |
ISSN | 0942-4962; 1432-1882 |
Digital Object Identifier (DOI) | https://doi.org/10.1007/s00530-024-01381-1 |
Web Address (URL) | https://link.springer.com/article/10.1007/s00530-024-01381-1 |
Abstract | Text-driven image stylization aims to synthesize content images with learned textual styles. Recent studies have shown the potential of the diffusion model for producing rich stylizations. However, existing approaches inefficiently control the degree of stylization, which hinders the balance between style and content in generated images. In this paper, we propose a Controllable Text-Driven Image Stylization (ConIS) Framework based on the diffusion model. The proposed framework introduces two modules into the pre-trained text-to-image model. The first is an unconditional null-text inversion (UNTI) module, which optimizes null-text embedding to reduce the bias between inversion and sampling in the diffusion model. Given a content image, this module is able to reconstruct it without semantic guidance. The second is a null-text dilution (NTD) module. We design a parameterization mechanism for the semantic intensity of textual conditions, which indirectly controls the degree of stylization through the style degree factor. Finally, we replace the attention maps used in the sampling process with those from the UNTI module to constrain the structure of content images. Experiments have shown that the proposed method enables fine-grained control over the degree of stylization without retraining or fine-tuning the network. Both qualitative and quantitative results indicate that the ConIS framework outperforms state-of-the-art methods in balancing artistic detail and content structure. |
Keywords | Diffusion model; Text-driven image stylization; Fine-grained control; Semantic intensity |
Contains Sensitive Content | Does not contain sensitive content |
ANZSRC Field of Research 2020 | 461299. Software engineering not elsewhere classified |
Byline Affiliations | Anhui University of Science and Technology, China; School of Mathematics, Physics and Computing |
Permalink | https://research.usq.edu.au/item/z85wz/conis-controllable-text-driven-image-stylization-with-semantic-intensity |