AI art in architecture

Joern Ploennigs; Markus Berger

doi:10.1007/s43503-023-00018-y

AI in Civil Engineering ›› 2023, Vol. 2 ›› Issue (1) : 8. DOI: 10.1007/s43503-023-00018-y

Original Article

AI art in architecture

Joern Ploennigs¹^,^a ,
Markus Berger¹

Author information +

History +

Abstract

Recent diffusion-based AI art platforms can create impressive images from simple text descriptions. This makes them powerful tools for concept design in any discipline that requires creativity in visual design tasks. This is also true for early stages of architectural design with multiple stages of ideation, sketching and modelling. In this paper, we investigate how applicable diffusion-based models already are to these tasks. We research the applicability of the platforms Midjourney, DALL

·

E 2 and Stable Diffusion to a series of common use cases in architectural design to determine which are already solvable or might soon be. Our novel contributions are: (i) a comparison of the capabilities of public AI art platforms; (ii) a specification of the requirements for AI art platforms in supporting common use cases in civil engineering and architecture; (iii) an analysis of 85 million Midjourney queries with Natural Language Processing (NLP) methods to extract common usage patterns. From this we derived (iv) a workflow for creating images for interior designs and (v) a workflow for creating views for exterior design that combines the strengths of the individual platforms.

Keywords

Image generation / Diffusion models / Natural language processing / Architecture

Cite this article

EndNote

Ris (Procite)

Bibtex

Download citation ▾

Joern Ploennigs, Markus Berger. AI art in architecture. AI in Civil Engineering, 2023, 2(1): 8 https://doi.org/10.1007/s43503-023-00018-y

References

Publishing order | Descend order by publishing year | Descend order by cited within

[1]	Borji, A. (2022). Generated faces in the wild: Quantitative comparison of stable diffusion, midjourney and DALL-E 2. arXiv preprint http://arxiv.org/abs/2210.00586.

[2]	BrownT, MannB, RyderN, SubbiahM, KaplanJD, DhariwalP, NeelakantanA, ShyamP, SastryG, AskellA, AgarwalS. Language models are few-shot learners. Advances in Neural Information Processing Systems, 2020, 33: 1877-901

[3]	HoJ, JainA, AbbeelP. Denoising diffusion probabilistic models. Nips, 2020, 33: 6840-6851

[4]	Ho, J., Salimans, T., Gritsenko, A.A., Chan, W., Norouzi, M., & Fleet, D.J. (2022). Video diffusion models. ICLR workshop on deep generative models for highly structured data.

[5]	Kawar, B., Zada, S., Lang, O., Tov O, Chang, H., Dekel, T., Mosseri, I., & Irani, M. (2022). Imagic: Text-based real image editing with diffusion models. arXiv preprint http://arxiv.org/abs/2210.09276.

[6]	Lugmayr, A., Danelljan, M., Romero, A., Yu, F., Timofte, R., & Van Gool, L. (2022). Repaint: Inpainting using denoising diffusion probabilistic models. CVPR (pp. 11461–11471).

[7]	Luo, S., & Hu, W. (2021). Diffusion probabilistic models for 3D point cloud generation. CVPR (pp. 2837–2845).

[8]	Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Nips (Vol. 26).

[9]	Nichol, A., Dhariwal, P., Ramesh, A., Shyam, P., Mishkin, P., McGrew, B., Sutskever, I., & Chen, M. (2022). Glide: Towards photorealistic image generation and editing with text-guided diffusion models. ICML.

[10]	Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G. (2021). Learning transferable visual models from natural language supervision. ICML (pp. 8748–8763).

[11]	Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical text-conditional image generation with CLIP latents. arXiv preprint http://arxiv.org/abs/2204.06125.

[12]	Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022, June). High-resolution image synthesis with latent diffusion models. CVPR (p. 10684–10695).

[13]	Saharia, C., Chan, W., Chang, H., Lee, C., Ho, J., Salimans, T., Fleet, D., & Norouzi, M. (2022). Palette: Image-to-image diffusion models. ACM SIGGRAPH (pp. 1–10).

[14]	Saharia, C., Ho, J., Chan, W., Salimans, T., Fleet, D.J., & Norouzi, M. (2022). Image super-resolution via iterative refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]	Seneviratne, S., Senanayake, D., Rasnayaka, S., Vidanaarachchi, R., & Thompson, J. (2022). DALLE-URBAN: Capturing the urban design expertise of large text to image transformers. International Conference on Digital Image Computing: Techniques and Applications.

[16]	Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. ICML (pp. 2256–2265).

[17]	Tevet, G., Raab, S., Gordon, B., Shafir, Y., Cohen-Or, D., & Bermano, A.H. (2022). Human motion diffusion model. arXiv preprint http://arxiv.org/abs/2209.14916.

[18]	Zeng, X., Vahdat, A., Williams, F., Gojcic, Z., Litany, O., Fidler, S., & Kreis, K. (2022). LION: Latent point diffusion models for 3D shape generation.