
Towards Intelligent Videogame Generation / Menapace, Willi. - (2024 Jan 25), pp. 1-96.

Towards Intelligent Videogame Generation

Menapace, Willi
2024-01-25

Abstract

Multimedia content plays a pivotal role in contemporary society, permeating diverse facets of communication and entertainment. However, the creation of high-quality multimedia content often requires specialized expertise and equipment, limiting its widespread accessibility. To address these challenges, deep learning-based methods have emerged as revolutionary tools for image and video generation, reshaping content creation landscapes and transforming professional workflows. These techniques have significantly impacted the media creation industry, facilitating more accessible and efficient media pipelines. Nevertheless, videogame development remains a resource-intensive endeavor. Videogames are complex soft real-time systems that necessitate meticulous modeling of 3D objects, animations, graphical effects, physics, and intelligent agents. Developing a videogame requires the collaborative efforts of various specialized professionals, including designers, software engineers, 3D artists, and sound engineers, each contributing their expertise to different aspects of the game. Furthermore, the sheer quantity of assets involved often requires specialized hardware such as 3D and motion capture equipment, professional cameras, and sound recording devices. Although early attempts to democratize game development introduced game engines to provide a strong foundation for the development process, creating videogames remains a multi-million-dollar endeavor even with sophisticated modern engines. This raises the question of whether deep learning can revolutionize videogame creation in a similar manner. This research proposes several novel methods aimed at streamlining the videogame creation process, leveraging the abundance of available videos showcasing gameplay. These methods automatically convert readily accessible video collections into playable game experiences.
First, we introduce a fully unsupervised pipeline called Playable Video Generation, which transforms raw video collections into playable experiences. We demonstrate the effectiveness of this pipeline through compelling applications, including Atari games emulation, robotic arm control, and tennis simulations. Recognizing that videogames predominantly involve rendered compositions of 3D objects rather than 2D videos, we extend the framework to 3D environments in Playable Environments. Here, objects are represented as separate 3D entities, allowing for control over multiple entities and their attributes, such as style, as well as camera control. Lastly, we present the concept of a Promptable Game Model, in which intelligent agents within the game are empowered through the use of natural language instructions. This integration enables the creation of complex playable experiences in which agents are controlled through fine-grained actions and can act independently based on their own “game AI” to defeat opponents and navigate intricate environments. By harnessing the power of deep learning techniques, this research aims to democratize videogame development by enabling the transformation of annotated video collections into playable experiences. The proposed methods provide a promising avenue for revolutionizing the creation of videogames, similar to the transformative impact deep learning has had on the media creation industry. To facilitate future work in this novel area, all data and code are made publicly available.
Cycle: XXXVI
Academic year: 2022-2023
Department: Information Engineering and Computer Science
Doctoral programme: Information and Communication Technology
Advisor: Ricci, Elisa
Co-advisor: Tulyakov, Sergey
Language: English
Files in this record:
tesi_willi_menapace.pdf — open access
Description: Thesis
Type: Doctoral Thesis (Tesi di dottorato)
License: Creative Commons
Size: 80.06 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/400170