Generative AI
Generative AI (GenAI) refers to artificial intelligence systems that produce novel content (text, images, audio, video, or code) by learning the underlying distribution of their training data. Unlike discriminative models, which classify or predict labels for existing data, generative models learn the statistical structure of a data distribution and can sample new instances from it. The field has grown explosively since 2022, driven by large language models, diffusion image models, and multimodal systems.
Core Architectures
Transformers (for text and multimodal)
The self-attention architecture powers all major LLMs (GPT-4, Claude, Gemini, Llama) and many multimodal models. Text is tokenised, passed through stacked attention and feed-forward layers, and the model outputs a probability distribution over the vocabulary at each step, from which the next token is sampled.
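The core operation can be sketched in a few lines. The following is a minimal numpy illustration of causal scaled dot-product self-attention for a single head; the random embeddings and projection matrices are placeholders for learned parameters, not a real model.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Causal scaled dot-product self-attention over a token sequence.

    X: (seq_len, d_model) token embeddings
    Wq/Wk/Wv: (d_model, d_k) learned projection matrices
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # pairwise token affinities
    # Causal mask: each position may attend only to itself and earlier tokens
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)
    scores[mask] = -1e9
    return softmax(scores) @ V                     # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                       # 5 tokens, 16-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```

A real transformer stacks many such heads and layers, with residual connections and feed-forward blocks between them.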
Diffusion Models (for images and audio)
Models like Stable Diffusion, DALL-E 3, and Midjourney learn to progressively denoise a signal — starting from pure Gaussian noise and iteratively refining it toward a coherent image conditioned on a text prompt. Diffusion has largely displaced GANs for image generation due to more stable training and higher quality.
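The iterative refinement loop can be sketched as follows. This toy example follows the DDPM-style ancestral sampling recipe (Ho et al., 2020), but `predict_noise` is a placeholder for a trained noise-prediction network, so the "image" produced is meaningless; the point is the shape of the reverse process.

```python
import numpy as np

def predict_noise(x, t):
    """Placeholder for a trained network eps_theta(x_t, t) (a U-Net in practice)."""
    return x * 0.1

T = 50
betas = np.linspace(1e-4, 0.02, T)        # noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))               # start from pure Gaussian noise

for t in reversed(range(T)):
    eps = predict_noise(x, t)
    # Estimate the mean of the previous, less noisy step
    x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t > 0:
        # Re-inject a small amount of noise except at the final step
        x += np.sqrt(betas[t]) * rng.normal(size=x.shape)

print(x.shape)  # (8, 8)
```

Text conditioning enters through the noise-prediction network, which receives the prompt embedding alongside the noisy image at every step.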
Variational Autoencoders (VAE)
VAEs encode input data into a compressed latent space from which new samples can be decoded. They are used in image generation pipelines (often as the latent-space backbone for diffusion models) and in drug discovery.
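The key sampling step is the reparameterisation trick: the encoder outputs a mean and log-variance, and a latent is drawn as mean plus scaled noise so the sampling stays differentiable. The sketch below uses toy stand-in functions for the encoder and decoder (real ones are neural networks); only the sampling step is the genuine technique.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x):
    """Toy stand-in: a real encoder network outputs (mu, log_var) of q(z | x)."""
    mu = x[:4]                  # pretend the first 4 dims are the latent mean
    log_var = np.zeros(4)       # unit variance for illustration
    return mu, log_var

def decode(z):
    """Toy stand-in: a real decoder network maps latents back to data space."""
    return np.tile(z, 2)        # 4-dim latent -> 8-dim "reconstruction"

x = rng.normal(size=8)
mu, log_var = encode(x)
# Reparameterisation trick: z = mu + sigma * eps keeps gradients flowing
eps = rng.normal(size=mu.shape)
z = mu + np.exp(0.5 * log_var) * eps
x_new = decode(z)
print(x_new.shape)  # (8,)
```

Training balances a reconstruction loss against a KL term that keeps the latent distribution close to a standard Gaussian, which is what makes sampling new latents meaningful.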
Generative Adversarial Networks (GAN)
Two competing networks — a generator and a discriminator — train in opposition. GANs produce highly realistic imagery but are notoriously difficult to train. Largely superseded by diffusion for image quality, but still used in video synthesis and style transfer.
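The opposition is captured by two loss functions. Below is a minimal sketch of the classic GAN objectives using binary cross-entropy; the discriminator outputs here are random stand-ins rather than real network predictions, so only the loss structure is illustrated.

```python
import numpy as np

def bce(pred, target):
    """Binary cross-entropy on discriminator probabilities in (0, 1)."""
    eps = 1e-8
    return -np.mean(target * np.log(pred + eps)
                    + (1 - target) * np.log(1 - pred + eps))

rng = np.random.default_rng(0)
# Stand-ins for network outputs: D(x) on real samples, D(G(z)) on fakes
d_real = rng.uniform(0.6, 0.99, size=32)
d_fake = rng.uniform(0.01, 0.4, size=32)

# Discriminator objective: label real samples 1 and generated samples 0
d_loss = bce(d_real, np.ones(32)) + bce(d_fake, np.zeros(32))
# Generator objective (non-saturating form): make D(G(z)) look real
g_loss = bce(d_fake, np.ones(32))
print(round(d_loss, 3), round(g_loss, 3))
```

In training, the two losses are minimised in alternation, and the instability the section mentions comes from this minimax dynamic: if either network gets too far ahead, the other's gradient signal collapses.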
Applications
| Domain | Application | Example |
|--------|-------------|---------|
| Writing | Article drafting, summarisation | Claude, ChatGPT |
| Code | Code generation, debugging | GitHub Copilot, Cursor |
| Images | Photo generation, illustration | Midjourney, DALL-E 3 |
| Video | Short video clips, editing | Sora, Runway Gen-3 |
| Audio | Music, speech synthesis | ElevenLabs, Suno |
| 3D | Object and scene generation | Point-E, Shap-E |
| Drug discovery | Molecular structure generation | AlphaFold 3, RFDiffusion |
Risks and Concerns
- Deepfakes — photorealistic synthetic media used for misinformation or non-consensual imagery
- Intellectual property — training on copyrighted data raises unresolved legal questions
- Hallucination — text models confidently assert false information
- Bias amplification — models can reproduce and amplify training data biases
- Environmental cost — training large models consumes significant energy
References
- Goodfellow, I. et al. (2014). "Generative Adversarial Networks." NeurIPS 2014.
- Ho, J. et al. (2020). "Denoising Diffusion Probabilistic Models." NeurIPS 2020.