Prompt-Layered Architecture: A New Stack for AI-First Product Design

Savi Khatri

doi:10.18535/ijsrm/v12i09.ec09

Abstract

As large language models (LLMs) increasingly power next-generation applications, demand is growing for architectural frameworks that treat prompts as modular, orchestratable, and extensible parts of a software system. Traditional approaches to AI integration have treated prompt engineering as an ad hoc, application-specific task, disconnected from systematic design principles and software architecture standards. This paper introduces the Prompt-Layered Architecture (PLA), a new architectural style in which prompts are elevated to first-class citizens of the software stack. PLA supports the composition, management, and orchestration of prompts through modular layers, enabling AI-first products that are scalable, testable, and extensible.

We formalize the PLA model as four core layers: the Prompt Composition Layer, the Prompt Orchestration Layer, the Response Interpretation Layer, and the Domain Memory Layer. Together, these layers support reuse of prompt templates, structured routing of model outputs, persistence of memory across interaction chains, and responsiveness to business logic and user context. Inspired by traditional layered software architectures, PLA brings versioning, API-driven abstraction of prompts, and test scaffolding for verifying LLM behavior to LLM-based systems.
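The four layers can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the paper's prototype: every class, method, and template name below is hypothetical, the model call is replaced by an offline stub rather than the OpenAI GPT APIs, and the "version" is simply part of the template key.

```python
import json
from dataclasses import dataclass, field
from string import Template
from typing import Callable, Dict, List, Tuple

# All names here are hypothetical; they are not taken from the paper's prototype.


@dataclass
class PromptComposer:
    """Prompt Composition Layer: reusable templates keyed by (name, version)."""
    templates: Dict[Tuple[str, str], Template] = field(default_factory=dict)

    def register(self, name: str, version: str, text: str) -> None:
        self.templates[(name, version)] = Template(text)

    def compose(self, name: str, version: str, **values: str) -> str:
        return self.templates[(name, version)].substitute(**values)


@dataclass
class DomainMemory:
    """Domain Memory Layer: keeps state across an interaction chain."""
    turns: List[str] = field(default_factory=list)

    def remember(self, item: str) -> None:
        self.turns.append(item)

    def context(self, last_n: int = 3) -> str:
        return "\n".join(self.turns[-last_n:])


class ResponseInterpreter:
    """Response Interpretation Layer: turns raw model text into structured data."""

    def interpret(self, raw: str) -> dict:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            return {"intent": "unknown", "raw": raw}
        # Anything that is not a JSON object is treated as uninterpretable.
        return parsed if isinstance(parsed, dict) else {"intent": "unknown", "raw": raw}


@dataclass
class Orchestrator:
    """Prompt Orchestration Layer: wires composition, model, interpretation, memory."""
    model: Callable[[str], str]
    composer: PromptComposer
    interpreter: ResponseInterpreter
    memory: DomainMemory

    def run(self, name: str, version: str, user_input: str) -> dict:
        prompt = self.composer.compose(
            name, version, context=self.memory.context(), user_input=user_input
        )
        result = self.interpreter.interpret(self.model(prompt))
        self.memory.remember(f"user: {user_input}")
        self.memory.remember(f"intent: {result.get('intent', 'unknown')}")
        return result


def stub_model(prompt: str) -> str:
    """Offline stand-in for an LLM call; it only inspects the final prompt line."""
    last_line = prompt.splitlines()[-1].lower()
    return json.dumps({"intent": "refund" if "refund" in last_line else "other"})


def build_demo_orchestrator() -> Orchestrator:
    """Assemble the four layers around the stub model with one sample template."""
    composer = PromptComposer()
    composer.register("classify", "v1", "Context:\n$context\nRequest: $user_input")
    return Orchestrator(stub_model, composer, ResponseInterpreter(), DomainMemory())
```

Because the model is injected as a plain callable, the same orchestrator can be exercised in tests with the stub and later pointed at a real API client. That is one way the test scaffolding mentioned above could be realized.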

To validate the design, we develop a prototype implementation on top of the OpenAI GPT APIs and evaluate PLA against flat prompt-based systems on modularity metrics, reusability benchmarks, and cognitive load for prompt engineers. The results indicate that PLA improves maintainability and speeds up the integration of AI capabilities across distributed services. The paper also presents several SmartArt diagrams and Python orchestration examples, and discusses how PLA fills the gap between emerging frameworks such as LangChain, AutoGPT, and prompt programming compilers.

By formalizing prompts as reusable architectural units, this research offers a blueprint for building scalable AI-first applications with structured reasoning, state awareness, and prompt governance.

Keywords

AI-first product design; prompt engineering; layered architecture; modular prompts; orchestration; generative AI systems; LLM pipelines; prompt stack; extension; compositional AI

