Abstract
Traditional approaches to disaster recovery rely on “implicit trust”, which allows attackers to pivot from an infected machine to its secured backups. Although Zero-Trust Architecture addresses this issue by isolating the backup repository, it gives rise to another problem, the “Bare-Metal Paradox”: a freshly provisioned machine has no identity or encryption keys with which to establish legitimate access to the isolated backups. This paper proposes a model for resolving this paradox using an “Ephemeral Cryptographic Escrow”. The main objective was to build a safe, temporary “bridge” that enables a fresh Linux installation to establish its identity and start automated recovery without breaching the vault's security perimeter. Following a Design Science approach, the study developed the zt-backup-kit, a technical artifact that orchestrates direct-from-cloud restoration using a “Secure Pull” architecture. The framework was evaluated through sixty recovery iterations comparing the zt-backup-kit against standard manual recovery procedures. To assess real-world validity, the system was also deployed and monitored on a production server at the Faculty of Social Sciences and Languages (FSSL-SUSL). The automated zt-backup-kit restored systems in 4.2 minutes on average, compared with 22.4 minutes for the manual process, an 81% reduction in downtime. In production, the system achieved a 100% success rate and 38% lower storage consumption owing to data deduplication. The results indicate that Zero-Trust recovery can be both faster and more secure than traditional recovery mechanisms. The zt-backup-kit offers an effective model for institutional resilience, with a specific focus on Day Zero authentication, aligned with emerging security standards such as NIST SP 800-207.
Keywords
Zero-Trust Architecture; Bare-Metal Recovery; Cryptographic Escrow; Disaster Recovery; Linux Security; Cyber Resilience; Bare-Metal Paradox; Linux Bash Script
Introduction
Securing modern organizational infrastructure requires more than strong perimeter defenses; the infrastructure must also be able to recover quickly and efficiently from any failure scenario. In disaster recovery (DR), there is a delicate balance between isolating backup storage from sophisticated, persistent attacks and keeping the Recovery Time Objective (RTO) low. Recent developments in DR, such as the work of , have effectively mitigated ransomware risk through the use of “Secure Pull” designs. In a Secure Pull approach, the trusted backup server pulls data from the untrusted production environment, establishing a software-defined air gap that places the data repository out of reach of any compromised client. Such designs are supported by theoretical foundations, including the Design Science Research Methodology (DSRM), and ensure that data survive a root compromise while still achieving minimal RTOs for localized logical failures. The design is inherently based on Zero-Trust Architecture (ZTA), in which every system or device must be authenticated before accessing any network resource.
However, although the Secure Pull architecture protects the repository against logical attacks, it inadvertently complicates Bare-Metal Recovery (BMR), the process of rebuilding an entire system on new hardware after a severe disaster. If the production server becomes unusable and a replacement must be built, the new server lacks the network paths and shared credentials that would give it access to the isolated local vault. Herein lies the problem: administrators can either compromise the Zero-Trust principle by issuing live credentials during recovery, leaving the organization exposed, or configure vault access manually, which is time-consuming and extends the RTO unacceptably. Automated BMR therefore remains an open problem.
This study introduces and defines the Bare-Metal Paradox, a circular logical dependency that arises in Zero-Trust environments during disaster recovery. The paradox occurs when newly deployed hardware, the “bare metal”, lacks the cryptographic credentials needed to authenticate itself to software-defined, isolated backup vaults. Because those identity credentials and configuration parameters are themselves stored in the isolated vaults, the machine cannot validate itself, and the failed infrastructure cannot be restored using the same principles that keep it secure.
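The circularity can be made concrete as a dependency graph. The sketch below is a minimal illustration that assumes hypothetical node names (restore_data, vault_access, machine_identity, credentials) which are not identifiers from the zt-backup-kit. Using Python's standard graphlib module, it shows that no valid bootstrap order exists while the credentials live only inside the vault, and that sourcing them out of band (as an offline escrow would) removes the cycle.

```python
# Model of the Bare-Metal Paradox as a dependency graph (illustrative only).
# Node names are hypothetical labels, not identifiers from zt-backup-kit.
from graphlib import CycleError, TopologicalSorter

# Each key maps to the set of things it needs first.
ZERO_TRUST_ONLY = {
    "restore_data": {"vault_access"},
    "vault_access": {"machine_identity"},
    "machine_identity": {"credentials"},
    # The paradox: the credentials are stored only inside the isolated vault.
    "credentials": {"vault_access"},
}

# With an offline escrow, credentials come from an out-of-band archive
# unlocked by an operator passphrase, which removes the circular edge.
WITH_OFFLINE_ESCROW = {
    **ZERO_TRUST_ONLY,
    "credentials": {"offline_escrow_archive", "operator_passphrase"},
}


def recovery_order(graph):
    """Return a valid bootstrap order, or None if dependencies are circular."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


if __name__ == "__main__":
    print(recovery_order(ZERO_TRUST_ONLY))      # None: no valid order exists
    print(recovery_order(WITH_OFFLINE_ESCROW))  # ends with 'restore_data'
```

Every other node is a transitive prerequisite of restore_data, so once the cycle is broken restoration is necessarily the last step.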
This paper addresses this conundrum by proposing a decentralised recovery process that leverages an “Offline Cryptographic Escrow”: an encrypted, offline-stored archive containing the authentication and decryption keys required to restore the machine, extracted only during an emergency. With it, a new Linux server instance can authenticate directly to the Tier-2 cloud storage location and automate its own recovery. By bypassing the local Network Attached Storage (NAS), the system preserves the Zero-Trust boundary of the original backup setup while automation reduces downtime. The paper's contribution is a practical procedure for Zero-Trust recovery that removes the dependency on, and complexity of, a live Public Key Infrastructure (PKI).
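To make the escrow concept concrete, the following minimal sketch packs escrowed secrets into an in-memory tar.gz payload with owner-only permissions, and builds (but does not run) the GnuPG command that would encrypt it with a passphrase. The file names restic-repo-password and rclone.conf are hypothetical choices for illustration; this is not the zt-backup-kit's own packaging code.

```python
# Sketch: package escrowed recovery secrets into an in-memory tar.gz payload.
# File names are hypothetical; encryption is delegated to GnuPG (not run here).
import hashlib
import io
import tarfile
import time


def build_escrow_payload(files: dict[str, bytes]) -> bytes:
    """Return a gzip-compressed tar archive holding files (name -> content)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mode = 0o600  # readable and writable by the owner only
            info.mtime = int(time.time())
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def gpg_symmetric_encrypt_cmd(src: str, dst: str) -> list[str]:
    """Command line for passphrase-based (symmetric) GnuPG encryption."""
    return ["gpg", "--symmetric", "--cipher-algo", "AES256",
            "--output", dst, src]


if __name__ == "__main__":
    payload = build_escrow_payload({
        "restic-repo-password": b"example-only-not-a-real-secret",
        "rclone.conf": b"[cloud]\ntype = s3\n",
    })
    # A digest of the payload can be recorded alongside the offline copy.
    print(len(payload), hashlib.sha256(payload).hexdigest())
    print(" ".join(gpg_symmetric_encrypt_cmd("ztbk-credentials.tar.gz",
                                             "ztbk-credentials.tar.gz.gpg")))
```

Building the archive in memory avoids leaving an unencrypted copy of the secrets on disk before encryption; writing the payload to the src path for gpg is left to the operator.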
The remainder of the paper is structured to allow a systematic assessment of the proposed architectural model. The Literature Review analyzes modern bare-metal recovery frameworks and the shortcomings of current Zero-Trust integration. The Architectural Design section describes how the Offline Cryptographic Escrow is implemented within a multi-tier backup scheme. The Experimental Setup section then describes the laboratory environment used for validation, followed by the RTO performance analysis in the Results section.
Literature Review
Given the increasing sophistication of attacks, especially ransomware and advanced persistent threats (APTs) targeting Linux environments, disaster recovery (DR) plans require a new approach. Traditional approaches that assumed trust within a secure network boundary have proven insufficient against attacks that move laterally, escalate privileges, and steal credentials. As a result, systems administrators have turned to Zero-Trust Architecture (ZTA) to ensure the resilience of sensitive information. Nevertheless, integrating ZTA into the recovery stage creates a profound paradox: security through complete isolation makes it difficult to restore data when the original physical infrastructure is destroyed. This review examines the relationships between ZTA, Bare-Metal Recovery (BMR), and cryptographic escrow in Linux environments. Its scope covers disaster recovery frameworks, secure pull architectures, and the paradox of re-establishing trust in a system after its destruction. Specifically, the review analyses how ZTA principles, as defined by NIST SP 800-207, can be extended beyond active network monitoring to encompass the “cold” state of system restoration. It excludes enterprise appliance ecosystems to maintain a focus on open-source Linux methodologies. Its purpose is to critically evaluate existing frameworks for secure recovery and to identify the critical gap regarding ephemeral trust establishment for newly provisioned “clean” machines. The review is organised thematically, beginning with the evolution of DR and resilience theories, followed by an analysis of Secure Pull models, the BMR paradox in Zero Trust, offline cryptographic escrow, and the role of automated Infrastructure as Code (IaC) in cyber survivability.
Evolution of Disaster Recovery Paradigms and Resilience Theories
The conceptual framework of disaster recovery has changed substantially over time, moving from reactive, backup-centred procedures towards proactive approaches that emphasize cyber-resilience and system survivability. Conventional DR strategies focused mainly on meeting stringent RTO and Recovery Point Objective (RPO) targets through remote-site storage, data redundancy, and the common “3-2-1” practice. However, the rise of crypto-ransomware attacks aimed at disrupting critical Linux systems has led academics and industry professionals to concentrate on Zero-Trust Resilience Strategies. This development is theoretically underpinned by Information System Resilience and Survivability Theory, which holds that contemporary systems should be able to anticipate incidents, withstand hostile environments, survive disruptions, and return to their pre-attack state.
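The 3-2-1 practice can be expressed as a simple check: at least three copies of the data, on at least two different media types, with at least one copy offsite. The sketch below assumes a hypothetical Copy record and an illustrative plan spanning production, local NAS, and cloud tiers; it is not part of any cited framework.

```python
# Check a backup plan against the 3-2-1 rule: >= 3 copies of the data,
# on >= 2 different media types, with >= 1 copy offsite.
# The Copy record and the example plan are hypothetical illustrations.
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str  # e.g. "production", "local-nas", "cloud"
    media: str     # e.g. "ssd", "hdd", "object-storage"
    offsite: bool


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


if __name__ == "__main__":
    plan = [
        Copy("production", "ssd", offsite=False),
        Copy("local-nas", "hdd", offsite=False),
        Copy("cloud", "object-storage", offsite=True),
    ]
    print(satisfies_3_2_1(plan))      # True
    print(satisfies_3_2_1(plan[:2]))  # False: two copies, none offsite
```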
Recent literature increasingly integrates ZTA with resilience engineering so that the core principle of “never trust, always verify” applies not only to active user sessions but also to the highly privileged systems that execute restoration protocols. Methodologically, studies in this area have shifted from descriptive compliance standards to experimental verification of comprehensive recovery frameworks. For example, open-source stacks such as Relax-and-Recover (ReaR) and BorgBackup have been empirically validated on Red Hat Enterprise Linux for reliable BMR with minimal RTO. Such techniques reflect a wider industry trend towards “absolute” or “quantum” zero trust, in which trust must be continually earned and cryptographically validated regardless of the environment’s location, state, or operational history. Despite these structural improvements, a consistent pattern emerges: resilience is still discussed mainly at the network level, while bare-metal recovery remains a weak point under zero-trust policy enforcement. The major drawback of existing resilience architectures is their fundamental reliance on an available identity provider or an intact network configuration.
Secure Pull Architectures and Centralised Backup Systems
A key debate in current backup system design concerns the direction of data flow and how best to isolate backup repositories from potentially infected production environments. Traditional “push” schemes, in which production clients connect to a central backup server and upload data, are highly vulnerable because a compromised client holds all the credentials needed to delete its own backups. Modern cybersecurity research has therefore advocated “pull”-based backups, in which an isolated backup vault initiates the connection and retrieves the data. This role reversal fits the ZTA concept well: the attack surface of the backup vault shrinks, and the client loses any write access to the repository.
developed this method into a secure Linux solution, showing that a logical “air gap” can be established with a “Secure Pull” architecture that leverages SSH tunnels and append-only repositories. Their experiments showed that this inverted logic protects the repository against ransomware running with root privileges on the client while preserving RPO and RTO for logical failures. Building on these concepts, other approaches have applied the pull model within larger zero-trust networks, using vendor-neutral technologies such as Kubernetes and Ansible Tower to automate retention and compliance reporting. Systems design and risk analysis have been used methodologically to address latency considerations in pull strategies.
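The protective effect of inverting the data flow can be sketched in a few lines. In this illustrative model, the class and function names are hypothetical and an in-memory dictionary stands in for a real repository reached over SSH. The backup server initiates each transfer, and the repository accepts new snapshots but refuses deletion or overwriting, so ransomware that corrupts the client cannot erase earlier clean snapshots.

```python
# Illustrative model of a pull-based backup with an append-only repository.
# Class and function names are hypothetical; a dict stands in for the
# repository that a real deployment would reach over SSH.


class AppendOnlyRepository:
    """Snapshots can be added but never overwritten or deleted by clients."""

    def __init__(self) -> None:
        self._snapshots: dict[str, bytes] = {}

    def append(self, snapshot_id: str, data: bytes) -> None:
        if snapshot_id in self._snapshots:
            raise PermissionError(f"snapshot {snapshot_id!r} already exists")
        self._snapshots[snapshot_id] = data

    def delete(self, snapshot_id: str) -> None:
        # Pruning, if any, is an administrative task outside this interface.
        raise PermissionError("delete is not permitted on an append-only repository")

    def get(self, snapshot_id: str) -> bytes:
        return self._snapshots[snapshot_id]


class ProductionClient:
    """The client only exposes its data; it holds no repository credentials."""

    def __init__(self, data: bytes) -> None:
        self.data = data


def pull_backup(repo: AppendOnlyRepository, client: ProductionClient,
                snapshot_id: str) -> None:
    """The backup server initiates the transfer (pull), not the client (push)."""
    repo.append(snapshot_id, client.data)


if __name__ == "__main__":
    repo = AppendOnlyRepository()
    client = ProductionClient(b"clean data")
    pull_backup(repo, client, "day-1")
    client.data = b"encrypted by ransomware"  # the client is now compromised
    pull_backup(repo, client, "day-2")
    print(repo.get("day-1"))  # b'clean data'
```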
Nevertheless, although the Secure Pull design reliably protects the integrity of backed-up data against attack, it poses a serious problem for restoration after physical destruction. In a true “bare-metal” recovery, where the original hardware is lost to fire or another calamity, the isolated backup server has no trusted client left to pull from, and the replacement machine has no credentials with which to request its data. This leaves a significant functional gap: current research thoroughly confirms the integrity of the backup operation itself, but offers no reliable automated technique for a newly instantiated replacement client to authenticate itself securely and retrieve its recovery image without exposing the main storage to impersonation attacks.
The Bare-Metal Recovery Paradox in Zero Trust
This functional gap culminates in what is termed the “Bare-Metal Recovery (BMR) Paradox”, which captures the central contradiction of ZTA recovery: how can a “root of trust” be provided for a newly instantiated machine that has no prior network identity, secrets, or certificates with which to recover its keys? According to NIST Special Publication 800-207, ZTA is a collection of concepts designed to minimize uncertainty in access decisions; however, it fundamentally depends on a live Policy Enforcement Point (PEP) and Policy Decision Point (PDP) that the client can reach and recognize.
In a bare-metal restoration, the freshly instantiated Linux environment is entirely unknown to the zero-trust network. Common practice for establishing a root of trust depends heavily on hardware-based components such as Trusted Platform Modules (TPMs) or Trusted Execution Environments (TEEs), which provide secure environments for generating cryptographic keys. However, as Doku and Dinda (2025) critically remark, current attestation processes inherit the limitations of the hardware they depend on, requiring live cryptographic secrets, network handshakes, or hard problems that become moot when a catastrophic failure of the entire IT system destroys the original TPM.
To mitigate this, current research has explored using external “programmers” or physical hardware probes as a root of trust for bare-metal terminals, to detect malicious bootloaders during initialization. While effective for protecting boot-level integrity, this physical approach does not solve the digital identity problem that remote data retrieval requires. Other frameworks have proposed leveraging TEEs hosted in the public cloud so that users can recover secrets even after losing all local credentials, using proofs in the Universally Composable (UC) framework to guarantee confidentiality. This cloud-TEE approach, however, relies on the cloud provider's availability and on the flawed assumption that the “clean” machine can reach the external TEE without an existing outbound network identity. There is thus a clear lacuna in the ZTA literature: it emphasizes active, running systems while neglecting the “cold start”, in which a system must be verified to access the vault without the benefit of a live Public Key Infrastructure (PKI).
Offline Cryptographic Escrow and Secret Sharing
To mitigate the problems of relying on centralized, live Key Management System (KMS) solutions in emergencies, recent research has turned to decentralization through offline cryptographic escrow and secret-sharing algorithms. A frequently cited example is the Personal Data Management System, in which Shamir’s Secret Sharing algorithm divides master keys into mathematical shares distributed to trustees. As a result, a single compromised component cannot recover the data unless a threshold number of trustees cooperate. In Linux OS recovery, a comparable approach can generate an encryption key through a KMS whose key is not tied to any single machine.
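For readers unfamiliar with the scheme, the following is a minimal, illustrative implementation of Shamir's Secret Sharing over a prime field, assuming the Mersenne prime 2**127 - 1 as the modulus and integer secrets below it. It shows the threshold property described above: any threshold-sized subset of shares reconstructs the secret via Lagrange interpolation, while fewer shares yield an unrelated value. It is a teaching sketch rather than a vetted library, and the zt-backup-kit itself uses a GPG-encrypted archive rather than secret sharing.

```python
# Minimal Shamir's Secret Sharing over GF(p) (teaching sketch, not vetted).
import secrets

PRIME = 2**127 - 1  # Mersenne prime; secrets must be integers below it


def split(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
    """Split secret into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 with f(0) = secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def f(x: int) -> int:
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        return y

    return [(x, f(x)) for x in range(1, shares + 1)]


def combine(points: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 recovers f(0), the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    key = secrets.randbelow(PRIME)
    parts = split(key, threshold=3, shares=5)
    print(combine(parts[:3]) == key)                      # True
    print(combine([parts[0], parts[4], parts[2]]) == key)  # True
```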
Such research commonly uses Design Science Research Methodology (DSRM) to build and evaluate cryptographic artifacts. For instance, some earlier models combined spatial random permutation with stream ciphers to split confidential data into fragments distributed among geographically distant nodes, requiring specific metadata for recovery. Although these advanced hybrid techniques offer strong theoretical security guarantees, analysis shows that they are usually too resource-intensive, mathematically fragile, or architecturally complex for rapid bare-metal reconstruction of terabyte-scale OS volumes. Furthermore, the handling of reconstruction metadata in these schemes almost universally assumes a functioning, uncompromised supervisor server to orchestrate the rebuild. The practical challenge identified in the literature is that offline escrow solutions must be “zero-touch” to minimize human-induced RTO delays, yet secure enough to prevent unauthorised decryption. Offline, strongly encrypted asymmetric archives (such as GPG envelopes containing the required tokens) offer a viable path forward, but current research has not yet fully operationalised this concept for automated, direct-from-cloud Linux restoration environments.
Automation and Infrastructure as Code in Cyber Survivability
The final prominent theme in current disaster recovery literature is the operational integration of automation and Infrastructure as Code (IaC) to guarantee "survivability" for mission-critical systems . Modern ZTA implementations for Linux frequently leverage open-source automation engines, such as Ansible, to manage backup scheduling, configuration drift, and audit-grade compliance reporting without human intervention . By treating the entire recovery environment as an ephemeral, code-defined artifact, organisations can theoretically "re-spin" a compromised or destroyed system into a known good state in minutes. This methodology aligns seamlessly with the "clean room" recovery concept, where a machine is provisioned identically from a trusted image and then patched with the latest incremental data backups .
Methodological trends show increasing reliance on “reproducible builds” and verifiable open-source OS components, allowing comprehensive cryptographic verification of the restore target. This helps ensure that the recovery environment has not been covertly manipulated by a persistent threat. The fatal flaw in a fully automated approach, however, is that the first link of the security chain cannot be automated. Infrastructure as Code can quickly provision a bare-metal Linux OS, but restoring encrypted application data requires a decryption key, and if that key is delivered through the same compromised channel that enabled the initial breach, it poses a significant security risk. The existing literature does not yet address how to achieve full automation while recovering keys from a highly isolated, offline escrow in a strictly zero-trust scenario.
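Verification of the restore target can be illustrated with a digest manifest. The sketch below assumes a hypothetical manifest format (relative path mapped to a SHA-256 hex digest) that is obtained from a trusted source rather than from the host being restored; it reports files that are missing or altered after restoration.

```python
# Verify restored files against a trusted SHA-256 manifest.
# The manifest format (relative path -> hex digest) is a hypothetical choice;
# the manifest must come from a trusted source, not from the restored host.
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return manifest entries that are missing or whose digest differs."""
    problems = []
    for rel_path, expected in sorted(manifest.items()):
        target = root / rel_path
        if not target.is_file() or sha256_file(target) != expected:
            problems.append(rel_path)
    return problems
```

An empty result means every listed file is present and byte-identical to the trusted reference; files not listed in the manifest are deliberately ignored by this sketch.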
Key findings emphasize the security effectiveness of “pull-oriented” backup architectures in guaranteeing data isolation against ransomware, and the potential of TEE-based recovery protocols to protect cryptographic secrets in transit. Furthermore, major methodological trends indicate an industry-wide move towards open-source IaC automation using tools such as ReaR, Ansible, and Kubernetes to achieve measurably lower RTOs and rigorous organisational compliance.
Current state-of-the-art machine identity standards, including SPIFFE (Secure Production Identity Framework for Everyone) and its implementation SPIRE (the SPIFFE Runtime Environment), provide strong node attestation for running machines. However, such solutions require an existing root of trust, or at least a functioning OS, to run the attestation process. The zt-backup-kit addresses the preceding 'Stage 0' problem, the state before such frameworks can operate, by supplying the bootstrap identity a machine needs before it can begin operating under these machine identity standards.
Despite significant progress, two dependencies persist: “hardware dependency” and “live network dependency”, in which trust roots are anchored to specific hardware TPMs or to live network identities that are lost in the disaster. The main research gap is therefore the “Zero-Trust Recovery Gap”. While ZTA handles known, active subjects on the live network, few works address the recovery phase, in which a “clean” machine replacing a destroyed one must establish initial trust for itself. Existing frameworks either depend on live and possibly compromised PKIs, rely so heavily on manual administrator intervention that RTO suffers, or use hardware-rooted keys that lack portability when an entire site is lost. Current Zero-Trust literature therefore leaves a significant gap concerning Bare-Metal Recovery of a destroyed machine onto clean hardware without dependence on live network PKIs, vaults, or manual intervention.
Methodology
The main goal of this research was to assess the effectiveness and End-to-End RTO of an automated, decentralized bare-metal recovery process in a Linux environment. Specifically, it tested whether an offline cryptographic escrow, consisting of a secure archive (ztbk-credentials.tar.gz.gpg) coupled with an automation script (emergency-restore.sh), would enable a freshly built Linux server to restore itself from data held in a Tier-2 cloud storage system. A key requirement was that the restoration process neither depend on an operational PKI nor expose the primary local Network Attached Storage (NAS) to the newly created, untrusted system.
To this end, a quantitative research paradigm was adopted for the evaluation stage of the experiment, within the wider framework of the Design Science Research Methodology (DSRM). Although DSRM informed the design and development stages of the cryptographic escrow construct, a quantitative approach was considered the most suitable for evaluation, because quantitative methods yield objective and precise measurements of variables. The key variable is RTO (Recovery Time Objective), measured here as the elapsed time from complete system failure to full operational status. Timestamps from system logs were used to build a recovery timeline for each trial and to determine whether the automated process outperformed the manual one.
Research Design
The study employed an experimental design to assess the proposed recovery framework. In cybersecurity and systems engineering, experimental design provides the rigor needed to manipulate an independent variable (automated versus manual restoration) while controlling extraneous environmental conditions (such as network variance and hardware constraints). This consistency made it possible to measure the End-to-End RTO of each process and compare their effectiveness. Variability was minimized by holding every element other than the restoration method constant, so that any reduction in RTO could be attributed solely to the cryptographic restoration process. The complete structural transition of the recovery framework, from initial access denial to memory cleanup, is stacked and aligned across three operational boundaries (see Figure 1). This vertical separation ensures that unauthenticated nodes at Stage 0 never interact directly with isolated data vaults.

Source: Compiled by author based on experimental design metrics and the NIST SP 800-207 Zero-Trust Framework.
Participants and Sampling
Unlike participants in social science research, the units of analysis in this engineering study were the virtualized Linux server operating systems and the target cloud storage services. The experiment comprised sixty recovery sessions in total, thirty per test branch: n=30 sessions using the automated escrow restore process and n=30 sessions following the conventional manual restore process.
A purposive sampling method was used to select the technological parameters of the testbed. Purposive sampling is effective in engineering research when a study requires specific, representative technologies that reflect current industry best practice rather than a random assortment of outdated systems. Accordingly, strict inclusion and exclusion criteria were defined. The test environment used Ubuntu 22.04 LTS as the operating system, since it is one of the most common platforms in enterprise and Small to Medium Enterprise (SME) server settings. For the storage target, inclusion was restricted to two Tier-2 cloud providers, Google Drive and AWS S3, as representative of the data backup practices of modern organizations. Hardware-level recovery was outside the scope of this research; data extraction from burnt or damaged hard disk drives was therefore not considered.
Data Collection Methods
Data were collected solely within a controlled laboratory experiment, using system and application logs as the quantitative instruments. The laboratory used the VirtualBox/KVM hypervisor running Ubuntu 22.04 LTS virtual machines. The recovery architecture was built with the open-source zt-backup-kit package, which includes the emergency-restore.sh executable. Data transfer and cryptography were handled by open-source binaries: Restic manages the deduplicated repository, Rclone connects to the cloud storage target, and GnuPG decrypts the offline escrow envelope. Timing was captured with standard Linux facilities, namely systemd-journald timestamps and the Bash time keyword. The automated restoration is carried out by the emergency-restore.sh script, which follows a defined workflow for decrypting the escrow and initiating recovery (Figure 2).

Source: Developed by Author, 2026.
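The study captured per-phase timing with systemd-journald and the Bash time keyword. As an illustration only, the same measurement logic can be sketched in Python with a monotonic clock; the RecoveryTimer class and the phase names are hypothetical, and time.sleep stands in for the real decryption and restore steps.

```python
# Record per-phase and end-to-end durations of a recovery run.
# RecoveryTimer and the phase names are hypothetical; time.sleep stands in
# for the real decryption and restore steps.
import time
from contextlib import contextmanager


class RecoveryTimer:
    def __init__(self) -> None:
        self.phases: list[tuple[str, float]] = []
        # Monotonic clock: not affected by system clock adjustments.
        self._start = time.monotonic()

    @contextmanager
    def phase(self, name: str):
        t0 = time.monotonic()
        try:
            yield
        finally:
            # Recorded even if the phase raises, so failed runs are still timed.
            self.phases.append((name, time.monotonic() - t0))

    def end_to_end_seconds(self) -> float:
        return time.monotonic() - self._start


if __name__ == "__main__":
    timer = RecoveryTimer()
    with timer.phase("decrypt-escrow"):
        time.sleep(0.05)
    with timer.phase("restore-data"):
        time.sleep(0.10)
    for name, secs in timer.phases:
        print(f"{name}: {secs:.2f} s")
    print(f"end-to-end: {timer.end_to_end_seconds():.2f} s")
```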
First, a baseline virtual machine was configured with a 10 GB dataset, and a secured backup copy was uploaded to the cloud storage targets. A destructive event was then simulated by erasing the virtual machine. In manual tests, all configuration and credential-retrieval steps were performed by hand. In automated tests, the administrator supplied the archive file “ztbk-credentials.tar.gz.gpg” and its decryption passphrase to the “emergency-restore.sh” script.
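The automated escrow step can be sketched as follows, assuming GnuPG 2.1 or later (for --pinentry-mode loopback) and hypothetical function names that are not taken from emergency-restore.sh. The passphrase is passed on standard input rather than the command line, the decrypted tar.gz is kept in memory, and extraction writes only regular files into a private directory, rejecting links and any path that would escape it.

```python
# Sketch of the automated escrow step: decrypt with GnuPG, then unpack the
# credentials into a private directory. Assumes GnuPG 2.1+ on PATH; function
# names are hypothetical and not taken from emergency-restore.sh.
import io
import os
import subprocess
import tarfile
from pathlib import Path


def gpg_decrypt_cmd(archive: str) -> list[str]:
    # The passphrase is read from stdin (fd 0), so it never appears in argv.
    return ["gpg", "--batch", "--pinentry-mode", "loopback",
            "--passphrase-fd", "0", "--decrypt", archive]


def decrypt_escrow(archive: str, passphrase: str) -> bytes:
    """Return the decrypted tar.gz bytes; raises CalledProcessError on failure."""
    result = subprocess.run(gpg_decrypt_cmd(archive),
                            input=(passphrase + "\n").encode(),
                            capture_output=True, check=True)
    return result.stdout


def unpack_credentials(tar_gz: bytes, dest: Path) -> list[str]:
    """Extract regular files only; reject links and paths escaping dest."""
    dest.mkdir(mode=0o700, parents=True, exist_ok=True)
    root = dest.resolve()
    extracted = []
    with tarfile.open(fileobj=io.BytesIO(tar_gz), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isdir():
                continue
            target = (root / member.name).resolve()
            if not member.isfile() or root not in target.parents:
                raise ValueError(f"unsafe archive member: {member.name!r}")
            target.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
            # Create the file with owner-only permissions from the start.
            fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
            with tar.extractfile(member) as src, os.fdopen(fd, "wb") as out:
                out.write(src.read())
            extracted.append(member.name)
    return extracted
```

In use, the two steps would be chained as unpack_credentials(decrypt_escrow("ztbk-credentials.tar.gz.gpg", passphrase), dest), where dest is a hypothetical private directory chosen by the operator.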
Ethical Considerations
Although the main technical analysis involved infrastructure simulations and artificial data with no human participants or Personally Identifiable Information (PII), the manual recovery condition required human system administrators to perform the control actions. Written consent was therefore obtained from all participants before the timing tests. The consent forms described the timing tests and explained that only the efficiency of the process, not participants’ skills, would be evaluated.
Participants were fully briefed on their rights, including the absolute right to withdraw from the timing exercises at any point without penalty or professional repercussion. Because the experiment consisted of IT activities, used only artificial organisational data, and posed no physical or psychological harm, the institutional review board (IRB) classified the research as exempt from further ethical review. To preserve privacy, timing data were de-identified, and the system logs of the manual runs contained no user information or IP addresses. All aggregated performance metrics and log files were stored on an encrypted, locally hosted university server under a strict data retention policy requiring secure deletion of all raw logs five years after publication.
Limitations and Justification
As with any simulation, the methodology has known limitations. The first concerns the simulation of the Wide Area Network (WAN). Although network-conditioning tools were used to impose artificial bandwidth and latency limits when simulating retrieval from cloud storage, the simulated WAN may not fully capture real-world uncertainties such as routing latency, packet loss, or large-scale internet congestion during a local crisis affecting an organization. Second, the artifact was tested only on a Debian-based distribution, Ubuntu 22.04 LTS. The results may therefore not generalize directly to distributions with different package managers or kernel configurations, such as Red Hat or Arch.
Several mitigations were built into the methodology. To address the platform restriction, the artifact relies heavily on cross-platform, compiled open-source binaries such as Restic and Rclone, which are written in Go. Because these binaries depend very little on operating-system-specific libraries, the cryptographic escrow and automation scripts are expected to remain broadly portable. Separately, timing manual recovery in a well-controlled laboratory setting may have introduced a Hawthorne effect, with administrators performing manual tasks faster under observation than they would in a real emergency. This was accepted deliberately: benchmarking against the fastest plausible manual recovery makes the comparison conservative, so the measured advantage of the automated escrow process is, if anything, understated.
Results and Discussion
The evaluation of the decentralised bare-metal recovery process produced quantitative evidence on the efficiency and effectiveness of the offline cryptographic escrow. Across sixty trials, each simulating a complete system wipe followed by manual or automated restoration, comparative data were obtained on RTO and data integrity. The following subsections present these results and interpret them from a ZTA perspective.
Analysis of Recovery Time Objectives and Operational Efficacy
The primary measure was End-to-End RTO; descriptive statistics are presented in Table 1. Automated restoration with the zt-backup-kit was substantially faster than manual restoration: the automated emergency-restore.sh workflow achieved a mean RTO of 4.2 minutes (SD = 0.38), whereas the manually configured process required 22.4 minutes (SD = 2.15). A granular breakdown of the 4.2-minute mean shows that cryptographic overhead is small relative to data transfer. Approximately 30 seconds were spent on GnuPG decryption of the escrow archive and credential provisioning, while the remaining 3.7 minutes were used for deduplicated data streaming and final file-system verification. This suggests that the primary bottleneck is network throughput rather than administrative configuration.
| Restoration Method | n | Mean (min) | Median (min) | SD (min) | Min (min) | Max (min) |
|---|---|---|---|---|---|---|
| Automated Escrow | 30 | 4.2 | 4.1 | 0.38 | 3.8 | 5.2 |
| Manual Baseline | 30 | 22.4 | 22.1 | 2.15 | 19.5 | 28.4 |
Source: Developed by author, 2026.
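The headline figures above can be re-derived from the summary statistics alone. The Python sketch below recomputes the relative RTO reduction, checks that the two phases of the automated run add up to the reported mean, and, purely as an illustration, computes Cohen's d (Cohen, 1992) using a pooled standard deviation. Because it works from the rounded values in Table 1 rather than the raw trial data, its results are approximate and should not be read as the author's own statistical analysis; the phase labels are hypothetical names.

```python
import math

# Summary statistics transcribed from Table 1 (minutes; n = 30 per method).
AUTO_MEAN, AUTO_SD, AUTO_N = 4.2, 0.38, 30
MANUAL_MEAN, MANUAL_SD, MANUAL_N = 22.4, 2.15, 30

# Phase breakdown of the automated mean RTO reported in the text (minutes).
# The dictionary keys are hypothetical labels.
PHASES = {
    "gpg_decrypt_and_provisioning": 0.5,  # approximately 30 seconds
    "dedup_streaming_and_verification": 3.7,
}


def percent_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` compared with `baseline`, in percent."""
    return (baseline - improved) / baseline * 100.0


def cohens_d(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int) -> float:
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)


reduction = percent_reduction(MANUAL_MEAN, AUTO_MEAN)
phase_total = sum(PHASES.values())
d = cohens_d(MANUAL_MEAN, MANUAL_SD, MANUAL_N, AUTO_MEAN, AUTO_SD, AUTO_N)

print(f"RTO reduction: {reduction:.1f}%")
print(f"Automated phases sum to {phase_total:.1f} min (reported mean {AUTO_MEAN} min)")
print(f"Cohen's d (from rounded summary values): {d:.1f}")
```

The reduction comes out at roughly 81%, and the effect size is far above Cohen's conventional "large" threshold of 0.8, which is consistent with the non-overlapping ranges in Table 1 (automated maximum 5.2 minutes versus manual minimum 19.5 minutes).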
Automation therefore reduced mean RTO by 81% by removing human-driven bottlenecks, allowing a pristine system to bootstrap itself from a "cold" state with minimal administrative intervention. An unexpected finding was a 12% higher RTO when Google Drive rather than AWS S3 served as the Tier-2 destination, probably because of the additional API handshakes required by Google's OAuth 2.0 flow.
Interpretation of Ephemeral Trust and the BMR Paradox
The results show that the "Offline Cryptographic Escrow" model effectively resolves the Bare-Metal Recovery (BMR) paradox. Historically, a newly provisioned machine with no pre-existing identity could not access a Zero-Trust vault without manual credential injection. This study demonstrates that the zt-backup-kit enables a "clean" machine to establish Ephemeral Trust (a temporary, verifiable identity) by using the GPG-encrypted escrow archive to bootstrap its credentials independently. A live Policy Decision Point (PDP) is therefore not an absolute requirement for restoring a Zero-Trust system, provided that trust has been cryptographically escrowed in an offline state.
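The escrow bootstrap described above can be sketched in Python as follows. The sketch assumes that the decrypted ztbk-credentials.tar.gz.gpg archive contains a KEY=VALUE file, given here the hypothetical name restic.env, holding the standard restic environment variables RESTIC_REPOSITORY and RESTIC_PASSWORD; the actual layout of the zt-backup-kit escrow is not described in this section. Decryption is delegated to the gpg command-line tool, which prompts the administrator for the passphrase, and the credentials are kept in memory only for the recovery session.

```python
import io
import subprocess
import tarfile
from typing import Callable, Dict

# Hypothetical name of the KEY=VALUE file inside the decrypted escrow archive.
ESCROW_MEMBER = "restic.env"
# Standard restic environment variables needed to open the Tier-2 repository.
REQUIRED_KEYS = ("RESTIC_REPOSITORY", "RESTIC_PASSWORD")


def gpg_decrypt(path: str) -> bytes:
    """Decrypt the escrow with the gpg CLI; gpg asks the administrator for the passphrase."""
    result = subprocess.run(["gpg", "--decrypt", path], stdout=subprocess.PIPE, check=True)
    return result.stdout


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip()
    return values


def bootstrap_ephemeral_trust(escrow_path: str,
                              decrypt: Callable[[str], bytes] = gpg_decrypt) -> Dict[str, str]:
    """Decrypt the escrow and return repository credentials held only in memory."""
    plaintext = decrypt(escrow_path)  # bytes of the inner .tar.gz archive
    with tarfile.open(fileobj=io.BytesIO(plaintext), mode="r:gz") as archive:
        member = archive.extractfile(ESCROW_MEMBER)  # raises KeyError if absent
        if member is None:
            raise ValueError(f"{ESCROW_MEMBER} is not a regular file in the escrow")
        creds = parse_env(member.read().decode("utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        raise ValueError(f"escrow is missing required keys: {missing}")
    return creds
```

The decrypt parameter is injectable so that the parsing logic can be exercised without GnuPG. In a real recovery, the returned mapping could be merged into the environment of a restic restore latest --target <dir> invocation, so the credentials never need to be written to the new machine's disk.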
These findings are consistent with the high-performance, locally isolated framework proposed by , which emphasised directionality and isolation in networks as critical elements of network security architecture. Conversely, they run counter to the view of Miller and Davis (2025) that Zero Trust recovery necessarily requires a live PDP to govern the recovery process. Through offline escrow, this paper shows that a Zero-Trust system's trust can be pre-authorized and stored "cold."
Operational Stability and Storage Efficiency
Following the experiment, the zt-backup-kit was deployed in a production environment to test its reliability over time. The status report generated on May 6, 2026, shows a 100% success rate across ten consecutive daily snapshots. The artifact also achieved considerable storage efficiency.
| Metric | Value |
|---|---|
| Total Snapshots | 10 |
| Logical Data Size | 1.6 GB |
| Actual Physical Storage | 1,003 MB |
| Deduplication Ratio | 1.61× (38.0% savings) |
Source: Production Server (fssl-susl), 2026.
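The relationship between the deduplication ratio and the reported savings is simple arithmetic: savings = 1 - physical/logical = 1 - 1/ratio. The Python sketch below, which assumes decimal units (1 GB = 1,000 MB) for the table, checks that a 1.61× ratio corresponds to roughly 38% savings and that 1,003 MB of physical storage at that ratio implies about 1.6 GB of logical data.

```python
def savings_from_ratio(ratio: float) -> float:
    """Fraction of storage avoided by deduplication: 1 - physical/logical = 1 - 1/ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 - 1.0 / ratio


def implied_logical_mb(physical_mb: float, ratio: float) -> float:
    """Logical (pre-deduplication) size implied by a physical size and a ratio."""
    return physical_mb * ratio


# Values from the production status table (decimal units assumed).
RATIO = 1.61
PHYSICAL_MB = 1003

print(f"Savings: {savings_from_ratio(RATIO):.1%}")  # 37.9%
print(f"Logical size: {implied_logical_mb(PHYSICAL_MB, RATIO) / 1000:.2f} GB")  # 1.61 GB
```

The small gap between 37.9% and the reported 38.0% is consistent with the ratio itself having been rounded to two decimal places, and the implied 1.61 GB matches the rounded logical size of 1.6 GB.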
Security Resilience and Data Integrity
The "Secure Pull" segregation remained intact throughout the experiment. The newly provisioned machines, which connected only to the Tier-2 cloud repositories, had no network path to the local NAS that served as the primary vault. Data integrity was likewise preserved: all 30 automated restorations succeeded, and Restic's cryptographic checksums confirmed that no data was lost.
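One way an operator might verify this segregation is to confirm, from the freshly provisioned recovery host, that no TCP connection to the primary NAS can be opened. The Python sketch below is an illustrative check and is not described as part of the zt-backup-kit; the NAS host names and ports passed to it (for example, NFS on 2049 or SMB on 445) are assumptions that depend on the deployment.

```python
import socket
from typing import Iterable, Tuple


def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or DNS failure
        return False


def assert_segregated(nas_endpoints: Iterable[Tuple[str, int]], timeout: float = 2.0) -> None:
    """Raise if the recovering host can reach any primary-vault (NAS) endpoint."""
    reachable = [(h, p) for h, p in nas_endpoints if can_reach(h, p, timeout)]
    if reachable:
        raise RuntimeError(f"Secure Pull violated: NAS reachable at {reachable}")
```

A completed TCP handshake is treated as a violation, while a refused connection, a timeout, or a name-resolution failure all count as segregated. The check only samples reachability from the recovery host at one moment, so it complements rather than replaces firewall policy.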
| Threat Category | Potential Risk | zt-backup-kit Mitigation |
|---|---|---|
| Tampering | An attacker modifies the backup archive in the cloud. | Restic uses cryptographic content-addressable storage; any change to the data breaks the hash and fails verification. |
| Information Disclosure | An attacker steals the escrow archive (ztbk-credentials.tar.gz.gpg). | The archive is GPG-encrypted; without the passphrase, which only the administrator holds, the file cannot be decrypted. |
| Spoofing | A rogue server attempts to "pull" the backups. | Access is restricted through Ephemeral Trust; the cloud repository requires access tokens that exist only inside the decrypted escrow. |
Source: Compiled by author based on the STRIDE threat model.
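The tampering mitigation in the table rests on content addressing: an object's identifier is derived from a hash of its contents, so any modification is detected when the hash is recomputed on read. The Python class below is a toy illustration of that principle using SHA-256 from the standard library. It is not Restic's implementation, which additionally encrypts and authenticates the stored data, and the class and method names are hypothetical.

```python
import hashlib
from typing import Dict


class ContentAddressedStore:
    """Toy content-addressed store: every blob is keyed by the SHA-256 of its bytes."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store `data` and return its ID; identical content maps to the same ID."""
        blob_id = hashlib.sha256(data).hexdigest()
        self._blobs[blob_id] = data
        return blob_id

    def get(self, blob_id: str) -> bytes:
        """Return the blob, re-hashing it first so that tampering is detected."""
        data = self._blobs[blob_id]
        if hashlib.sha256(data).hexdigest() != blob_id:
            raise ValueError(f"blob {blob_id[:12]} failed hash verification")
        return data
```

Because storing identical content twice yields the same identifier, the same mechanism that exposes tampering also underpins the deduplication savings reported for the production deployment.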
Unexpected Results and Performance Anomalies
A further notable observation was a 12% increase in RTO when Google Drive, rather than AWS S3, was used as the Tier-2 storage destination. Although all tests succeeded, iterations using Google Drive averaged 4.7 minutes, compared with 3.9 minutes for AWS S3. The added latency is plausibly explained by connection setup: establishing an authenticated API session with Google Drive involves the more elaborate OAuth 2.0 token flow, whereas Restic and Rclone authenticate S3 requests directly with AWS Signature Version 4.
Theoretical and Practical Implications
The results have important implications for systems administration and disaster recovery. In practical terms, they provide an affordable, vendor-neutral way to achieve enterprise-grade ransomware readiness using open-source software. In theoretical terms, the study addresses an important gap in NIST SP 800-207, which focuses on managing subjects that have already been identified in a live environment. Defining a protocol for trusting a newly provisioned machine amounts to defining Day Zero security.
Implications for Institutional Cyber-Resilience
The successful ten-day operational run, together with the 1.61× deduplication ratio, has important practical ramifications. Whereas the experimental runs showed that the Offline Cryptographic Escrow resolves the BMR paradox, the production run points to the sustainability of the mechanism. Because the zt-backup-kit automates the backup cycle through a cron schedule ("30 2 * * *", i.e., daily at 02:30), no administrator intervention is needed to maintain the "Secure Pull" architecture, which removes a source of human-induced security drift.
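The schedule "30 2 * * *" uses the five cron fields minute, hour, day of month, month, and day of week, so it fires once a day at 02:30. The Python sketch below computes the next firing time for this restricted daily form only; it is a hypothetical helper for illustration, not a general cron parser and not part of the zt-backup-kit. The script path in the actual crontab entry is not given in this section.

```python
from datetime import datetime, timedelta


def next_daily_run(expr: str, now: datetime) -> datetime:
    """Next firing time strictly after `now` for a cron expression 'MIN HOUR * * *'."""
    fields = expr.split()
    if len(fields) != 5 or fields[2:] != ["*", "*", "*"]:
        raise ValueError("only 'MIN HOUR * * *' expressions are supported")
    minute, hour = int(fields[0]), int(fields[1])
    if not (0 <= minute <= 59 and 0 <= hour <= 23):
        raise ValueError("minute or hour out of range")
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed, so use tomorrow's
    return candidate


print(next_daily_run("30 2 * * *", datetime(2026, 5, 6, 9, 0)))  # 2026-05-07 02:30:00
```

The helper uses naive local datetimes and ignores daylight-saving transitions, which the cron daemon itself handles in a real deployment.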
Limitations and Future Research
WAN latency was simulated in the experiment, so real outages may extend image-fetching times beyond those measured here. Future research should explore the integration of hardware-based roots of trust, such as TPM 2.0, and extend the scripts to perform "Intelligent Anomaly Detection" during the pull process so that tampered images are rejected.
The trade-off between Zero-Trust security and Bare-Metal Recovery speed is therefore a solvable challenge. By re-architecting the restoration relationship from a live-dependency model to an offline cryptographic escrow model, Linux environments can achieve the high-speed recovery required for business continuity without breaching strict isolation boundaries. The zt-backup-kit provides a viable, automated standard for modern disaster recovery, showing that a decentralised "Secure Pull" architecture can survive total loss of physical infrastructure while reducing restoration time by 81% relative to the manual baseline.
Conclusion
Executing disaster recovery well within a Zero Trust Architecture is among the toughest challenges in modern systems management. This work set out to solve the "Bare-Metal Recovery Paradox": the need to grant a machine that has no identity at provisioning time secure access to a strictly isolated backup vault so that it can resume operation. Through the development and testing of an automated process based on the Offline Cryptographic Escrow and the zt-backup-kit (Gomas, 2026), we showed that rapid, secure restoration can take place in a fully isolated environment without a live PDP.
Quantitative analysis across sixty recoveries (n = 60) confirmed the operational superiority of the new artifact. The automated process achieved a mean RTO of only 4.2 minutes, about 81% less downtime than the manual restoration process, whose mean RTO was 22.4 minutes. Beyond speed, the "Secure Pull" design achieved a 100% data-integrity success rate.
Extended monitoring of the production deployment at the Faculty of Social Sciences and Languages (FSSL-SUSL portal) further confirmed its sustainability within the institution. The production logs showed a consistent 100% success rate across the daily cycles and a deduplication ratio of 1.61×, which reduced the required physical storage by 38%. These results indicate that the proposed decentralised recovery model is not only more efficient but also consumes fewer resources.
Research Contributions and Significance
The main innovation of this paper is the practical implementation of Ephemeral Trust for "Day Zero" recovery. By leveraging the GPG-encrypted offline escrow, it presents a feasible way for clean-slate systems to establish their identity without relying on any PKI. The research addresses a critical architectural gap in the NIST SP 800-207 approach, which provides no measures for the first connection of unconfigured devices during disaster recovery.
The zt-backup-kit is a vendor-independent, open-source solution for SMEs and governmental organizations. The approach democratizes enterprise-level resilience by demonstrating that strong ransomware resistance and swift bare-metal recovery can be achieved without investing in proprietary systems or appliances. By separating the recovery pathway from the local network and relying on Tier-2 cloud destinations, this research adds a solid layer of defense in depth, improving the survivability of institutional information even under severe disruption.
Limitations and Reflective Analysis
Nevertheless, several methodological limitations should be considered when interpreting the efficiency of the artifact. First, the experiments relied on simulated WAN latency, which can only approximate the extremely volatile global internet traffic associated with regional catastrophes. Second, most tests were carried out on Debian-based systems (e.g., Ubuntu 22.04 LTS). Although Go-based cross-platform programs such as Restic and Rclone run on virtually any operating system without losing significant features, RTO values may differ with specific kernel versions or other software configurations. Finally, the offline escrow still requires a human administrator to enter the passphrase manually.
Recommendations for Practice and Future Research
Based on these results, organisations are advised to move from "push-based" to "pull-based" backups by adopting the Secure Pull architecture together with offline credential escrows. In practice, the zt-backup-kit combined with cron scheduling is a suitable starting point.
Future work should aim to minimise the need for a "human in the loop" and investigate hardware roots of trust for the decryption step, such as TPM 2.0 (Trusted Platform Module) or secure enclaves. Another promising avenue is to add an Intelligent Anomaly Detection capability to emergency-restore.sh (Gomas, 2026). This could combine entropy analysis of inbound data with machine-learning models to detect anomalies that may indicate dormant ransomware or unauthorised changes to data.
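The entropy analysis proposed above can be illustrated with Shannon entropy measured in bits per byte: plaintext and structured files usually score well below the 8-bit maximum, whereas encrypted data, such as ransomware output, approaches it. The Python sketch below computes this measure and flags blocks above a threshold; the 7.5 bits-per-byte threshold and the function names are hypothetical, and the machine-learning component mentioned in the text is not shown.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for empty or constant data, at most 8.0."""
    if not data:
        return 0.0
    total = len(data)
    # Sum of p * log2(1/p) over the observed byte values.
    return sum((c / total) * math.log2(total / c) for c in Counter(data).values())


def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    """Flag a block whose byte distribution is close to uniform (hypothetical threshold)."""
    return shannon_entropy(data) >= threshold
```

Compressed archives and media files also have near-maximal entropy, so a practical detector would need to combine this signal with others, such as file type or rate of change, to keep false positives manageable.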
As cyber-attacks increasingly target security solutions themselves, implementing a Zero Trust recovery strategy is paramount. This research shows that operational safety need not come at the expense of efficiency. Combining offline cryptographic escrow with automated procedures yields a recovery ecosystem that resists lateral attacks and can restore essential services within minutes. The study thus outlines how to build a new generation of Linux-based systems that are secure and fast enough to recover from severe cyber-attacks, including those that result in the complete destruction of hardware.
References
- Adei, D., Orsini, C., Scafuro, A., & Verber, T. (2025). How to Recover a Cryptographic Secret From the Cloud. CCS 2025 - Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security, 1, 1814–1828.
- Ahn, G., Jang, J., Choi, S., & Shin, D. (2024). Research on Improving Cyber Resilience by Integrating the Zero Trust Security Model With the MITRE ATT&CK Matrix. IEEE Access, 12, 89291–89309.
- Anand, C. S., & Shanker, R. (2023). Zero Trust Resilience Strategy for Linux Crypto Ransomware Obviation and Recuperation. 2023 3rd International Conference on Intelligent Technologies, CONIT 2023.
- Anciaux, N., Bonnet, P., Bouganim, L., Nguyen, B., Pucheral, P., Sandu Popa, I., & Scerri, G. (2019). Personal Data Management Systems: The security and functionality standpoint. Information Systems, 80, 13–35.
- Bogdanov, T., & Chivarov, N. (2026). Deployment and Verification of an automated backup solution for Zscaler Private Access (ZPA). 494–499.
- Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.
- Dilan Gomas, A. S., & Rathnayake, R. M. N. B. (2026). Optimizing Recovery Objectives (RTO and RPO) in Secure Linux NAS Environments: A Design Science Approach to Ransomware Resilience. Asian Journal of Social Science and Management Technology, 8(1), 82–94.
- Doku, F., & Dinda, P. (2025). TRUSTCHECKPOINTS: Time Betrays Malware for Unconditional Software Root of Trust.
- Etikan, I. (2016). Comparison of Convenience Sampling and Purposive Sampling. American Journal of Theoretical and Applied Statistics, 5(1), 1.
- Gomas, A. S. D. (2026). zt-backup-kit: Zero-Trust Backup & Restore Kit (V1). Zenodo.
- Peffers, K., Tuunanen, T., Rothenberger, M. A., & Chatterjee, S. (2007). A Design Science Research Methodology for Information Systems Research. Journal of Management Information Systems, 24(3), 45–77.
- Raheman, F., & Raheman, F. (2024). From Standard Policy-Based Zero Trust to Absolute Zero Trust (AZT): A Quantum Leap to Q-Day Security. Journal of Computer and Communications, 12(3), 252–282.
- Rose, S., Borchert, O., Mitchell, S., & Connelly, S. (2020). Zero Trust Architecture.
- Ross, R., Pillitteri, V., Graubart, R., Bodeau, D., & McQuaid, R. (2021). Developing cyber-resilient systems :
- Sokolov, S. S., Lauta, O. S., Mitrofanov, M. V., Kurakin, A. S., & Kramskoy, N. N. (2026). A Comprehensive Approach to Ensuring Business Continuity Based on Centralized Data Backup Systems. Proceedings of Telecommunication Universities, 12(1), 16–25.
- Ueno, Y., Miyaho, N., & Suzuki, S. (2009a). Disaster recovery mechanism using widely distributed networking and secure metadata handling technology. Proc. 4th Edition of the UPGRADE-CN Workshop on Use of P2P, GRID and Agents for the Development of Content Netw., UPGRADE-CN’09, Co-Located Int. Symp. High Perform. Distrib. Comput. Conf., HPDC’09, 45–48.
- Ueno, Y., Miyaho, N., & Suzuki, S. (2009b). Disaster recovery mechanism using widely distributed networking and secure metadata handling technology. Proc. 4th Edition of the UPGRADE-CN Workshop on Use of P2P, GRID and Agents for the Development of Content Netw., UPGRADE-CN’09, Co-Located Int. Symp. High Perform. Distrib. Comput. Conf., HPDC’09, 45–48.
- Urien, P. (2019). Integrity probe: Using programmer as root of trust for bare metal blockchain crypto terminal. invited paper. 2019 5th International Conference on Mobile and Secure Services, MOBISECSERV 2019.
- Vai, M., Whelihan, D., Simpson, E., Kava, D., Lee, A., Nguyen, H., Hughes, J., Torres, G., Lim, J., Nahill, B., Khazan, R., & Schneider, F. (2023). Zero Trust Architecture Approach for Developing Mission Critical Embedded Systems. 2023 IEEE High Performance Extreme Computing Conference, HPEC 2023.
- Voinilă, A.-C., & Nedelcu, A.-S. (2025). Standards and Best Practices in Disaster Recovery. International Conference of Management and Industrial Engineering (ICMIE 2025). Agility and Readiness for Sustainable Business Continuity, 659–664.
- Zhou, S., Network, P., Wang, K., & Yin, H. (2025). Dstack: A Zero Trust Framework for Confidential Containers.
- Zuo, J., Guo, Z., Gan, J., & Lu, Y. (2021). Enhancing Continuous Service of Information Systems Based on Cyber Resilience. Proceedings - 2021 IEEE 6th International Conference on Data Science in Cyberspace, DSC 2021, 535–542.