Terraform state file recovery is another critical DevOps interview question. Here's the answer that will impress the interviewers.
Before we get to the answer, let's clear up the concepts.
👉 What happens if your state file is deleted or corrupted?
You lose Terraform's memory of your infrastructure.
Terraform no longer knows what resources it created or manages.
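A corrupted state makes terraform plan and terraform apply fail with errors. A missing state fails more quietly: the commands run, but Terraform treats every resource in your code as new. Your infrastructure still exists in the cloud, but Terraform can't see it anymore.
This is a disaster scenario. Without a state, Terraform will try to create everything again, causing naming conflicts, duplicate resources, and potential downtime.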
👉 How do you prevent state file loss?
✓ Use a remote backend such as S3 instead of keeping state on a local machine.
✓ Enable versioning on the S3 bucket. S3 then keeps multiple versions of your state file, so you can restore a previous version if the current one gets corrupted.
✓ Schedule regular backups of your state file to separate storage. Use lifecycle policies to retain backups for 30-90 days.
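As a rough sketch of the versioning step, the snippet below turns versioning on with the AWS CLI and prints the resulting status. The function name and the bucket name in the usage comment are placeholders for illustration, not part of any standard setup.

```shell
# Sketch: enable versioning on the bucket that holds Terraform state.
# enable_state_versioning is a hypothetical helper name.
enable_state_versioning() {
  # $1 = name of the S3 bucket that stores the state file
  aws s3api put-bucket-versioning \
    --bucket "$1" \
    --versioning-configuration Status=Enabled
  # Print the current status so you can confirm it reads "Enabled".
  aws s3api get-bucket-versioning \
    --bucket "$1" --query Status --output text
}

# Usage (placeholder bucket name):
#   enable_state_versioning my-terraform-state
```

Once versioning is on, every overwrite or delete of the state object leaves the older copy recoverable.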
But versioning and backups don't solve the problem completely. Even with both in place, you can lose recent state changes.
→ If your state file gets corrupted today and you restore yesterday's backup, any resources created between yesterday and today are missing from the restored state.
→ Same issue with S3 versioning. The previous version might be from 2 hours ago, and you created 5 new resources in those 2 hours.
You'll have a gap. The restored state won't know about those newer resources.
👉 How do you recover from a deleted/corrupted state?
Step 1: Check for backups
Restore from S3 versioning: aws s3api list-object-versions. Restore from an automated backup if available.
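Step 1 could look something like this with the AWS CLI. This is a sketch under assumptions: the helper names, bucket, key, and version ID are all placeholders, and it assumes you decide which version is the last good one by reading the listing yourself.

```shell
# Sketch: find an older version of the state object and put it back.
# Both function names are hypothetical helpers.

list_state_versions() {
  # $1 = bucket, $2 = key of the state object
  aws s3api list-object-versions \
    --bucket "$1" --prefix "$2" \
    --query 'Versions[].[VersionId,LastModified,IsLatest]' \
    --output text
}

restore_state_version() {
  # $1 = bucket, $2 = key, $3 = version ID to restore, $4 = local file path
  # Download the chosen version, then upload it as the new latest version.
  aws s3api get-object \
    --bucket "$1" --key "$2" --version-id "$3" "$4" >/dev/null &&
    aws s3 cp "$4" "s3://$1/$2"
}

# Usage (placeholder values):
#   list_state_versions my-terraform-state prod/terraform.tfstate
#   restore_state_version my-terraform-state prod/terraform.tfstate \
#     "<version-id>" restored.tfstate
```

Uploading the old copy as a new version, rather than deleting newer ones, keeps the corrupted file around in case you need to inspect it later.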
Step 2: Import missing resources manually.
For each resource Terraform no longer knows about, run terraform import. The matching resource block must already exist in your code.
Example:
terraform import aws_instance.web i-1234567890abcdef0
terraform import aws_s3_bucket.data my-bucket-name
This tells Terraform: "This resource exists, add it to your state."
Step 3: Verify with terraform plan
Run terraform plan to check for differences.
✓ If it shows no changes, your state has been recovered correctly.
✗ If it wants to create resources that already exist, you missed some imports. If it wants to modify things, your code has drifted from the real infrastructure.
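The check in Step 3 can also be scripted. A minimal sketch, using terraform plan's -detailed-exitcode flag (exit code 0 means no changes, 2 means changes are pending, 1 means the plan itself failed); the function name and the messages are made up for illustration.

```shell
# Sketch: turn the result of terraform plan into a pass/fail signal.
# verify_state is a hypothetical helper name.
verify_state() {
  rc=0
  terraform plan -detailed-exitcode -input=false || rc=$?
  case "$rc" in
    0) echo "no changes: state matches the infrastructure" ;;
    2) echo "changes pending: check for missing imports or drift"
       return 2 ;;
    *) echo "terraform plan failed"
       return 1 ;;
  esac
}

# Usage:
#   verify_state && echo "recovery verified"
```

This is handy in a recovery runbook or CI job, where you want a clear signal instead of reading the full plan output by eye.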
Step 4: Recreate state from scratch (worst case)
If backups are too old or missing, you rebuild the entire state file.
Go through every resource in your cloud console. Import each one into Terraform manually. This is painful but sometimes necessary.
👉 Best practices for recovery:
• Keep your Terraform code in Git so you know exactly what resources should exist.
• Document resource IDs in comments or separate files for easier importing.
• Use terraform state list on a good state file to see all managed resources.
• Test your backup restoration process regularly; don't wait for a disaster.
• Consider using Terraform Cloud or Spacelift; they handle state management and backups automatically.
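To make the terraform state list tip concrete: if you saved its output while the state was healthy, you can compare that list against the restored state to see which resources still need importing. A minimal sketch, where the function name and the saved file are assumptions for illustration:

```shell
# Sketch: print resource addresses that were in a known-good list
# but are absent from the current (restored) state.
# missing_from_state is a hypothetical helper name.
missing_from_state() {
  # $1 = file with one expected resource address per line, e.g. saved
  #      earlier with: terraform state list > expected.txt
  expected_sorted=$(mktemp)
  current_sorted=$(mktemp)
  sort -u "$1" > "$expected_sorted"
  terraform state list | sort -u > "$current_sorted"
  # comm -23 keeps only the lines unique to the first file.
  comm -23 "$expected_sorted" "$current_sorted"
  rm -f "$expected_sorted" "$current_sorted"
}

# Usage (placeholder file name):
#   missing_from_state expected.txt
```

Each address it prints is a candidate for terraform import. Resources created after the list was saved will not appear, so this narrows the gap rather than closing it.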
✅ Best answer in an interview:
|_ I prevent state file loss by using an S3 remote backend with versioning enabled and automated backups.
|_ However, even with these safeguards, there can be gaps between the latest backup and the current state.
|_ If the state is lost or corrupted, I first restore from S3 versioning or backup, then use terraform import to manually add any missing resources that were created after the backup.
|_ I verify recovery with terraform plan to ensure no unexpected changes.
|_ Prevention is key, so I also maintain strict access controls and state locking to minimize corruption risks.
That's it.
This answer shows you understand both prevention and real-world recovery scenarios.