
Shadow Deployment of GenAI Applications.

GenAI deployments demand thorough testing and monitoring, lest we run into customer-experience (CX) problems. This article covers the skills and knowledge needed to safely deploy and monitor generative AI systems, reducing risk while gathering actionable data before going live.
💡
In a recent client engagement, we had to deploy a newer GenAI model alongside existing models, without disrupting the system or the user experience. This article is a summary of the thoughts and ideas that went into executing a successful production launch, all while preserving the status quo of the existing model infrastructure.

Shadow Deployment is a powerful technique, especially for testing GenAI in production environments, in parallel with existing models and systems and without impacting the end-user experience. The icing on the cake is walking away with real data points about how the GenAI product interacts with, and behaves inside, the larger tech stack.
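To make the idea concrete, here is a minimal sketch of the core pattern, assuming hypothetical `call_primary_model` and `call_shadow_model` helpers standing in for the existing and new GenAI models: the user only ever receives the primary model's answer, while the shadow model quietly processes the same request in the background and its output is logged for later comparison.

```python
import concurrent.futures
import logging

logger = logging.getLogger("shadow")
executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_primary_model(prompt: str) -> str:
    # Placeholder for the existing production model call.
    return f"primary response to: {prompt}"


def call_shadow_model(prompt: str) -> str:
    # Placeholder for the new GenAI model under shadow test.
    return f"shadow response to: {prompt}"


def _shadow_and_log(prompt: str) -> None:
    try:
        shadow_answer = call_shadow_model(prompt)
        logger.info("shadow_ok prompt=%r answer=%r", prompt, shadow_answer)
    except Exception:
        # Shadow failures are recorded but never surface to the user.
        logger.exception("shadow_failed prompt=%r", prompt)


def handle_request(prompt: str) -> str:
    # Fire-and-forget the shadow call so it never blocks the user.
    executor.submit(_shadow_and_log, prompt)
    # Only the primary model's answer is returned to the user.
    return call_primary_model(prompt)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(handle_request("What is shadow deployment?"))
```

Because the shadow call runs off the request path, a slow or failing new model cannot degrade the experience of users on the existing model.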

But why is Shadow Deployment required at all?

Shadow Deployment provides multiple benefits for GenAI developers.

  • It helps mitigate risk by testing the new model alongside the existing one, letting us detect any disruptions between them before users do.
  • It surfaces potential issues in transitioning to the new model.
  • It lets us examine the scalability and performance of the new model.

How is Shadow Deployment achieved?

  • Using Feature Flags.
  • Through traffic splitting between the old and new models.
  • By data mirroring, which replicates production data to the shadow system so that both systems stay 'data-identical' (a sketch combining these three appears after this list).
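The three mechanisms compose naturally. Below is a hedged sketch, assuming a hypothetical `SHADOW_GENAI_ENABLED` flag and `SHADOW_TRAFFIC_PERCENT` knob, read from environment variables here for simplicity (in practice they would come from a feature-flagging service): the flag acts as a global on/off switch, a hash-based bucket gives a deterministic traffic split, and any request that passes both checks gets its prompt mirrored verbatim to the shadow model, keeping the two systems data-identical for that slice.

```python
import hashlib
import os

# Hypothetical knobs: an on/off feature flag and the percentage of
# production traffic to mirror to the shadow model.
SHADOW_ENABLED = os.getenv("SHADOW_GENAI_ENABLED", "true").lower() == "true"
SHADOW_TRAFFIC_PERCENT = int(os.getenv("SHADOW_TRAFFIC_PERCENT", "10"))


def should_mirror(request_id: str) -> bool:
    """Decide whether this request is also sent to the shadow model."""
    if not SHADOW_ENABLED:  # feature flag: global kill switch
        return False
    # Deterministic traffic split: hash the request id into a 0-99 bucket,
    # so the same request always gets the same decision.
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return bucket < SHADOW_TRAFFIC_PERCENT


if __name__ == "__main__":
    # Roughly SHADOW_TRAFFIC_PERCENT% of request ids should be mirrored.
    mirrored = sum(should_mirror(f"req-{i}") for i in range(10_000))
    print(f"mirrored {mirrored} of 10000 requests")
```

Requests that pass `should_mirror` would then be handed to a background shadow call like the one in the first sketch; ramping the percentage from 1% to 100% becomes a configuration change rather than a redeploy.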

The tools we used during Shadow Deployment.

  • Kubernetes:
    • Created a Deployment for the new model and redirected a slice of traffic in its direction.
    • Used logging and audit trails to measure the performance of the new model (see the log-collection sketch after this list).
  • Feature-Flagging Services:
    • LaunchDarkly (or other feature-flagging tools) can be used to turn the new model on/off without a redeploy.
  • Chaos Engineering Tools:
    • Used these tools to discover areas of vulnerability and high latency in the new model's path, which showed us how resilient the model was.
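As an illustration of the Kubernetes piece, here is a sketch using the official `kubernetes` Python client to pull recent logs from the shadow Deployment's pods; the `genai` namespace and `app=shadow-model` label are hypothetical placeholders for whatever your cluster actually uses.

```python
# Requires the official Kubernetes Python client: pip install kubernetes
from kubernetes import client, config


def collect_shadow_logs(namespace: str = "genai",
                        label_selector: str = "app=shadow-model") -> dict:
    """Pull recent logs from the shadow deployment's pods for offline review."""
    config.load_kube_config()  # use config.load_incluster_config() when running inside the cluster
    core = client.CoreV1Api()
    logs = {}
    pods = core.list_namespaced_pod(namespace, label_selector=label_selector)
    for pod in pods.items:
        logs[pod.metadata.name] = core.read_namespaced_pod_log(
            name=pod.metadata.name,
            namespace=namespace,
            tail_lines=200,  # the last 200 lines is enough for a quick look
        )
    return logs


if __name__ == "__main__":
    for name, text in collect_shadow_logs().items():
        print(f"--- {name} ---\n{text[:500]}")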

Things to look at while comparing the old vs new model(s).

  • Compare the load on the system with the old and new model(s).
  • Measure user interaction rates for the old and new models to assess which one users engage with more.
  • Track error rates in each model using an 'audit-by-design' architecture.
  • Finally, measure how effectively the new model impacts key business KPIs (like conversion rates of potential buyers turning into confirmed buyers); a simple comparison sketch follows this list.
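One simple way to compare the two models from the shadow-run logs, assuming a hypothetical `CallRecord` shape parsed out of those logs, is to aggregate latency and error rate per model:

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    model: str        # "old" or "new"
    latency_ms: float
    error: bool


def summarize(records: list[CallRecord]) -> dict:
    """Aggregate per-model call count, error rate, and median latency."""
    summary = {}
    for model in ("old", "new"):
        rows = [r for r in records if r.model == model]
        if not rows:
            continue
        summary[model] = {
            "calls": len(rows),
            "error_rate": sum(r.error for r in rows) / len(rows),
            "p50_latency_ms": sorted(r.latency_ms for r in rows)[len(rows) // 2],
        }
    return summary


if __name__ == "__main__":
    sample = [
        CallRecord("old", 120.0, False),
        CallRecord("old", 140.0, True),
        CallRecord("new", 95.0, False),
        CallRecord("new", 110.0, False),
    ]
    print(summarize(sample))
```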

I write to remember, and if, in the process, I can help someone learn about Containers, Orchestration (Docker Compose, Kubernetes), GitOps, DevSecOps, VR/AR, Architecture, and Data Management, that is just icing on the cake.