Data is the lifeblood of modern organizations, flowing between systems, databases, and applications. Keeping track of where sensitive data is processed, stored, and transmitted is critical for security and compliance. The Data Flow Documentation control (DSP-05) from the Cloud Security Alliance's Cloud Controls Matrix provides guidance on creating and maintaining documentation that identifies these data flows.
Where did this come from?
This control comes from the CSA Cloud Controls Matrix v4.0.10, released 2023-09-26. The full matrix, available for download from the Cloud Security Alliance website, provides a comprehensive set of controls mapped to various industry standards to help organizations secure their cloud environments. The CSA also publishes an overview of the CCM that is worth reading for background.
Who should care?
Several roles should pay close attention to this control:
- Data Privacy Officers with responsibility for safeguarding sensitive data
- Security Architects with a mandate to design secure data flows
- Compliance Managers with a need to demonstrate data governance to auditors
- Application Developers with a role in implementing data flow controls
What is the risk?
Without proper data flow documentation, an organization risks:
- Sensitive data leaking to unauthorized locations
- Inability to track down data for GDPR data subject requests
- Failing compliance audits due to lack of data governance evidence
- Spiraling costs from ungoverned shadow data stores
While documentation alone doesn't prevent these risks, it is a foundational capability that enables detection and response.
What's the care factor?
For organizations that handle personal data, financial records, or other regulated information, the care factor for this control should be high. Regulators are increasingly focused on how companies safeguard data. Poor documentation is often a leading indicator of broader security failings.
However, for a startup with a single standalone application, extensive data flow documentation may be overkill. A pragmatic, risk-based approach is needed.
When is it relevant?
Data flow documentation makes sense when:
- Building complex multi-tiered applications
- Integrating multiple third-party services
- Enabling self-service analytics and BI platforms
- Performing a data protection impact assessment
It may be less relevant for:
- Isolated sandbox environments
- Static brochure-ware websites
- Appliance-based deployments with no external data flows
What are the trade-offs?
Maintaining accurate data flow documentation requires significant time and effort. Application changes may require updates to multiple documents. Over-specification can lead to voluminous artifact libraries that swamp productivity.
Highly dynamic auto-scaling environments can be challenging to document, as instances spin up and down in response to load. Tracing data across third-party APIs and SaaS services is complex.
However, the long-term benefits typically outweigh the costs. Clear documentation facilitates security reviews, expedites troubleshooting, and provides auditable evidence of compliance.
How to make it happen?
- Establish a standard template for data flow diagrams and documentation
- Identify authoritative data sources (databases, data lakes, SaaS platforms)
- Interview application owners to identify key data ingress and egress points
- Document data flows between components using architecture diagrams
- Specify data security classifications for each data store and flow
- Note security controls applied to each flow (encryption, access control, logging)
- Record data retention periods and archival processes
- Identify third-party data sharing arrangements and document in DPIAs
- Establish a regular review and sign-off process with application owners
- Automate documentation where possible, e.g. with AWS Config or GCP's Cloud Asset Inventory (a documentation-as-code sketch follows this list)
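One way to keep this sustainable is documentation-as-code: capture each flow in a structured, version-controlled inventory and render diagrams from it automatically. Below is a minimal sketch assuming Python with the open source graphviz package installed (plus the Graphviz binaries); the DataFlow record, the example flows, and the render helper are all illustrative, not a prescribed format.

```python
from dataclasses import dataclass

from graphviz import Digraph  # pip install graphviz; also needs the Graphviz binaries


@dataclass
class DataFlow:
    """One documented flow, capturing the fields suggested in the list above."""
    source: str
    destination: str
    classification: str  # e.g. Public / Internal / Confidential
    controls: str        # encryption, access control, logging, ...
    retention: str       # retention period at the destination

# Illustrative inventory -- in practice, reviewed and signed off by
# the application owners on a regular cadence.
FLOWS = [
    DataFlow("web-app", "orders-db", "Confidential", "TLS, IAM auth, audit logs", "7 years"),
    DataFlow("orders-db", "analytics-lake", "Internal", "TLS, column masking", "2 years"),
    DataFlow("web-app", "payments-saas", "Confidential", "TLS, tokenization", "per contract"),
]

def render(flows, path="data-flows"):
    """Render the inventory as a data flow diagram (PNG)."""
    dfd = Digraph("data_flows")
    for f in flows:
        dfd.edge(f.source, f.destination,
                 label=f"{f.classification} | {f.controls} | retention: {f.retention}")
    dfd.render(path, format="png", cleanup=True)

if __name__ == "__main__":
    render(FLOWS)
```

Because the inventory lives in version control, any application change that touches a flow surfaces in code review, which doubles as the regular review and sign-off process described above.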
What are some gotchas?
- Ensure documentation follows data, not just systems. A single server may have multiple data flows.
- Remember to document offline/manual data flows, not just automated transfers
- AWS: Flows between VPCs in peered or transit gateway configurations are often missed (see the enumeration sketch after this list)
- GCP: Data flows created by broad BigQuery access grants (e.g. allAuthenticatedUsers) are easily overlooked
- Azure: Implicit data flows created when granting broad access to Blob Storage
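On the AWS point above, a short read-only script can enumerate the cross-VPC paths that most often escape documentation. This is a minimal sketch assuming boto3 and credentials with EC2 describe permissions; the function names and the "is this documented?" reconciliation step are illustrative.

```python
import boto3

# Read-only enumeration of cross-VPC paths that often carry
# undocumented data flows (see the AWS gotcha above).
ec2 = boto3.client("ec2")

def peering_connections():
    """Yield (requester VPC, accepter VPC) pairs for active peerings."""
    for page in ec2.get_paginator("describe_vpc_peering_connections").paginate():
        for pcx in page["VpcPeeringConnections"]:
            if pcx["Status"]["Code"] == "active":
                yield (pcx["RequesterVpcInfo"]["VpcId"],
                       pcx["AccepterVpcInfo"]["VpcId"])

def tgw_attachments():
    """Yield (transit gateway ID, attached resource ID) pairs."""
    for page in ec2.get_paginator("describe_transit_gateway_attachments").paginate():
        for att in page["TransitGatewayAttachments"]:
            yield (att["TransitGatewayId"], att.get("ResourceId", "unknown"))

if __name__ == "__main__":
    for src, dst in peering_connections():
        print(f"peering: {src} <-> {dst} -- is this flow documented?")
    for tgw, res in tgw_attachments():
        print(f"transit gateway: {tgw} -> {res} -- is this flow documented?")
```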
What are the alternatives?
Automated data flow mapping tools are emerging, notably cloud-native sensitive data discovery services. However, these generally focus on structured data stores, not application data flows. A holistic approach blending documentation, interviews, and automation yields the best results; a brief sketch of the automated side follows.
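As a hedged illustration (assuming Amazon Macie is already enabled in the account and stands in for this class of discovery tooling), recent sensitive-data findings can be pulled via boto3 and reconciled against the documented flows:

```python
import boto3

# Sketch only: requires Amazon Macie to be enabled in this account/region.
macie = boto3.client("macie2")

def sensitive_buckets(limit=50):
    """Yield (bucket name, finding type) pairs from recent Macie findings."""
    ids = macie.list_findings(maxResults=limit).get("findingIds", [])
    if not ids:
        return
    for finding in macie.get_findings(findingIds=ids)["findings"]:
        bucket = finding.get("resourcesAffected", {}).get("s3Bucket", {})
        yield bucket.get("name", "unknown"), finding.get("type", "unknown")

if __name__ == "__main__":
    for name, ftype in sensitive_buckets():
        print(f"{name}: {ftype} -- covered by a documented flow?")
```

Output like this only tells you where sensitive data sits, not how it moves, which is why the interviews and architecture diagrams remain necessary.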
Explore further
- Review adjacent CCM controls DSP-03 (Data Inventory) and DSP-04 (Data Classification)
- Implement complementary controls from the CIS Controls, e.g. v8 Safeguard 3.8 (Document Data Flows)
- Consider using an open source data flow diagram template as a starting point