As Snowflake users know, even if you follow best practices for storing your data in a data warehouse or data lake, perennial challenges with security and governance are compounded by data privacy regulations that get more rigorous every year. Data silos make matters worse. That’s why you’re using Snowflake!
For example, companies that do business in the European Union must meet exacting data lineage and traceability requirements to comply with the EU General Data Protection Regulation (GDPR). Similar regulations have come into effect in California with the California Consumer Privacy Act (CCPA). Industry-specific mandates, such as the Health Insurance Portability and Accountability Act (HIPAA) in healthcare, the Payment Card Industry Data Security Standard (PCI DSS) in ecommerce, and the Sarbanes-Oxley Act (SOX) in finance, further complicate security and governance.
Data Governance in a Self-service Data Marketplace
As we mentioned in our previous blog, business analysts are wholeheartedly in favor of a self-service data marketplace. IT, on the other hand, has reservations, primarily around governance and security. To avoid what Gartner analyst Rita Sallam refers to as the “Wild West of Data,” it’s essential that any self-service data environment have a governance and security framework. A closer look at the stakeholders in such an environment also reveals quite a few personas whose priorities need to be considered. They include:
- Data engineer
- Data steward
- Data analyst
- Business analyst
- Data scientist
- Data subject matter expert (SME)
Boomi DCP Accommodates Multiple User Personas
For Snowflake users, Boomi Data Catalog and Preparation (DCP) accommodates multiple user personas by default. In addition to pre-set access privileges, customized access rights can be created and saved by the data steward. DCP supports Kerberos and Active Directory, which facilitates import of access privileges by individual user or group. For example, if a new data source is added that benefits marketing users, everyone in that group can immediately be granted access.
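To make the group-based grant concrete, here is a minimal sketch of the idea in Python. It is not Boomi DCP's API: the `AccessRegistry` class, its methods, and the user, group, and source names are all hypothetical, and a real deployment would read group membership from Active Directory rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class AccessRegistry:
    """Hypothetical in-memory registry; stands in for directory-backed groups."""
    group_members: dict[str, set[str]] = field(default_factory=dict)
    group_sources: dict[str, set[str]] = field(default_factory=dict)

    def add_member(self, group: str, user: str) -> None:
        # Record that a user belongs to a group (e.g. imported from a directory).
        self.group_members.setdefault(group, set()).add(user)

    def grant_group(self, group: str, source: str) -> None:
        # One grant on the group covers every current and future member.
        self.group_sources.setdefault(group, set()).add(source)

    def can_access(self, user: str, source: str) -> bool:
        # A user can read a source if any of their groups has been granted it.
        return any(
            user in members and source in self.group_sources.get(group, set())
            for group, members in self.group_members.items()
        )


registry = AccessRegistry()
registry.add_member("marketing", "ana")
registry.add_member("marketing", "ben")
registry.add_member("finance", "cy")

# A new data source is added for marketing: a single grant covers the group.
registry.grant_group("marketing", "campaign_results")

print(registry.can_access("ana", "campaign_results"))
print(registry.can_access("cy", "campaign_results"))
```

Granting at the group level means no per-user bookkeeping is needed when a source is added or when someone joins the group later.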
Any data attribute in any data set can be used as the condition for masking data at the row or column level. A user can therefore see personally identifiable information (PII) on some records but not on others, which is especially important for compliance with regulations such as the GDPR.
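The attribute-driven masking described above can be sketched in a few lines of Python. This illustrates the general technique only, not how Boomi DCP implements it: the `mask_records` function, the `region` attribute, and the sample records are assumptions made for the example.

```python
def mask_records(records, pii_columns, can_see_pii):
    """Return copies of the records with PII columns masked on restricted rows.

    records     -- list of dicts, one per row
    pii_columns -- set of column names that hold PII (column-level rule)
    can_see_pii -- function(row) -> bool deciding visibility per row (row-level rule)
    """
    masked = []
    for row in records:
        visible = can_see_pii(row)
        masked.append({
            key: (value if visible or key not in pii_columns else "***")
            for key, value in row.items()
        })
    return masked


# Hypothetical sample data: the "region" attribute drives the masking decision.
records = [
    {"name": "Ana", "email": "ana@example.com", "region": "EU"},
    {"name": "Ben", "email": "ben@example.com", "region": "US"},
]

# This viewer may not see PII for EU data subjects.
result = mask_records(records, {"name", "email"}, lambda row: row["region"] != "EU")
for row in result:
    print(row)
```

The non-PII `region` column stays visible on every row, while `name` and `email` are hidden only on the rows the predicate restricts.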
The Value of Collaboration: Shared Insights Reduce Costs and Time
The key benefit of collaboration tools in data management and data science is support for shared learning. Collaboration is an essential part of the Boomi AtomSphere Platform and helps Boomi deliver unmatched value in knowledge sharing and accelerated time to insight.
Traditionally, a business analyst working on a specific data project in support of a business goal would find the data the project required, cleanse and transform that data, and send the results to the requestor for visualization. Much of the knowledge required to perform these tasks would reside with that business analyst.
When data discovery and preparation are opened up to less technical users, that knowledge becomes even more important. The collaboration features of the Boomi platform substantially reduce institutional knowledge lock-in by exposing all users to the knowledge of others across every aspect of the data pipeline, from metadata descriptions to shared transformation jobs and workflow automation. This shared learning substantially reduces time, cost, and effort while leading to much faster insights.
Crowdsourced Data Quality
Another key benefit of collaboration within a data platform comes from applying shared knowledge to data quality. From data accuracy to metadata descriptions, this information is valuable to every user, and its value grows with the number of users who contribute.
Users can rank data on a 5-star scale and share their perspectives on how helpful a dataset or metadata description is. For example, if a user has determined that “Initiate Transport EXT_!” as an attribute heading actually corresponds to the “Play” function on a streaming media data set, every user benefits from that insight.
Moreover, if users have questions about a particular dataset, they can initiate a chat with one another or with a subject matter expert listed in Boomi DCP’s business glossary.