Actionable Data Intelligence with Peggy Tsai, Chief Data Officer at BigID

Data-Driven Podcast

Listen to Chief Data Officer at BigID, Peggy Tsai, talk about the basics like data governance, management, quality, and how unstructured data has created a challenge with ways to solve it. She then digs into newer concepts and approaches like data mesh and building data products.


Meet our Guests

Peggy Tsai

Chief Data Officer at BigID

Peggy Tsai is the Chief Data Officer at BigID, where she is responsible for building the data strategy and enabling data governance capabilities for customers. BigID enables organizations to know their enterprise data and take action for privacy, protection, and perspective. With BigID, customers can proactively discover, manage, and protect their regulated and sensitive data across their data landscape. Peggy has over 18 years of practitioner experience in data management, stewardship, and governance in the financial services industry. Prior to joining BigID, she was Vice President of Data & Analytics at Morgan Stanley, where she helped run the data governance program across the Wealth Management division. She held various positions at Morgan Stanley, where she supported the data science teams on analytical data governance and led a project team to document data lineage and business definitions across enterprise systems in order to comply with Basel regulations. Peggy was also Data Innovation Lead in the Enterprise Data Management group at AIG, where she was responsible for implementing enterprise data management practices to support Anti-Money Laundering, Solvency II, and GDPR in the Latin American region and the Commercial line of business. Peggy also worked at S&P Global Ratings, where she held various positions in the enterprise data group and technology in order to drive the value of data between the business and IT. Peggy has a Master’s in Information Systems from New York University and a Bachelor of Arts in Economics from Cornell University. She is an adjunct faculty member in Carnegie Mellon University’s Heinz College CDAO certification program. In her spare time, she co-hosts a data and technology podcast called The Data Transformers. She is one of 2022’s Top 100 Global Data & Analytics Innovators and a founding member of the Women Leaders in Data & AI.


The easiest things that people look at when it comes to data governance programs are what’s already available in your structured and relational databases. That held until about 10 years ago, when a proliferation of data and information came to be stored in unstructured documents. To be honest, you don’t really know where all this information is being stored. So, there are some companies that are really great at protecting and finding unstructured data, and that’s a challenge.

With the popularity of Data Mesh, I think it’s about time that organizations are thinking about data from a domain perspective. I think a lot of the reason why the concept of Data Mesh and/or Data Fabric hasn’t come to popularity until recently is because we’ve been relying on technology systems to dictate how we manage data. I’m glad that we’re able to now move away from it. It’s how organizations should be managing the data: thinking about things from a data point of view, putting more of the ownership on how data should be governed end to end, and thinking about data as a product from end to end.

Transcript

Dave Mariani: Hi everyone, and welcome to AtScale’s Data-Driven podcast. Today’s special guest is Peggy Tsai, and Peggy is the Chief Data Officer for BigID. So, Peggy, welcome to the podcast.

Peggy Tsai: Thanks, Dave. I’m glad to be here today with you.

Dave Mariani: Awesome. Well, we have some great topics to talk about today when it comes to data management, data governance, data literacy, and data culture. So we’ve got a lot of things to get to, but before we do that, I would love to hear more about you and how you got to where you are today, Peggy. So tell us a little bit about yourself and BigID, the company that you work for.

Peggy Tsai: Sure, Dave. So, I grew up in New York City, so I’ve been here all my life, working in financial services companies. I graduated with a degree in economics and decided to work in finance. So somehow I ended up finding myself working for large companies like S&P Global Ratings, AIG, and Morgan Stanley, and it’s always been in a data role. I found myself starting out in data operations and then gradually working in a center of excellence group, over 10 years ago. And that’s where I learned a lot of what I know today in data management and data governance. You know, I’ve done work as a data steward, I’ve done data quality and data operations, and really helped to manage and build out data governance programs at scale for different financial companies.

Peggy Tsai: And after so many years of working as a data practitioner, I really gravitated towards wanting to find and build better solutions for myself. I used a lot of the technologies out there. I learned a lot about the different architecture constructs, and I realized that a lot of what we were doing was still very manual and not really taking advantage of the AI and machine learning that I saw as a potential. And this was close to three years ago now. So that’s when I decided to join BigID. BigID is a data discovery platform company that focuses on use cases around privacy, protection, security, and data governance. What it does is leverage AI and machine learning to help discover, classify, and tag data, and help data practitioners like myself take action on the data faster and more efficiently.

Dave Mariani: It’s pretty interesting, we have a lot in common, Peggy. I studied economics in college too, and here we are, both in data. I’ve heard from other people I’ve had on the podcast who have had a similar path into data and analytics. What do you think it is about economics, or what draws people to data? It doesn’t seem like, in a lot of cases, people had a technical background before they got into data. And yet here we all are in these positions, working for vendors or being a practitioner, working for a company using data to make decisions. What is it about us, do you think?

Peggy Tsai: I dunno. I agree, it’s very fascinating. I think about people that are drawn to economics, and, you know, for myself, it wasn’t necessarily that the numbers were part of it. I was never good at econometrics. The quantitative side wasn’t it for me, but I really did love understanding the application of economics and how it affected people. So for me, I always loved working with and leveraging the skill set of bridging teams together. That’s where I feel like data fits in really nicely. You’re working with a little bit of technology, and that’s what I love to stay in touch with, but I love seeing how it really impacts and helps solve business problems for the business side of the company. So that’s where I naturally gravitated towards. And I feel like a lot of the people that I talk to, certainly the younger graduates today, have taken degrees and studied specifically for an analytics certification or degree. But back when I graduated from college, with what I studied, there were no such opportunities, right? It was data.

Peggy Tsai: Exactly. So data was something that you either landed in or you found a job that gravitated towards it. So I feel, you know, very lucky and fortunate that I’ve been able to have these positions working in financial services companies, where I learned a lot of my skills and knowledge on the job. And then, obviously, talking and networking with people to do the job. That’s how I’ve honed my craft. So now, as Chief Data Officer at BigID, my favorite part of the job is actually talking to people about how they do their job and how they solve their problems, and then also building a product strategy that goes along with helping other chief data officers in the industry as well.

Dave Mariani: Yeah. You know, coming at it from the side of the business or the end user, as opposed to sitting on the technical side, I think is a good combination, right? Because then, like you, you learned on the job, and learned how important data was in making a difference in whatever you were doing. So that’s fascinating. So, BigID is all about data governance, right? And it’s a very hot topic right now, Peggy. So, in your experience as a chief data officer, how should customers even approach the whole concept and the idea of data governance?

Peggy Tsai: So the best approach, and I’ve seen this having done data governance in several types of financial companies, you know, data governance is best suited to, and dependent on, the type of organization. The maturity level is so important to how it’s embraced by different organizations. In the past, I’ve had the luxury of working in companies where data governance was priority number one, right? Where we had a lot of resources and big budgets, and were able to purchase as many tools as we wanted and hire as many consultants and full-time employees to build out data governance, really sitting on executive boards where I was able to pitch and receive funding. And I’ve also been in organizations where data governance was really difficult to champion. I say that because it was very difficult to find stakeholders that would embrace data management or would want to come together to discuss and solve business issues that stemmed from data.

Peggy Tsai: So having seen both extremes personally, and now that I’ve had the luxury of speaking to different customers and prospects, you know, first and foremost, it really helps to have some type of initiative, a business initiative or regulatory initiative, driving the cause for true data governance and for improving data governance, right? It can’t just be because someone that was newly hired said, “Let’s just do data governance.” It doesn’t really work that way. You’re not really going to get the scalability from that, and you’re not going to get the traction, at least in the long term. It might be something you get for a one-year time period, and then you may not. So it’s about really understanding what the business benefits are. I think leaders that are in a position to make the data governance change should really understand what the business priorities are, have that champion support, really build those business cases, and have them directly aligned to improvements that can be made in the data, whether it’s data quality or improving the self-service models of the data, you know, things that are really tangible to the business.

Peggy Tsai: Those are all key things in building out the business use case for data governance. And then from there, it’s, you know, what I call Data Governance 101: building out the framework, the people, and the processes, having the technology behind it and supporting it, and making that a repeatable and agile process is really key.

Dave Mariani: So would you say then, Peggy, that it kind of starts from the top down in terms of the initiative? Is it the chief data officer that gets the process going? Is it a top-down initiative, in other words, or is it a bottom-up initiative, and what have you seen to be the most successful model there?

Peggy Tsai: I think it’s both. With top-down, organizations don’t necessarily need to have a person with the title of chief data officer. I’ve seen other organizations where the head of data architecture, or a CFO, or chief operating officer, or chief risk officer, someone at least on a senior level that can champion and build a business case with other executives, is really key here, right? They’re the ones that are going to drive the initiative, create a business case, and get the funding that’s needed. But he or she needs to talk to the people on the bottom, right? The people that are doing the data work: the data science teams, analytics teams, your reporting teams. They’re the ones that, day to day, operationally, are dealing with the data, right?

Peggy Tsai: And they’re the ones that are probably facing the struggles and the challenges. They’re the ones that are often fixing things that, you know, they probably should not be, right? They should be doing things properly, you know, fixing data quality issues at the authoritative source instead of downstream, right? So it’s repetitive, they’re wasting their time, and it’s really inefficient, and those are the people that are struggling with poor data quality. And by taking a top-down approach and a bottom-up approach, you really can get a holistic, end-to-end understanding of, you know, the real data maturity at the organization, how to really go about solving it, and how to prioritize the problems as well. That’s really key.

Dave Mariani: Definitely. That’s great. You know, what are some of the data sources, or types of data, that you think people should be spending more time thinking about when it comes to data governance and data quality?

Peggy Tsai: Yeah, that’s a great question, Dave. I would say that, for me and for many of my colleagues that are practicing data governance today, the easiest things that people look at when it comes to governance programs are what’s already available in your structured and relational databases. If you talk to your counterparts in technology, they clearly have that type of information documented, and it’s well mapped out. And if you ask them, they would say there are no problems at all with discovering their data, finding their data. They know exactly where things are. And that was a perfectly acceptable answer, I think, up until, you know, five, 10 years ago, when this proliferation of data and information came to be stored in unstructured documents. You know, how many emails do you get every day, Dave? Hundreds, I’m sure. Too many emails, right?

Peggy Tsai: Everyone has too many emails. You know, documents, too: people in the insurance industry are still handwriting a lot of their notes in the field. And people have been asking for OCR of digital images for the longest time now. So it’s a fact that there’s a lot of information in unstructured forms, data and information that is not being governed and captured correctly, and it can be utilized in analytics and in existing enterprise reports. And also, with a lot of the privacy and protection laws mandating identification of personal information, a lot of that, I’m also finding, is in unstructured documents. So I’ve been doing some research in healthcare. You know, if you go for a CAT scan or X-ray, a lot of those digital images contain your name and your date of birth, and may contain some kind of identification number.

Peggy Tsai: And that’s personal information that insurance companies and healthcare companies are just storing on their hard drives. And, you know, in some states, depending on where you live, that is personal information that you can ask to be redacted or protected or made secure, right? So if you think about it, there’s just a lot of potential for your personal information being stored out there, and that’s where companies are putting themselves at risk. To be honest, you don’t really know where all this information is being stored. So, there are some companies that are really great at protecting and finding unstructured data, and, you know, that’s a challenge for a lot of our customers. They’re saying that there are just all these different types of data sources and that they know they’re not included in their governance program. And that’s a problem. If you talk about it with their chief risk officer or privacy officer, they will all acknowledge that it’s a problem, but they don’t know the right solutions or they’re not educated enough. So that’s kind of where I come in and, you know, talk about protecting more of your data in a way that fits into your existing master data management program and your reference data management program, and how that’s so key.

Dave Mariani: You know, it’s pretty daunting, right? Because I didn’t even think about the fact that, you’re right, on an X-ray you could have your name on it. That’s PII. So, Peggy, how does BigID help there? Does it tackle that problem?

Peggy Tsai: Yeah. So, it’s a new problem that we certainly have been hearing about, and every year BigID puts together an internal hackathon, where we come up with new ideas, new ways to build and improve our products. And actually, the reason why I’m talking about it is because this was my team’s hackathon idea: to build classifiers, which are our machine learning algorithms, to tackle new types of data sources. So we’ve specifically tackled digital imaging, medical scans like X-rays and CAT scans. We also looked at image files and sound files, like a lot of Zoom recordings, which are popular and can contain embedded passwords and credit card numbers. You know, I might be giving my credit card number right now to the audience today, and that might not be something that we want to share.

Peggy Tsai: So being able to have an algorithm that can translate the text and then systematically and automatically identify, find, and pick out things that we consider personal information, like, you know, the name, email address, Social Security numbers, things like that. That’s the United States Social Security number; we also built classifiers for other nationalities’ identifiers. So things like that are really key for companies to start chipping away, being able to systematically and automatically find, protect, and label this information and then take action on it, right? Whether they want to protect it and move it to a secure vault for monitoring, or put it on a retention hold. So the first step is knowing that it’s there. And then the second... yeah.
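The "find and label" step Peggy describes can be illustrated with a minimal sketch. This is not BigID's implementation: she describes machine learning classifiers, whereas the example below swaps in plain regular-expression matching on already-transcribed text, purely to show the shape of the idea. The names `PATTERNS`, `Finding`, `find_pii`, and `redact` are hypothetical, and the two patterns are deliberately simplistic assumptions that would miss many real-world formats.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately simple patterns (assumptions for illustration).
# A production classifier would add validation, context, and many more types.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass(frozen=True)
class Finding:
    label: str   # which pattern matched, e.g. "email"
    start: int   # start offset of the match in the text
    end: int     # end offset (exclusive)
    value: str   # the matched text itself


def find_pii(text: str) -> list[Finding]:
    """Return every pattern match in the text, ordered by position."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append(
                Finding(label, match.start(), match.end(), match.group())
            )
    return sorted(findings, key=lambda f: f.start)


def redact(text: str) -> str:
    """Replace each finding with a bracketed label such as [EMAIL]."""
    redacted = text
    # Work from the end of the text backwards so earlier offsets stay valid.
    for finding in reversed(find_pii(text)):
        tag = f"[{finding.label.upper()}]"
        redacted = redacted[:finding.start] + tag + redacted[finding.end:]
    return redacted
```

Calling `find_pii` on a transcript gives an inventory of what was detected and where, which corresponds to the "knowing that it's there" step; `redact` is one possible follow-up action among those mentioned (protecting, moving to a vault, or placing on hold).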

Dave Mariani: I was just going to say: discovery. You first have to know what the extent of the problem is, and you can’t really know that unless you’re able to scan and take an inventory of what’s out there that may have embedded PII. That’s a big problem, so I’m glad you guys are working on it and helping solve that. You know, when it comes to the kind of tooling, that seems to be something that companies really want to invest in. But I’m hearing a lot, Peggy, and I wanted to get your opinion on this, about this whole concept of data mesh and decentralized analytics, where you have the business units, the data stewards who understand their business, taking more of a direct role in creating data products. And when it comes to data governance and data security, obviously, you’ve got to have a pretty good infrastructure if you’re willing to open the doors and allow people to start to self-serve and get access to that data.

Dave Mariani: So what’s your take on this new, distributed way of building data products? What do you think about this concept of data mesh, hub and spoke, and the like?

Peggy Tsai: I think it makes a lot of sense, and I think it’s about time that organizations are thinking about data from a domain perspective. I think a lot of the reason why the concept of data mesh and data fabric hasn’t come to popularity until recently is because we’ve been relying on technology systems to dictate how we manage data. And I’m glad that we’re able to now move away from the constructs of an actual technology system or table construct. So really, it’s how organizations should be managing the data, thinking about things from a data and metadata point of view. More of the ownership of how data should be governed end to end, thinking about data as a product from end to end, should be put into the hands of the people who use the data versus the technology system owners, who simply own the system itself.

Peggy Tsai: So it’s a change that I think will gradually become even more popular, given the fact that there need to be more of these tools out there that can support the concept of data mesh and data fabric. I think that organizations also need to change their mindset and not think of data as simply being in a table. It’s something that should be open-ended. Redesigning the way data is logically put together as a concept is huge, right? Because for the longest time, I would tell you, people did not think about data management in that way. So in one respect, it’s educating and changing the way that people think about data management. On the other hand, it’s also encouraging and upskilling technology teams on how they should be managing and governing data. So it’s going to be probably a few more years, I would predict, until this concept is mainstream, but boy, is it going to be very exciting, a game changer in terms of how people are going to be using and leveraging data across the organization.

Dave Mariani: Yeah, I was going to use the same words: a game changer. It really is. I mean, we went from, you know, centralized data management to fully decentralized in the Tableau sort of visualization revolution, where it was kind of anything goes. And what I like about the data mesh concept is that it brings the best of both worlds together, right? You still need to have some tooling and standards so everybody’s playing by the same rules and speaking the same language, but having the data stewards, the people who understand the business, be able to author their own data products makes a lot of sense. I always say it’s easier to teach the business how to deal with data than to teach the data people about the business. So it seems like it’s putting the right emphasis in the right areas. It’s definitely a people-oriented thing. I hear a lot about the organizational and people side, but it still has an underpinning in technology, right? You still have to have the right technology foundation to enable that kind of organizational model.

Peggy Tsai: But I think organizations will see the benefits of this type of setup much more quickly. One, because it will impact more than just their data teams. Their business teams certainly benefit from this type of new architecture and framework. They’re going to see it, you know, in the bottom line; they’re going to see how they’re going to make better data-driven decisions. Consistency in their data across the board is going to be better. You know, just the concept of governing data on the domain level or grouping level is so key, because usually that’s so distributed, and that’s where you have all these differences and discrepancies in reporting of their financial data, or even their basic master data. So it improves accountability. It’s going to improve the quality of the data. But, you know, one of the challenges I see is how exactly organizations are going to implement data mesh.

Peggy Tsai: And again, that kind of speaks to one of your earlier questions and what you had said: how prescriptive that “how” is will also depend on the organization itself, the expertise of the different business units, how much it’s going to let each business unit own and set up each of these data meshes, and the data stewards. Sometimes you’re going to have very mature data stewards in one part of the organization and not in others. So those resources are going to be key. But, you know, obviously the devil is always in the details, and I think that organizations that embrace this type of mindset are going to be more successful in the long run.

Dave Mariani: So Peggy, how do you think that data literacy factors into all this? Is that top-down, is that bottom-up? With data mesh and decentralization, how do you think things change or don’t change when it comes to data literacy and getting people to be able to, you know, speak and read and write data?

Peggy Tsai: Yeah. So I’ve always seen data literacy as sort of an adjacent pillar that sits right next to data management and data governance. So whether or not organizations are thinking about data mesh, hopefully there is someone in the organization, you know, like the chief data officer, that is thinking about how to improve data literacy across the organization. So it’s, as you said, training the business teams. Everyone in the organization should know how to use basic tools, reporting tools like Tableau or Qlik. They should know how to access and create reports, understand what the data means, and know how to interpret it. So that’s basic training that, I feel, should be given on day one, just like when you’re onboarding new employees and you give them training on compliance and how to use the new finance tools and new payroll tools.

Peggy Tsai: It’s like, let’s train you on basic literacy. So I think it’s really important that all organizations start to embrace that and really measure themselves against how many people are properly using the tools and are trained on them. You know, I think that data literacy is still a fairly new concept, along with how it correlates with the overall data culture of the organization. But at the end of the day, it does stem from having a single chief data officer that can champion, communicate, and really push through these concepts.

Dave Mariani: Yeah, well, that’s your role right now at BigID: to be that thought leader and to invest in those right areas to build a data-driven culture. So, Peggy, I want to thank you so much for joining the podcast today. I think everybody learned a lot, and we know how important governance is. And I think you opened our minds to new ways of thinking about data governance. It’s everywhere. It’s not just unstructured, it’s in images and everywhere else. So keep fighting the good fight, help us better manage our data, and keep it safe. So, Peggy, thank you for joining the Data-Driven podcast, and to everybody out there, stay data-driven. Thanks.

Peggy Tsai: Thanks, Dave.
