March 23, 2020
COVID-19 and Remote Work – The Catalyst to Transform IT & Fully Embrace the Cloud
Welcome back to our “Analytics Leader Spotlight” series, where we get to know more about what business leaders are doing with data analytics at their organizations and how they got into the role they are in today. Our latest spotlight is with Matthew Hartwig, Data Infrastructure Product Lead at Wayfair.
Matthew, thank you for taking the time to speak with us today about your career in data analytics. Would you tell us more about your role at Wayfair?
A: I work for Wayfair as the Product Lead within our data infrastructure organization, where we focus on three things. The first is the technologies that data consumers use to access, work with, and visualize data, which is a combination of AtScale, Looker, Data Studio, and Notebooks. The second is the technology that our data producer community needs to be able to do their job. An example output of their work would be the underlying dimensional model that an AtScale cube reads data from. It could also be the training set for a machine learning model or an aggregate table used for a reporting dashboard. The third area of focus is the actual foundational infrastructure that sits behind and powers the activity of those two user groups. This includes the compute engines and storage systems for our data. Historically, Wayfair has been really oriented around on-premises infrastructure. Over the past year and a half, we’ve been shifting from an on-premises infrastructure to a hybrid cloud infrastructure to more of a cloud-first infrastructure in the data space. It was actually there that we used AtScale to drive adoption of our cloud technologies for our end users.
What is the business impact of the work you and your team are doing with your technical infrastructure? What are you able to do with the output and the visualizations of the data?
A: One of the things that Wayfair talks a lot about at a high level is developer velocity. Because Wayfair employs thousands of software developers and hundreds of data developers, we are always thinking of ways to make our developers more productive. To that end, we’ve given specific attention to our Wayfair data developer community, which is made up of the folks focused on delivering the fresh data used to answer questions from the business. We’re working to make these developers 10X more productive, leveraging a combination of new technologies that we’re building in house and introducing from partners. Simultaneously, we’re introducing new tools and technologies for our end data consumers so that they can level up their capabilities in analyzing data and deriving business insights. An example of that work is an in-house anomaly detection product, where we have helped transform a non-technical user base of a couple hundred people into forecasting experts. Not only are we building an Auto-ML framework under the hood, but we’re actually using hundreds of variations of models and using hyperparameter optimizations to identify the best possible model to actually put into production. We’re scaling using computers rather than people.
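To make the idea of trying many model variations and keeping the best one more concrete, here is a minimal sketch in Python. It is not Wayfair's framework: the two model families (simple exponential smoothing and a moving average), the parameter grids, the holdout size, and every function name are assumptions chosen for illustration. It scores each variation on the most recent points, keeps the lowest-error one, and then flags values that sit far from that model's forecast.

```python
from statistics import mean, median


def ses_forecasts(series, alpha):
    """One-step-ahead simple exponential smoothing.

    forecasts[i] is the prediction for series[i] built only from series[:i]
    (forecasts[0] is seeded with the first observation).
    """
    level = series[0]
    forecasts = [level]
    for value in series[:-1]:
        level = alpha * value + (1 - alpha) * level
        forecasts.append(level)
    return forecasts


def moving_average_forecasts(series, window):
    """One-step-ahead forecast equal to the mean of the previous `window` points."""
    forecasts = [series[0]]
    for i in range(1, len(series)):
        forecasts.append(mean(series[max(0, i - window):i]))
    return forecasts


def holdout_mae(series, forecasts, holdout):
    """Mean absolute error over the last `holdout` points."""
    pairs = list(zip(series, forecasts))[-holdout:]
    return mean(abs(actual - predicted) for actual, predicted in pairs)


def select_best_model(series, holdout=5):
    """Try every variation in a small, hypothetical grid; keep the lowest-error one."""
    candidates = {}
    for alpha in (0.1, 0.3, 0.5, 0.7, 0.9):
        candidates[f"ses(alpha={alpha})"] = ses_forecasts(series, alpha)
    for window in (2, 3, 5):
        candidates[f"moving_average(window={window})"] = moving_average_forecasts(series, window)
    scores = {name: holdout_mae(series, f, holdout) for name, f in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best], candidates[best]


def flag_anomalies(series, forecasts, multiplier=3.0):
    """Return indices whose forecast error exceeds `multiplier` times the median error."""
    residuals = [abs(actual - predicted) for actual, predicted in zip(series, forecasts)]
    threshold = multiplier * median(residuals)
    return [i for i, residual in enumerate(residuals) if residual > threshold]


# Made-up daily values with one obvious spike at index 10.
daily_values = [100, 102, 101, 103, 104, 103, 105, 106, 105, 107, 160, 108, 109]
best_name, best_error, best_forecasts = select_best_model(daily_values)
print(best_name, round(best_error, 2), flag_anomalies(daily_values, best_forecasts))
```

A production system would search far more model types and settings and would validate more carefully (for example, with several rolling holdout windows), but the shape is the same: the computer does the trial and error, and the end user only sees the selected model and the flagged points.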
Let’s talk about how you got started in the data and analytics field. Where did you initially start out and how did that set you up for the role that you’re in today?
A: The role that set me on the trajectory to where I’m at today was a product management role at a company headquartered in the D.C. area called Taxi Magic. I was the product manager for the iOS and Android apps. At that time, it had a pretty large user community of about 200,000 users. Unfortunately, we were playing in the same space as Uber and Lyft. During my first two months of working at the company, Uber announced that they received a $500 million investment. That was sort of the beginning of the end of the company, but it actually opened up some awesome opportunities for me early on in my career. Because Taxi Magic wasn’t doing so well, the people above me kept leaving. Ultimately, I was given the opportunity to form a small business intelligence and data science team, where I built my first BI platform using a combination of Amazon Redshift, SQL, and Tableau. I used that experience to reflect on what I really wanted to do with my career and recognized that it was too early in my career to focus on product work. I shifted my focus to look for roles that would allow me to come in as an individual contributor data engineer, which led me to find a role at Wayfair where I was able to come in and actually write production code for eight hours a day, often more. I did that for a little while before I was given the opportunity to take over a new team in the supply chain space. From there, I started the first analytics supply chain team in engineering at Wayfair and I grew that team from a team of one (me) to a team of about 25 people. Near the end of my time leading that group, the biggest challenges that we were facing were within our infrastructure. There was a lot of fragmentation in the space, technologies weren’t scaling, and we had very low data developer velocity as a result.
I saw the opportunity to say, “Hey, if we were to introduce a product function into our infrastructure group and introduce a complementary software engineering function, we could actually start building out a platform that would act as an integrated development environment for data engineers or data producers.” I leveraged that vision to move over into my current role. I was really fortunate because I got to partner with some really intelligent engineers with ambitious visions of their own. Over the last two years, we have built out and dramatically expanded an integrated development environment using our combined visions.
What are two of the largest challenges that you face today in your role at Wayfair?
A: I think the main challenge that we’re facing is identifying how we effectively transition through our present state of a hybrid on-premises/cloud infrastructure to a cloud-first infrastructure that allows people to easily interface with an elastically scalable system. The challenge is that we have a lot of people who have developed deep expertise in certain pieces of infrastructure that we aren’t featuring anymore. As we move from on-premises platforms to Google BigQuery, Dataproc, and Dataflow, that skill set is still important but it’s no longer the principal skill set of the role. One of our greatest challenges is figuring out how we take that existing skill set and translate it to make sure it is still valuable without significant costs to retrain.
The transitory state that we’re in right now is challenging because we’re supporting run-the-business operations with our on-premises infrastructure while simultaneously building a better future and trying to get a huge organization to shift in a short period of time. Fortunately for us, by building our integrated development environment, we built an abstraction layer that our developers can use and feel familiar with.
The second biggest challenge that we’re facing is around the actual tools and technologies that we’re delivering to the folks that are consuming data. Historically, Wayfair was a really big Tableau and Excel shop, so if you were a data consumer at Wayfair, you probably knew SQL and SSAS cubes, and you were an Excel or Tableau ninja. We’ve been focused on figuring out what tools and technologies we need to introduce to broaden the capabilities of our business user community. If you have to hire a data scientist for every complex problem, you’re going to go out of business pretty quickly. We want to be able to introduce new tools and technologies that will allow people to do more complex and challenging analyses and operations than they could historically. When we think of the analytics space, there’s a spectrum that starts from descriptive analytics, which is what your traditional KPI business dashboard will tell you.
Wayfair has always been really good at descriptive analytics. That spectrum extends all the way over to prescriptive analytics, which tells you not just what’s going to happen, but what you should do about it. What we’re trying to figure out is how we can shift into being able to do more prescriptive and predictive analytics that’s delivered through consumable interfaces for end data consumers who maybe don’t have an engineering skill set. Maybe they know SQL or maybe they know a little Python, but they’re not going to be writing production code. Then there’s also an interesting niche around diagnostic analytics. We think that products like AtScale and cubes are reasonably good at things like diagnostic analytics because you can look at a pivot table and say “Hey, why is that number going up or going down?” and you can slice and dice through cross-tabular analysis to isolate what is actually driving that descriptive measure. We’re trying to figure out what are the right tools and technologies that we can bring on without overloading people. If we bring in a hundred new tools or technologies, then suddenly none of those are tools that people have mastery over to get the full value out of using them.
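The slice-and-dice style of diagnostic analysis described above can be sketched in a few lines of Python. This is only an illustration: the rows, the dimension names (`region`, `category`), and the `change_by` helper are all hypothetical, and a cube or pivot table would do the same aggregation interactively rather than in code.

```python
from collections import defaultdict


def change_by(rows, dimension, metric, period_key, before, after):
    """Total `metric` per value of `dimension`, returned as the after-minus-before change."""
    totals = defaultdict(lambda: {before: 0.0, after: 0.0})
    for row in rows:
        if row[period_key] in (before, after):
            totals[row[dimension]][row[period_key]] += row[metric]
    return {value: t[after] - t[before] for value, t in totals.items()}


# Made-up weekly revenue rows; total revenue fell from 280 in W1 to 254 in W2.
rows = [
    {"week": "W1", "region": "East", "category": "Sofas", "revenue": 100},
    {"week": "W1", "region": "East", "category": "Rugs", "revenue": 50},
    {"week": "W1", "region": "West", "category": "Sofas", "revenue": 90},
    {"week": "W1", "region": "West", "category": "Rugs", "revenue": 40},
    {"week": "W2", "region": "East", "category": "Sofas", "revenue": 102},
    {"week": "W2", "region": "East", "category": "Rugs", "revenue": 51},
    {"week": "W2", "region": "West", "category": "Sofas", "revenue": 60},
    {"week": "W2", "region": "West", "category": "Rugs", "revenue": 41},
]

# First slice: which region accounts for the drop?
by_region = change_by(rows, "region", "revenue", "week", "W1", "W2")
worst_region = min(by_region, key=by_region.get)

# Second slice: within that region, which category is driving it?
region_rows = [row for row in rows if row["region"] == worst_region]
by_category = change_by(region_rows, "category", "revenue", "week", "W1", "W2")

print(by_region)
print(worst_region, by_category)
```

Each slice narrows the question: the drop is first isolated to one region, then to one category inside that region, which is the "why is that number going up or going down?" loop a pivot table supports without any code.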
In the time that you’ve spent at Wayfair, what’s a project or initiative that you’re proud of?
A: One of the projects that I’m the most proud to have been involved in (and I can’t take full credit for it because we’ve had a pretty amazing engineering leader and product leader who spearheaded the project) started about a year and a half ago, when we recognized that we were struggling with the broad fragmentation of analytics systems and all of the data that lived in those analytics systems at Wayfair. We have data that lives in SQL Server, we have data that lives in Vertica, we have data that lives in Hive, etc. Having multiple storage locations makes it really hard to find the data that you’re looking for. We recognized that there was an opportunity to focus on data cataloging and data search, which is far more than going into a SQL editor and trying to navigate the branching tree of databases and schemas and columns down to fields. That search doesn’t work when you have 20 different systems and hundreds of thousands of different data entities living on those systems.
How long did that take you guys?
A: We started about a year and a half ago, but we delivered the first usable product within four to five months. We’re definitely a “build, measure, and learn very quickly” type of group. The first product that we delivered looks very different from what we have today. But ultimately, we saw that what we delivered was actually driving some value. We then used the user community that was built up through that initial release to maintain momentum in our product process. That helped us reach a point where, if I were to revisit the decision to buy Alation versus build what we have today (even if they told us that they would give us Alation for free), I would still take what we built.
How big was the team that built it?
A: We had one product person, one tech lead, and six engineers. It was one strong team.
What’s your advice to others in the industry who are getting started with their careers and aspire to work their way up to your type of role?
A: I get asked this question pretty often in interviews as well. They’ll ask me “What is something that you look for in a new hire?” I think that the most important things for a person to have in this field are passion and curiosity. The data space is in many ways very similar to the software engineering space, but it’s changing at such a rapid rate. If you’re not constantly curious and constantly passionate, you’re going to get left behind.
You need to be constantly learning. The people who settle on one or two technologies tend to get left behind in the data space because it changes every five years.
Looking back at where I started with Wayfair, I spent my first year and a half developing a really deep skill set in SSAS development. Now, SSAS is something that Wayfair isn’t even using anymore. It’s a skill set that is gone, and if I had rested on mastery of MDX, I would absolutely not be where I’m at today. Instead, I was incredibly passionate about the space. Even if something is not necessarily relevant today, I’m still going to read about it and learn about it.
These days, education is becoming more accessible with the emergence of online learning technologies. You can truly get hands-on experience. With Google Cloud Platform, we’ve started our shift over and I’ve been playing around with and building my own sandbox on the side. That’s so helpful for me because it means that I’m getting hands-on experience that I can later translate into the conversations that I’m having with users as they’re trying to use Google Cloud Platform to do big data development. When they offer complaints, I’m able to respond, “Yes, I hear you, I understand, and I can speak from experience with you.”