Many of our clients are in the early days with Data Science. They may be using data science and analytics to a degree, but not fully. Or they may be just starting out. We know of very few organizations that would say they are in a stable “end-state” and “finished” with the design of their data science team.
For most organizations, building a team is a multi-year, multi-phase activity. One medium-sized company we know, with multiple source systems, recently used a large data warehouse project to harmonize and curate its data and to provide excellent self-serve BI, reporting and data distribution to its large, distributed team. They started from a difficult place, and the total project (in several 3-4 month phases) took well over a year to complete. Importantly, they used the project to build out their team, which is how many organizations start. On completion, the product was great and really helpful, and the project was indeed the springboard for forming the core of the team. But a year later, the pressure was on to develop insights and models, and to apply them in practical ways. (This is actually a pretty common story, by the way.)
Tucked away in this short little "case study" is a hard lesson: a data science team has at least three distinct roles, each with its own set of skills:
- The first skills the team needed were deep data, systems and database skills, to implement the chosen products and move reliable, curated data into the system. This team was highly technical. It was a mixed team of in-house and product vendor staff. The need for these skills is greatest at the beginning, but it never goes to zero.
- The second skill set was “standard BI”: dashboards for drilldown and interactive exploration, replacements for previous reports, and so on. This required a different skill set: presentation, visualization and report-writing skills coupled with business knowledge. It’s “sort of” like programming, but it’s a specialized use of their chosen package to deliver the vital information to the broader user base.
- The third skill set is model development and advanced data science. While there are packages that help you model, much of the work is still done in tools that let you program, and many of the questions are unique to the business. Neither of the first two groups had the statistics, programming, or model development expertise to develop predictive models or, in some cases, to implement and evaluate them.
The catch is that you not only need all three of these distinct skill sets, you may actually need several people of each type to build and then maintain your systems. There are too many skills at play for one person to be enough in any particular area. Your team will want vacations. You will want some redundancy. Dabbling is one thing; having a dependable organizational capability is another.
Also, as you take on more projects (and build out your team), you’ll discover you are doing things over and over again. Need a list of zip or postal codes for this project? Or census demographic stats for another project? Your team member may just go get them. But if you need that data for four projects, do you get it once and keep it, or get it four times? Do you manipulate it four times? Do you have shared learning? Does your team even know you are doing it four times? Each team member is simply doing the work, trying to get their job done. A manager is likely the only person who will even see the re-work and be able to come up with a good approach.
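To make the “get it once and keep it” option concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the `ReferenceDataCatalog` class, the `load_postal_codes` loader and the project names are our assumptions, not a description of any particular product. The idea is simply that a shared catalog fetches each reference dataset once, hands the same copy to every project, and records who asked for it so the re-use is visible.

```python
from typing import Callable, Dict, List


class ReferenceDataCatalog:
    """Hypothetical shared catalog: fetch each reference dataset once, reuse it."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], object]] = {}
        self._cache: Dict[str, object] = {}
        self._requests: Dict[str, List[str]] = {}

    def register(self, name: str, loader: Callable[[], object]) -> None:
        # A loader is any function that knows how to fetch the dataset.
        self._loaders[name] = loader

    def get(self, name: str, project: str) -> object:
        # Record which project asked, so a manager can see the re-use.
        self._requests.setdefault(name, []).append(project)
        # Only call the (possibly slow) loader the first time.
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()
        return self._cache[name]

    def usage(self, name: str) -> List[str]:
        # Projects that have requested this dataset, in order.
        return list(self._requests.get(name, []))


# Illustrative only: a stand-in loader that logs every real fetch.
fetch_log: List[str] = []


def load_postal_codes() -> List[str]:
    fetch_log.append("fetched postal codes")
    return ["90210", "10001", "K1A 0B1"]  # made-up sample values


catalog = ReferenceDataCatalog()
catalog.register("postal_codes", load_postal_codes)

# Four hypothetical projects ask for the same data.
for project in ["pricing", "churn", "marketing", "claims"]:
    codes = catalog.get("postal_codes", project)

print(len(fetch_log))                  # number of real fetches
print(catalog.usage("postal_codes"))   # which projects used the data
```

In practice the cache would more likely be a curated table in your warehouse than an in-memory dictionary, but the design choice is the same: one owner, one copy, and a record of who depends on it.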
While we’re on skills, you’ll actually need an additional critical skill that is often overlooked and hard to find: analytics leadership. The leader has to be able to see the “story” and sell the “benefit” because analytics projects that actually improve the business can have real challenges (if it were easy, you'd already have done it). For example, your data may not be fully trustworthy, and fixing that can be a project too. Or the business may ask more detailed questions than you were prepared for in your “version 1.” A leader who can see through these issues is worth their weight in gold.
But back to the larger team that actually does the data science. Almost any medium-sized or larger organization will need several people in each of the three key roles:
- Data Engineers
- BI and Visualization Specialists
- Data Scientists
In an insurer (one industry where Data Science is widely used), some of the second and third groups may be drawn from your actuary pool. Keep in mind that if they are doing “data science” work, they won’t be doing “actuarial work” at the same time. Still, there are some skill alignments and some interest alignments that can make this part of a workable strategy.
So while skills are the raw material for success, it’s really a department that you need to build. Maybe several, depending on your size. But most organizations just can’t go out and hire the 3-7 or 5-10 people to do all this “data science stuff” in one shot. First, you likely won’t know the skills you need, and you won’t keep everyone busy in the first few months (or years), so you will likely want to phase in the recruitment of skills. Second, if they aren’t challenged, they’ll be bored, and if they’re good, they will be highly sought-after. This brings other issues, like employee development, retention and turnover. And as your data science team grows and becomes embedded in the business, will you need an Analytics Centre of Excellence (COE)? Finally, the state of the art in data science is still rapidly evolving, so keeping your new team current is going to be an ongoing challenge.
So how do organizations do it?
Many organizations choose to develop a relationship with a key supplier who can provide the skills they need at the time, and who will help them ramp up their internal skills.
You’ll recognize this partner, because they’ll help you build a roadmap. They'll help you take control of the assets you need (which they may build and transfer, or even manage for a time): ETL (extract, transform and load) and Data Curation processes. Data Warehouses and data pipelines. Data Governance, if that's an issue. BI and visualizations that you can take over and tweak. Data lakes, if you have big data applications. Buying it all as a “black box”, where you are a user and someone else owns the headaches, is also a strategy, and it definitely works for some clients and situations. But in the medium term, as the skills stabilize, clients will sometimes decide to bring these capabilities in-house.
If developing or growing your in-house data science capability is your goal, ask us how we can help. We can help you find the best structure, the best skills on-ramp roadmap, the assets you need to build and/or repatriate, and choose (or rationalize) the technology stack, tools and skills that will be at your foundation.