What is a skills-based economy, and how is LinkedIn moving from vision to implementation? As LinkedIn Director of Engineering Sofus Macskássy shares, there’s AI, taxonomy, and ontology involved in building the Skills Graph that powers the transition.
Skills are the new currency. That’s a bold statement coming from LinkedIn CEO Ryan Roslansky. Roslansky makes the case for the so-called skills-first economy based on both anecdotal evidence and data. Skills-first hiring was mentioned in the 2022 State of the Union address, and a growing number of CEOs are calling for companies to shift how they hire.
In addition, recent LinkedIn data shows that the skill sets for jobs have changed by around 25% since 2015. By 2027, this number is expected to double. That means jobs are changing on you even if you aren’t changing jobs, just as business demands are changing on you even if you’re not changing your business.
That was not the first or the last time Roslansky made that point. In 2021, LinkedIn’s CEO outlined a vision to help transition the hiring market from focusing solely on titles and companies, degrees, and schools to also focusing on skills and abilities. In his 2021 post, Roslansky announced new LinkedIn features and services. He also referred to AI-driven changes, a prescient point brought forward again in 2023.
Roslansky is not alone in identifying this shift. In 2018, we posited that in a rapidly shifting job market, being able to formalize skills is a requirement for job seekers and employers going forward. In 2019, we followed that up with more analysis on the relationship between the future of work, skills, and knowledge graphs.
In 2021, Roslansky first referred to the LinkedIn Skills Graph. The Skills Graph was introduced to help create a common skills language, and it’s powering a multitude of LinkedIn features and services, as well as Microsoft Viva. In 2022, LinkedIn Director of Engineering Sofus Macskássy elaborated on how LinkedIn’s Skills Graph is being built to power a skills-first world.
Further details on building and maintaining the skills taxonomy that powers LinkedIn’s Skills Graph were shared earlier in 2023. Today, Macskássy and his team are sharing more details on how they are extracting skills from content to fuel the LinkedIn Skills Graph. We caught up with Macskássy to discuss the journey from vision to implementation.
As Macskássy pointed out, the story of skills on LinkedIn goes way back. LinkedIn users have long been able to add skills to their profiles and endorse each other. LinkedIn understood that skills represent an important vocabulary by which people communicate and understand each other. But at some point the realization settled in that this goes beyond just finding a job or finding talent.
When the CEO highlighted that LinkedIn should move into a skills-first world, there was a question – what does that even mean? From a technical perspective, the team needed to understand what the vision was.
As the team started digging into the various product lines as well as the news feed, advertising, recommendation, and search, they realized that skills could be leveraged as signals to aid ranking and recommendation. But going from vision to implementation has not been without obstacles.
Many people didn’t have skills listed on their profiles. Even when they did, people did not necessarily use the same vocabulary. It was not easy to figure out whether two skills were related to each other or even synonyms of each other. This is what Roslansky’s mention of helping everyone speak the same “skills language” alluded to.
To resolve ambiguity, the first approach was to build out the LinkedIn skills taxonomy. The skills taxonomy is where LinkedIn organizes and categorizes skills based on their hierarchical relationships with each other.
Each skill is represented by a “node” in the skills taxonomy, and nodes are linked together to form a hierarchical skill network through “edges” called knowledge lineages. Knowledge lineages reflect how two skills relate to each other. Skills may relate for various reasons: for example, both skills may be part of the same career specialization, or one skill may be a tool that is used to apply another skill.
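To make the node-and-edge picture concrete, here is a minimal sketch of a taxonomy as a directed graph. The class, its method names, and the example skills are hypothetical illustrations and not LinkedIn's actual data model. The `related` check is one simple assumption about what "related" could mean in a purely hierarchical structure.

```python
from collections import defaultdict


class SkillsTaxonomy:
    """Toy taxonomy: skills are nodes, knowledge lineages are directed edges."""

    def __init__(self):
        self.children = defaultdict(set)  # parent skill -> more specific skills
        self.parents = defaultdict(set)   # child skill -> broader skills

    def add_lineage(self, parent, child):
        """Record that `child` sits under `parent` in the hierarchy."""
        self.children[parent].add(child)
        self.parents[child].add(parent)

    def ancestors(self, skill):
        """Return every skill reachable by following lineages upward."""
        seen = set()
        stack = [skill]
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def related(self, a, b):
        """Assumed rule: related if one is an ancestor of the other,
        or if the two skills share any ancestor."""
        up_a = self.ancestors(a) | {a}
        up_b = self.ancestors(b) | {b}
        return bool(up_a & up_b)


# Hypothetical example lineages, including a tool placed under the skill it applies.
taxonomy = SkillsTaxonomy()
taxonomy.add_lineage("Artificial Intelligence", "Machine Learning")
taxonomy.add_lineage("Machine Learning", "Deep Learning")
taxonomy.add_lineage("Deep Learning", "TensorFlow")
taxonomy.add_lineage("Finance", "Accounting")

print(taxonomy.related("TensorFlow", "Machine Learning"))  # True
print(taxonomy.related("TensorFlow", "Accounting"))        # False
```

A structure like this only captures hierarchy, which is the limitation discussed later in the article.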
To create a stronger network of connected skills, a framework called “Structured Skills” is utilized. This framework increases understanding of every skill by mapping the relationships it has to other skills around it.
The connected skills taxonomy is curated by a combination of human taxonomists and machine learning. This Human-In-The-Loop approach to building the Skills Graph helps grow the taxonomy at scale while ensuring the skills data meets the required quality and standards.
“It is a feedback loop where we see new skills and new ways of mentioning skills across the board. We use this to expand the Skills Graph dynamically as we see new ways of phrasing a skill or even a new skill we had never seen before.
“For example, prompt engineering is now a new skill that really popped up over the last couple of quarters. So we want to dynamically add it to the Skills Graph. We then use these new skills, which can subsequently be tagged in the content,” Macskássy said.
That is a continuous process, and the cadence by which it’s done depends on the type of content. The skill harvesting process happens on demand as content is updated. Potential additions or improvements to the Skills Graph are batched, so reviews and updates don’t have to be executed every time any single update happens.
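The split between on-demand harvesting and batched review could look roughly like the sketch below. Everything here is an assumption made for illustration: the `CandidateQueue` class, the batch size, and the rule that a batch is released once enough distinct unknown phrases have accumulated. The article does not describe how LinkedIn actually batches updates.

```python
from collections import Counter


class CandidateQueue:
    """Collects unknown skill phrases and releases them for review in batches."""

    def __init__(self, known_skills, batch_size=3):
        self.known = {s.lower() for s in known_skills}
        self.batch_size = batch_size      # distinct candidates needed per batch
        self.pending = Counter()          # candidate phrase -> times seen

    def observe(self, phrase):
        """Called whenever content is updated. Returns a batch for review
        once enough distinct candidates are pending, otherwise None."""
        key = phrase.lower()
        if key in self.known:
            return None
        self.pending[key] += 1
        if len(self.pending) >= self.batch_size:
            batch = self.pending.most_common()  # most frequently seen first
            self.pending = Counter()
            return batch
        return None

    def approve(self, phrase):
        """A human reviewer accepts the candidate into the known skills."""
        self.known.add(phrase.lower())


queue = CandidateQueue(["Python"], batch_size=2)
queue.observe("Python")               # already known, ignored
queue.observe("Prompt Engineering")   # first new candidate, no batch yet
queue.observe("prompt engineering")   # same candidate seen again
batch = queue.observe("Vector Databases")
print(batch)  # [('prompt engineering', 2), ('vector databases', 1)]
```

Reviewers would then call `approve` for the candidates they accept, so a phrase is only treated as a known skill after a human has looked at it.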
In 2022, LinkedIn’s taxonomy consisted of over 39,000 skills spanning 26 languages, over 374,000 aliases (different ways to refer to the same skill – e.g., “data analysis” and “data analytics”), and more than 200,000 links between skills. In 2023, there are more than 41,000 skills.
Instead of relying only on taxonomists to manually curate over 41k skills, LinkedIn applies machine learning techniques to help scale the taxonomy construction. This includes KGBert, a tool LinkedIn developed that was inspired by KG-Bert. It is a supervised model that applies a deep semantic understanding of skills to predict relationship lineages.
By utilizing machine learning models, LinkedIn can extract and map skills from diverse content sources and collect feedback for continuous model improvement and member value. To do this, large pieces of text (such as job descriptions and resumes) first need to be segmented out into meaningful parts.
Mentions of skills can then be extracted from each piece of the text. Once extracted, they are normalized into canonical/single representations (e.g., “data analytics” and “data analysis” are the same skill), represented in the skills taxonomy.
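The segment, extract, and normalize steps can be sketched as follows. This is a deliberately simple dictionary lookup, not the machine learning models LinkedIn describes. The alias table, function names, and sample job text are all hypothetical.

```python
import re

# Hypothetical alias table: each alias maps to one canonical skill name.
ALIASES = {
    "data analysis": "Data Analysis",
    "data analytics": "Data Analysis",
    "prompt engineering": "Prompt Engineering",
    "python": "Python",
}

# Longer aliases come first so multi-word aliases win over shorter ones.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(a) for a in sorted(ALIASES, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def segment(text):
    """Split raw text into sentence-like parts on newlines and end punctuation."""
    parts = re.split(r"[\n.;!?]+", text)
    return [p.strip() for p in parts if p.strip()]


def extract_skills(text):
    """Return canonical skills mentioned in the text, in order of first mention."""
    found = []
    for part in segment(text):
        for match in _PATTERN.finditer(part):
            canonical = ALIASES[match.group(1).lower()]
            if canonical not in found:
                found.append(canonical)
    return found


job = (
    "We need strong data analytics skills. "
    "Experience with Python and data analysis is required."
)
print(extract_skills(job))  # ['Data Analysis', 'Python']
```

Both “data analytics” and “data analysis” collapse to a single canonical entry, which is the point of the normalization step.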
Attention also has to be paid to where a skill sits in a piece of content and what type of content it is. Skills are often represented differently in resumes, member profiles, or job descriptions, so models are fine-tuned to learn the specifics of those types of content.
To address these nuances, LinkedIn built an architecture and platform that caters to the challenges of extracting skills and mapping them onto the Skills Graph. The post that Macskássy’s team published details the AI model workflow that extracts and maps skills from raw text, such as job postings, as well as the architecture for serving models at scale.
As former AI Division Technical Lead for Taxonomies and Ontologies at LinkedIn Mike Dillinger points out, however, taxonomies are the duct tape of connected data. They seem simple, flexible, and familiar. They are widely used. And they seem to work across many use cases and many domains.
But when looked at in more detail, taxonomies turn out to be crude tools for knowledge organization that are difficult to create, to scale, to adapt, to align, and to build on. A key reason is the fact that they are limited to modeling hierarchical relationships, which are only a small part of the rich relations connecting entities in the real world.
Some of the technical details shared on the Skills Graph allude to harvesting and exploiting more than hierarchical relationships. Macskássy verified that LinkedIn utilizes an ontology used not only for mapping skills between branches but also for other concepts. That has provided great improvements when it comes to ranking and recommendation, with more details to be unveiled in due course.
Applications across LinkedIn include career-relevant skills and job-important skills. The approach has resulted in performance improvements in Job Recommendation, Job Search, and Job Member Skills matching.
As part of the process, recruiter skill feedback and seeker skill feedback are collected. When a recruiter manually posts a job on LinkedIn, a list of skills, pulled by LinkedIn’s AI model, is suggested after they fill in the posting content. A recruiter can edit this list depending on whether they believe a skill is important.
Similarly, when a job seeker opens a job posting on LinkedIn, a feature will show how many skills overlap between their profile and the job. Seekers can review the top 10 skills used for skill matching calculation, and if a certain skill is irrelevant to the job, they can provide feedback.
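A skill-overlap view of this kind can be sketched in a few lines. The function names, the assumption that job skills arrive ordered by importance, and the way feedback is applied (simply filtering out flagged skills) are illustrative guesses. The article does not say how LinkedIn computes the match or uses the feedback.

```python
def skill_match(profile_skills, job_skills, top_n=10):
    """Compare a member's skills with the top-N skills of a job posting.

    `job_skills` is assumed to be ordered from most to least important.
    Returns (matched, missing) lists drawn from the top-N job skills.
    """
    top_job_skills = job_skills[:top_n]
    profile = {s.lower() for s in profile_skills}
    matched = [s for s in top_job_skills if s.lower() in profile]
    missing = [s for s in top_job_skills if s.lower() not in profile]
    return matched, missing


def apply_feedback(job_skills, irrelevant):
    """Drop skills that a seeker flagged as irrelevant to the job."""
    flagged = {s.lower() for s in irrelevant}
    return [s for s in job_skills if s.lower() not in flagged]


profile = ["Python", "SQL", "Data Analysis"]
job = ["Python", "Machine Learning", "SQL", "Public Speaking"]

matched, missing = skill_match(profile, job)
print(f"{len(matched)} of {len(matched) + len(missing)} skills match")  # 2 of 4 skills match

# Seeker feedback: one listed skill is not actually relevant to this job.
job = apply_feedback(job, ["Public Speaking"])
print(skill_match(profile, job))  # (['Python', 'SQL'], ['Machine Learning'])
```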
Obviously, harvesting and utilizing those “implicit” skills, in addition to skills explicitly listed by members, is something LinkedIn has put a lot of effort into. And there are anchors to surface those implicit skills and collect feedback from job seekers and recruiters on both ends of the recruiting process.
But what about LinkedIn members who are not actively recruiting or being recruited? Wouldn’t they benefit from the opportunity to see the skills LinkedIn harvested from their content and perhaps be prompted to add them to their profiles too? Furthermore, explicit skills are part of a member’s profile and, therefore, get to be exported if members so choose. What about implicit skills?
Macskássy noted that there are surfaces and flows by which members can add skills. LinkedIn continues to improve those flows, where suggestions are made based on member resumes and profiles. Members are asked whether they might want to add certain skills and even associate them with specific jobs they have held. However, prompting is not done in an overly aggressive way.
As for implicit skills, Macskássy said those are not necessarily surfaced because they are more fluid in nature. It makes less sense to surface all of these, particularly as LinkedIn is moving forward into how to think about skills more dynamically and more in-depth, he added. So, those skills are not considered as part of member profiles and are not exported either.
There are more dimensions of implicit skill harvesting that are worth highlighting: provenance, credibility, depth, and interoperability. Since implicit skills are extracted from content that members themselves provide, is there a way to evaluate the weight this content carries or does it have to be taken at face value?
The content that members provide is how they want to represent themselves, Macskássy said. So that has to be taken somewhat at face value, as it all comes down to trust. LinkedIn has not observed that members are stretching the truth because there are a lot of people looking at their professional profiles, and they would be called out, Macskássy added.
However, there is a mechanism through which the depth of member expertise can be assessed. LinkedIn Skill Assessments (SAs) are adaptive assessments designed by LinkedIn Learning experts to evaluate and validate skills across a range of domains.
These short-form assessments are accessible through the profile skills section, where members can click on the “Take skill quiz” button to access a list of SA recommendations. Upon passing an SA with a score in the 70th percentile or higher, members are awarded a “verified skill” badge that they can display on their profile page and that is visible to recruiters.
Skill assessments, as well as other learning material, align with skills in the skills taxonomy. Assessment results are represented internally in member data and can be part of the ranking. The number of SAs and certificates is still very small compared to other content. LinkedIn is considering how to best elicit and leverage this signal, for example, by evaluating member interactions.
Something that could potentially help both in terms of interoperability, as well as content enrichment, is Open Badges. Open Badges is a data specification initiated by Mozilla and the MacArthur Foundation. The goal is to provide a simple but flexible format for documenting and showcasing skills. A badge corresponds to a skill a person has, as recognized by a third party.
That means that by using Open Badges, LinkedIn members would be able both to import skills documented by third parties and to export their skills in a standardized format. Macskássy said that the team is aware of Open Badges. However, the focus is primarily on what can help with the member experience on the platform. Open Badges is on the radar but has a low priority at this point.
LinkedIn is investing in continuously improving its skill understanding capabilities through approaches such as leveraging Large Language Models. Examples include using LLMs to provide rich descriptions of skills, fine-tuning LLMs to improve skill extraction, and leveraging embeddings for skill representation.
As the team notes, the LinkedIn Skills Graph is at the center of powering the skills-first transformation. The tech stack for mapping content to the Skills Graph enables constant updates and evolution of the Skills Graph to stay up to date on the always-changing skills landscape.