Welcome to the Proof of Usefulness Hackathon spotlight, curated by HackerNoon’s editors to showcase noteworthy tech solutions to real-world problems. Whether you’re a solopreneur, part of an early-stage startup, or a developer building something that truly matters, the Proof of Usefulness Hackathon is your chance to test your product’s utility, get featured on HackerNoon, and compete for $150k+ in prizes. Submit your project to get started!
In this interview, we sit down with Nadine van der Haar from Accent Labs to discuss how their linguistic data platform is bridging the resource gap for African voice technology. By providing gold-standard datasets for model fine-tuning, Accent Labs is setting the stage to make AI inclusive for over a billion speakers on the continent.
What does Accent Labs do? And why is now the time for it to exist?
Accent Labs is a linguistic data platform bridging the 90% resource gap for African voice technology through gold-standard datasets for model fine-tuning. Starting with a foundation of 8,172 verified segments in Nigeria, we have built a pipeline to map complex regional phonetic nuances that global models currently ignore. Our goal is to provide the critical data infrastructure required to make AI truly inclusive for the continent’s one billion speakers. Now is the right time for Accent Labs to exist because the rapid global adoption of AI makes it urgent to ensure these models are inclusive and functionally capable of serving diverse, historically overlooked linguistic demographics.
What is your traction to date? How many people does Accent Labs reach?
As a pre-launch infrastructure project, our current direct user reach is 0 per month; however, our projected impact is measured by the 1.4 billion speakers who will benefit from the downstream integration of our data into global AI products.
Who does Accent Labs serve? What’s exciting about your users and customers?
Our pipeline is designed to create sustainable work opportunities for local voice trainers and linguistic experts across regions including Nigeria and Zimbabwe, turning regional linguistic knowledge into high-value digital assets.
By eventually providing data for over 2,100 languages (including those of island nations like Madagascar), we enable global tech enterprises to reach previously invisible audiences, providing millions of residents with their first functional access to voice-first healthcare, banking, and education tools.
What technologies were used in the making of Accent Labs? And why did you choose ones most essential to your tech stack?
To map complex linguistic structures effectively, Accent Labs relies on Neo4j, a graph database, to meticulously track the relationships between speakers, dialects, and segments. For fast data retrieval in the demo, the team integrated a direct-to-Algolia pipeline, ensuring their intricate metadata is both highly organized and instantly searchable.
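To make the graph model concrete, here is a minimal sketch of how a speaker–dialect–segment relationship could be expressed as a Cypher `MERGE` statement built in Python. The node labels, relationship types, and property names are illustrative assumptions, not Accent Labs’ actual schema.

```python
# Hypothetical sketch: expressing one verified audio segment as a Cypher
# MERGE statement linking a Speaker, a Dialect, and a Segment node.
# Labels and properties are illustrative, not Accent Labs' real schema.

def segment_to_cypher(segment: dict) -> str:
    """Build a single Cypher statement for one verified segment."""
    return (
        f"MERGE (s:Speaker {{id: '{segment['speaker_id']}'}}) "
        f"MERGE (d:Dialect {{name: '{segment['dialect']}'}}) "
        f"MERGE (a:Segment {{id: '{segment['segment_id']}', "
        f"duration_s: {segment['duration_s']}}}) "
        f"MERGE (s)-[:SPEAKS]->(d) "
        f"MERGE (s)-[:RECORDED]->(a) "
        f"MERGE (a)-[:IN_DIALECT]->(d)"
    )

example = {
    "speaker_id": "spk-001",
    "dialect": "Yoruba-accented English",
    "segment_id": "seg-0001",
    "duration_s": 4.2,
}
print(segment_to_cypher(example))
```

Modeling segments this way lets curators ask graph-native questions, such as which dialects are underrepresented or which speakers cover multiple accent profiles, with a single traversal query.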
What is the traction to date for Accent Labs? Around the web, who’s been noticing?
While currently in the pre-launch phase, Accent Labs has successfully validated its technical architecture by securing 8,172 gold-standard audio segments covering major Nigerian accent profiles. They have also deployed a robust Human-in-the-Loop curation pipeline leveraging Neo4j and Algolia, and are already planning their Phase 2 expansion to benchmark 16 official languages in Zimbabwe.
Accent Labs scored 53.73 on the Proof of Usefulness scale (https://proofofusefulness.com/report/accent-labs). How do you feel about that? Does it need reassessment, or is it just right?
The score feels incredibly accurate for where we are. It validates that the linguistic gap is a high-utility problem and that our Neo4j/HITL architecture is the right engine to solve it. While it reflects our pre-revenue status, I view 53.73 as a 'Ready-to-Scale' signal. We’ve built the specialized foundation; now we just need the stage to perform on.
What excites you about Accent Labs' potential usefulness?
I am most excited by the opportunity to transform regional linguistic knowledge into a high-value digital asset. This isn’t just about improving AI; it’s about creating a sustainable economy for local voice trainers. By applying years of experience in model optimization to this market gap, I am building the infrastructure to make 1.4 billion speakers visible in the AI era.
Walk us through your most concrete evidence of usefulness.
The most concrete evidence is the failure rate of current SOTA models on our first 8,172 Nigerian segments. When major voice models attempt to transcribe these accents, the Word Error Rate (WER) spikes significantly compared to standard Western accents. Our data doesn't just improve these models; it is the difference between a voice-first healthcare app being functional or dangerous for a speaker in Lagos or Harare.
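Word Error Rate is the standard metric behind this claim: the word-level edit distance between a reference transcript and the model’s hypothesis, divided by the number of reference words. A minimal implementation, with illustrative transcripts rather than real Accent Labs data:

```python
# Minimal word error rate (WER): Levenshtein edit distance over words,
# normalized by the reference length. Transcripts below are illustrative.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + cost, # substitution (or match)
            )
    return dp[len(ref)][len(hyp)] / len(ref)

ref = "please transfer five thousand naira to my savings account"
hyp = "please transfer five thousand naira to my savings"  # one deleted word
print(round(wer(ref, hyp), 3))  # 1 error / 9 reference words -> 0.111
```

A single dropped word in a banking or healthcare command is exactly the kind of error that makes a voice-first app “functional or dangerous,” which is why accent-specific WER benchmarks matter.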
How do you measure genuine user adoption versus "tourists" who sign up but never return?
For a data infrastructure project, our primary users are the AI researchers and developers fine-tuning models. We measure 'stickiness' by Model Convergence Speed and Error Rate Reduction. If a developer uses an Accent Labs dataset and sees their model reach benchmark-ready accuracy significantly faster than with generic off-the-shelf data, they aren’t a 'tourist'; they’ve found a mission-critical component of their training pipeline.
If we re-score your project in 12 months, which criterion will show the biggest improvement, and what are you doing right now to make that happen?
Right now, we have a validated technical foundation. Over the next 12 months, our biggest improvement will be in traction and scalability. We are currently prioritizing a Phase 2 expansion to benchmark Zimbabwe’s 16 official languages as a blueprint for our continental rollout. Simultaneously, we are transitioning from a static research repository to an integration-ready data pipeline. This ensures our gold-standard datasets are formatted as plug-and-play training sets, specifically optimized for seamless ingestion by enterprise voice agents and foundational models.
How Did You Hear About HackerNoon?
I actually first came across HackerNoon after being invited by a community member who saw a previous project shared on dev.to. That invitation led us to discover the Proof of Usefulness initiative, which felt like the natural next step for Accent Labs to document our journey from a technical build to a market-ready asset.
Since you've established a solid foundation with 8,172 verified segments in Nigeria, what are the primary hurdles you face in transitioning from this pre-launch 'Proof of Concept' to generating early enterprise partnerships?
The primary hurdle is 'Standardization.' Beyond raw audio, enterprises require clean-room provenance, where metadata and phonetic alignment fit perfectly into existing training loops. We are currently leveraging our pipeline to ensure our Neo4j-rich metadata is exported in the exact formats Big Tech procurement teams require for frictionless ingestion.
Your roadmap points to expanding into Zimbabwe to benchmark its 16 official languages. How do you plan to scale your Human-in-the-Loop curation pipeline effectively across so many new languages while maintaining data fidelity?
Led by a founding team with deep roots in AI training and data architecture, we operate a specialized, expert-led federated model. We’ve replaced traditional, high-noise crowdsourcing with a high-fidelity pipeline that empowers native linguistic leads. By integrating Quality Control (QC) protocols, our architecture is designed to resolve complex phonetic nuances at the source, ensuring every dataset meets a verified Gold Standard for the most demanding model evaluations.
You've highlighted the massive potential impact for 1.4 billion speakers to access voice-first healthcare and banking. What specific global AI models or tech platforms are you prioritizing first to ensure your data genuinely translates to downstream usefulness?
We are optimizing our datasets for compatibility with industry-standard architectures like Whisper and SLAM-1. Because these models act as the operating systems for modern voice applications, ensuring our gold-standard African data can be seamlessly integrated into these frameworks is the most direct path to impacting 1.4 billion speakers.
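One common way to make accent data “seamlessly integrated” into ASR fine-tuning frameworks is to package segments as a JSONL manifest pairing audio paths with transcripts. The sketch below is a hypothetical example of that packaging step; the field names (`audio_filepath`, `text`, `accent`) and sample data are assumptions, not a format the interview specifies.

```python
# Hypothetical sketch: packaging verified segments as a JSONL manifest,
# a common interchange format for ASR fine-tuning pipelines.
# Field names and sample data are illustrative.
import json

def to_manifest_line(segment: dict) -> str:
    """Serialize one verified segment as a JSON line."""
    return json.dumps({
        "audio_filepath": segment["path"],
        "text": segment["transcript"],
        "accent": segment["accent"],
        "duration": segment["duration_s"],
    })

segments = [
    {
        "path": "audio/seg-0001.wav",
        "transcript": "good morning",
        "accent": "Hausa-accented English",
        "duration_s": 1.8,
    },
]
manifest = "\n".join(to_manifest_line(s) for s in segments)
print(manifest)
```

Keeping the accent label alongside each transcript makes it straightforward to stratify fine-tuning runs or compute per-accent error rates downstream.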
Meet our sponsors
Bright Data: Bright Data is the leading web data infrastructure company, empowering over 20,000 organizations with ethical, scalable access to real-time public web information. From startups to industry leaders, we deliver the datasets that fuel AI innovation and real-world impact. Ready to unlock the web? Learn more at brightdata.com.
Neo4j: GraphRAG combines retrieval-augmented generation with graph-native context, allowing LLMs to reason over structured relationships instead of just documents. With Neo4j, you can build GraphRAG pipelines that connect your data and surface clearer insights. Learn more.
Storyblok: Storyblok is a headless CMS built for developers who want clean architecture and full control. Structure your content once, connect it anywhere, and keep your front end truly independent. API-first. AI-ready. Framework-agnostic. Future-proof. Start for free.
Algolia: Algolia provides a managed retrieval layer that lets developers quickly build web search and intelligent AI agents. Learn more.
