Our team made a list of the most valuable technology companies, and added more companies as they started to trend in the news and on HackerNoon. The first one and a half thousand were public companies, selected by market cap. Then, as companies got mentioned in HackerNoon stories and performed well in our startup of the year voting, we created tech company news pages for them. Once a tech company news page is created, our system curates and stores the trending news, articles and blog posts about that company, based on our rules and prompts that define what counts as a trending story.
The curation relies on a combination of custom rules, prompts and conditions for relevance, specificity and trendiness, applied to results from the Bing News API, the Brave News API, and the HackerNoon API. We drilled down on industry match for each company, and heavily favored more trusted, high ranking sites while still allowing for relevant, lower ranking niche publishers. For each company, we surface the 10-20 most relevant stories on their main /company page (Microsoft as an example), and then feature the complete list of the company’s news, stories, mentions, articles and notable links in internet history on company-name/news (Google as an example).
The columns are companyName, company URL, publishedAT, (story) url, title, featured image, and (meta) description. This follows how we organize data in our database. Every article is connected to at least one company. Some companies have more articles than others, based on their share of voice. For example, using the dataset viewer you can see that Google has 99,152 results, 3M has 20,608 results, NVIDIA has 19,811 results, and Adobe has 13,449 results.
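To make the row layout concrete, here is a minimal Python sketch of one record and a per-company article count. The field names are hypothetical snake_case stand-ins for the columns listed above (the exact column keys in the published files may differ), and the three sample rows are made-up placeholders, not real entries from the dataset.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """One dataset row. Field names are illustrative, not the exact column keys."""
    company_name: str
    company_url: str
    published_at: str  # publication timestamp, kept as a plain string here
    url: str           # story URL
    title: str
    featured_image: str
    description: str   # meta description


# Placeholder rows for illustration only (not real dataset entries).
rows = [
    Article("Google", "https://example.com/company/google", "2024-01-01",
            "https://example.com/story-1", "Placeholder headline 1",
            "https://example.com/img-1.png", "Placeholder description 1"),
    Article("Google", "https://example.com/company/google", "2024-01-02",
            "https://example.com/story-2", "Placeholder headline 2",
            "https://example.com/img-2.png", "Placeholder description 2"),
    Article("NVIDIA", "https://example.com/company/nvidia", "2024-01-03",
            "https://example.com/story-3", "Placeholder headline 3",
            "https://example.com/img-3.png", "Placeholder description 3"),
]


def articles_per_company(articles):
    """Count how many rows are attached to each company name."""
    return Counter(article.company_name for article in articles)


counts = articles_per_company(rows)
for name, n in counts.most_common():
    print(f"{name}: {n}")
```

Run against the full dataset instead of the placeholder rows, the same count should line up with the per-company result totals shown in the dataset viewer.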
Without even downloading the data, you can search for company or publication names in the dataset viewer, like NVIDIA pictured below:
This dataset is open sourced under a Creative Commons license on HuggingFace as Tech Company News Data Dump. Please use this tech company news data freely for your project :-) You could quantify a company’s aggregate share of voice online, you could run sentiment analysis on a company’s digital news coverage, you could train your model to predict which headlines will be published about which companies in the future, or pursue whatever other research about large tech companies and media coverage your heart desires.
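As a starting point for the share of voice idea, here is a small Python sketch that turns article counts into shares. It uses only the four result counts quoted earlier from the dataset viewer, so the percentages are relative to those four companies rather than to the whole dataset. The function name `share_of_voice` is our own, not part of any library.

```python
# Result counts quoted above from the dataset viewer.
counts = {
    "Google": 99_152,
    "3M": 20_608,
    "NVIDIA": 19_811,
    "Adobe": 13_449,
}


def share_of_voice(article_counts):
    """Return each company's fraction of the total article count."""
    total = sum(article_counts.values())
    if total == 0:
        # Avoid dividing by zero when there are no articles at all.
        return {name: 0.0 for name in article_counts}
    return {name: n / total for name, n in article_counts.items()}


shares = share_of_voice(counts)
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.1%}")
```

Swapping in counts for every company in the dataset would give shares across the full set of tracked companies.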