A news event is a specific incident, topic or issue that leads to news coverage. News events are more fine-grained than a news story or a news category, and they are typically covered by multiple outlets (see Figure 1).
Determining which events are the most newsworthy is a difficult task for journalists and news users alike. This is largely because events are often hard to define, and even more so because they can be highly contested. For example, it is not uncommon for one politician to give a speech and then have his or her parliamentary colleague react to that speech in turn.
The best news event identification systems combine machine learning techniques to learn which articles are related, using both textual and graphical features. For instance, such a system might learn that two articles about the same issue, written in different linguistic styles, are linked by the cosine similarity of their tf-idf representations.
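As a rough illustration of that last step, the sketch below computes tf-idf vectors and their cosine similarity in plain Python. The helper names (`tfidf_vectors`, `cosine`), the smoothed idf formula, and the three toy headlines are assumptions made for this example; they are not taken from the system described here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf (an assumption of this sketch), so every term keeps a positive weight.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1.0 for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy headlines, invented for illustration.
docs = [
    "minister gives speech on pension reform".split(),
    "colleague reacts to minister speech on pension reform".split(),
    "local team wins football cup final".split(),
]
vecs = tfidf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])    # shares several terms with the first headline
sim_unrelated = cosine(vecs[0], vecs[2])  # shares no terms with the first headline
```

Two articles would then be assigned to the same event when their similarity exceeds a chosen threshold; the threshold itself has to be tuned on labelled data.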
Although this method achieves high precision across a range of thresholds, it is not free of false positives, and it misses some articles that really should be considered part of the event, for example an interview that accompanies a news article about the same topic.
To get around this problem, we have created a more efficient and accurate version of the cosine measure that uses word embeddings, the soft cosine measure. Compared to the standard cosine measure, soft cosine is significantly faster and loses less precision. Moreover, it can be used to identify the most important and most prominent news stories in an automated way. Beyond surfacing the most interesting news stories, this method can be used to detect which articles are more relevant and which are less relevant to your audience.
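The general idea behind a soft cosine measure can be sketched as follows: instead of treating different terms as orthogonal, the dot product is weighted by a term-to-term similarity derived from word embeddings, so soft_cos(a, b) = (aᵀSb) / (√(aᵀSa) · √(bᵀSb)). This is a minimal sketch of the textbook definition, not the implementation described above; the two-dimensional embeddings in `EMB` are made-up values, and all function names are hypothetical.

```python
import math

# Toy 2-d word embeddings (made-up values, for illustration only).
EMB = {
    "speech": (1.0, 0.0),
    "address": (0.9, 0.1),   # deliberately close to "speech"
    "football": (0.0, 1.0),  # deliberately orthogonal to "speech"
}


def term_sim(s, t):
    """Similarity between two terms: cosine of their embeddings, clipped at 0."""
    if s == t:
        return 1.0
    a, b = EMB[s], EMB[t]
    dot = a[0] * b[0] + a[1] * b[1]
    norm = math.hypot(*a) * math.hypot(*b)
    return max(0.0, dot / norm)


def bilinear(u, v):
    """Compute u^T S v, where S[s][t] = term_sim(s, t) and u, v are sparse dicts."""
    return sum(wu * wv * term_sim(s, t) for s, wu in u.items() for t, wv in v.items())


def soft_cosine(u, v):
    """Soft cosine: related but non-identical terms still contribute to the score."""
    denom = math.sqrt(bilinear(u, u)) * math.sqrt(bilinear(v, v))
    return bilinear(u, v) / denom if denom else 0.0


def hard_cosine(u, v):
    """Standard cosine: only identical terms contribute."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Three one-word "articles" with unit term weights.
doc_speech = {"speech": 1.0}
doc_address = {"address": 1.0}
doc_football = {"football": 1.0}

hard_score = hard_cosine(doc_speech, doc_address)    # no shared term
soft_score = soft_cosine(doc_speech, doc_address)    # near-synonyms are credited
soft_other = soft_cosine(doc_speech, doc_football)   # unrelated terms stay apart
```

In this toy setting the standard cosine sees no overlap between "speech" and "address", while the soft variant scores them as close. That is the kind of case, such as an interview worded differently from the article it accompanies, where plain tf-idf matching can miss a related text.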