In the world of Big Data, topic recognition and clustering of documents, web pages, news articles, and similar content is extremely helpful in business applications ranging from search engines to e-commerce portals.
In light of this, we are pleased to announce that today we’ve taken another step towards expanding our Intellexer API Service with the release of a new module, TopicModeling (in Select Product).
TopicModeling is a new categorization solution which classifies documents into predefined categories. Currently, it works with a ready-made set of 57 topics falling into 10 domains: Economics, Entertainment, Environment, Health, Lifestyle, Science, Society, Sports, Tech, and Transport.
In contrast to our network client application Intellexer Summarizer, which works with a single document at a time and recognizes its topic as an additional feature, TopicModeling is designed to organize multiple documents by recognizing their topics and sorting them into the corresponding categories.
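As a rough illustration of what organizing multiple documents by topic can look like once labels are available, here is a minimal Python sketch. It does not call the Intellexer API: the file names and labels are hypothetical, and `group_by_topic` is a helper made up for this example.

```python
from collections import defaultdict


def group_by_topic(labeled_docs):
    """Map each topic to the list of document ids labeled with it.

    A document with several topics appears under each of them.
    """
    groups = defaultdict(list)
    for doc_id, topics in labeled_docs.items():
        for topic in topics:
            groups[topic].append(doc_id)
    return dict(groups)


# Hypothetical labels, in the shape a categorizer might return them.
labeled_docs = {
    "scotland-strategy.txt": ["Economics", "Politics"],
    "rail-network.txt": ["Transport"],
    "lan-setup.txt": ["IT"],
}

groups = group_by_topic(labeled_docs)
for topic, doc_ids in sorted(groups.items()):
    print(topic, doc_ids)
```

Keeping a list of topics per document, rather than a single label, lets a document that spans several subjects be found under each of them.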
Categorization challenges TopicModeling overcomes:
- Documents that refer to several topics or domains, e.g. Economics and Politics
- Words that have multiple meanings, sometimes belonging to different subject areas
We combine state-of-the-art machine learning (ML) algorithms with big data sets and ontological networks to achieve high precision in topic recognition.
Examples:
1. An article on “Scotland’s economic strategy” submitted to TopicModeling is labeled with both “Economics” and “Politics”.
2. Given two texts that both use the term “network” heavily, TopicModeling assigns them to two different categories, “IT” and “Transport”, despite the frequency of the word in both.
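To make the two challenges more concrete, here is a toy Python sketch in which the surrounding words, not the ambiguous term itself, decide the category, and a text can receive more than one label. This is not how TopicModeling works internally (it relies on ML algorithms and ontological networks). The cue-word lists, the `categorize` function, and the `min_hits` threshold are all assumptions made for illustration.

```python
import re

# Hypothetical cue words per category. The ambiguous word "network" is
# deliberately absent, so it cannot influence the result on its own.
CUES = {
    "IT": {"router", "server", "bandwidth", "protocol", "wifi"},
    "Transport": {"rail", "bus", "road", "station", "passengers"},
    "Economics": {"economic", "growth", "budget", "investment"},
    "Politics": {"government", "parliament", "minister", "policy"},
}


def categorize(text, min_hits=2):
    """Return every category with at least `min_hits` distinct cue words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: len(words & cues) for cat, cues in CUES.items()}
    return sorted(cat for cat, hits in scores.items() if hits >= min_hits)


it_text = ("The network router drops packets when network bandwidth is low, "
           "so the network server stalls.")
transport_text = ("The rail network links every station, and the bus network "
                  "feeds passengers into the network.")
mixed_text = ("The government set out an economic strategy: parliament "
              "approved a budget focused on growth.")

print(categorize(it_text))
print(categorize(transport_text))
print(categorize(mixed_text))
```

Both "network" texts mention the word three times, yet they land in different categories because only their context words are scored. The third text collects enough cue words for two categories and is labeled with both.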
TopicModeling is a perfect solution for managing large numbers of documents, for both personal and business use.
July 9, 2018