Automatic news alert system using traditional machine learning

Use case:

The system is an automated pipeline that scrapes news from various websites and sorts the collected items by department. Its purpose is to keep government departments aware of news published about them across different news feeds. An ML algorithm was trained to identify relevant news items and the departments that would want to know about them. At inference time, the algorithm is given the incoming news feeds, identifies the items related to each department, and delivers them to the official email addresses of those departments, such as the secretariat, cabinet offices, and other officials. The final output is an hourly run in which the machine scrapes the latest news, matches it against the reserved keywords, and emails the results to the respective departments.

Steps involved in collecting the initial data:
  1. Collect the website URLs and edition names of the news sources.
  2. Obtain permission from the owner of each edition (if it is private).
  3. Have a person manually segregate the collected news by department.
  4. Filter the segregated news for a set of reserved keywords.
  5. Train the model with the keywords.
  6. Run predictions with the trained model.
  7. Send the results by email.
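
Steps 3 to 6 can be illustrated with a small text classifier. The write-up does not name the algorithm that was used, so the sketch below assumes a multinomial Naive Bayes model with add-one smoothing, written with the Python standard library only. The class name NaiveBayesRouter, the department labels, and the training headlines are all hypothetical and stand in for the manually segregated data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesRouter:
    """Tiny multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter(labels)        # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labelled headlines (step 3: manual segregation).
train_texts = [
    "New hospital wing opens with vaccine drive",
    "Doctors report rise in dengue cases",
    "Highway repair work delays traffic",
    "New bus routes announced for city commuters",
]
train_labels = ["health", "health", "transport", "transport"]

model = NaiveBayesRouter().fit(train_texts, train_labels)
print(model.predict("Vaccine shortage at district hospital"))
print(model.predict("Traffic diverted as highway bridge closes"))
```

With this toy training set the two example headlines are routed to the health and transport labels respectively. A real deployment would need far more labelled news per department, and a library implementation of the classifier would likely be a better choice than this hand-written one.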

Data gathering:

Gathering complete information is now a very hard task. The machine first needs to crawl the different websites using pre-built Python libraries. A program scrapes the news items and then stores them in the database.
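
As a rough illustration of the scrape-and-store step, the sketch below parses headline links out of a page and saves them in a database. The write-up does not say which libraries or database were used, so this assumes the standard-library html.parser, urllib, and sqlite3 modules. The names HeadlineParser, fetch_html, and store_headlines, the table layout, and the sample HTML are all hypothetical.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.request import urlopen


class HeadlineParser(HTMLParser):
    """Collects (url, title) pairs from every <a href="..."> element."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the link text.
            title = " ".join("".join(self._text).split())
            if title:
                self.items.append((self._href, title))
            self._href = None


def fetch_html(url):
    """Download one page; only call this for sources you are allowed to crawl."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def store_headlines(conn, source, html):
    """Parse one page, insert unseen headlines, and return how many were new."""
    parser = HeadlineParser()
    parser.feed(html)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS news ("
        "url TEXT PRIMARY KEY, source TEXT, title TEXT)"
    )
    count_sql = "SELECT COUNT(*) FROM news"
    before = conn.execute(count_sql).fetchone()[0]
    # The primary key on url makes repeated hourly runs skip known items.
    conn.executemany(
        "INSERT OR IGNORE INTO news (url, source, title) VALUES (?, ?, ?)",
        [(url, source, title) for url, title in parser.items],
    )
    conn.commit()
    return conn.execute(count_sql).fetchone()[0] - before


# Offline demonstration with a hypothetical page instead of a live fetch.
sample_html = """
<ul>
  <li><a href="/news/1">Health department opens new clinic</a></li>
  <li><a href="/news/2">City announces   bus route changes</a></li>
</ul>
"""
conn = sqlite3.connect(":memory:")
print(store_headlines(conn, "example-edition", sample_html))  # first run
print(store_headlines(conn, "example-edition", sample_html))  # repeated run
```

The first call stores both headlines and the second stores none, which is the behaviour an hourly crawl needs to avoid alerting on the same item twice. On a real site the parser would have to be narrowed to the elements that actually hold headlines, since this version keeps every link on the page.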

Data segregation:

Once the machine has collected sufficient data, traditional machine learning algorithms identify the keywords pertinent to the various departments, and the matching news is then shared with the relevant departments.
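
The routing and sharing step might look roughly like the sketch below, which matches headlines against per-department keywords and builds one email per department. The keyword lists, the example.org addresses, and the function names match_departments and build_alerts are hypothetical, since the write-up does not list the real keywords or mailboxes. Plain keyword matching is used here as a stand-in for the trained model.

```python
import re
from email.message import EmailMessage

# Hypothetical reserved keywords and mailboxes; the real lists are not given.
DEPARTMENT_KEYWORDS = {
    "health": {"hospital", "vaccine", "clinic"},
    "transport": {"highway", "bus", "traffic"},
}
DEPARTMENT_EMAILS = {
    "health": "health@example.org",
    "transport": "transport@example.org",
}


def match_departments(title):
    """Return the departments whose reserved keywords appear in the title."""
    words = set(re.findall(r"[a-z]+", title.lower()))
    return sorted(d for d, kws in DEPARTMENT_KEYWORDS.items() if words & kws)


def build_alerts(news_items):
    """Group (title, url) items by department and build one email for each."""
    grouped = {}
    for title, url in news_items:
        for dept in match_departments(title):
            grouped.setdefault(dept, []).append(f"- {title}\n  {url}")
    messages = []
    for dept, lines in sorted(grouped.items()):
        msg = EmailMessage()
        msg["To"] = DEPARTMENT_EMAILS[dept]
        msg["From"] = "news-alerts@example.org"  # hypothetical sender address
        msg["Subject"] = f"News alert: {len(lines)} item(s) for {dept}"
        msg.set_content("\n".join(lines))
        messages.append(msg)
    return messages


items = [
    ("New vaccine drive at city hospital", "https://example.org/news/1"),
    ("Bus strike halts highway traffic", "https://example.org/news/2"),
    ("Local team wins cricket final", "https://example.org/news/3"),
]
for message in build_alerts(items):
    print(message["To"], "|", message["Subject"])
```

The third headline matches no keyword and is dropped, so only two messages are built. The messages are only constructed here, not sent. Delivery could go through the standard smtplib module, for example with SMTP.send_message, once a mail server is configured.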