- Machine Learning Solutions
- August 12, 2021
Automated Data Extraction and Segregation for Government Departments
Government departments needed an efficient way to monitor news published about them across various news feeds.
The objective was to automate scraping and extracting data from multiple websites, categorize the data by department, and deliver the relevant news to each government department.
1. Data Collection:
- Collected URLs and edition names of news sources.
- Obtained permission from privately owned editions to collect their content.
- Manually segregated the data by department.
- Applied filters to identify reserved keywords.
- Trained an ML algorithm using the keywords.
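The keyword filtering and training steps can be sketched with standard-library Python. This is a minimal illustration, not the project's actual implementation. The department names and reserved keywords in `RESERVED_KEYWORDS` are hypothetical, and the toy articles stand in for the manually segregated corpus. The source does not name the algorithm it used, so multinomial Naive Bayes is swapped in here as one example of a traditional ML algorithm.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical reserved keywords per department; the real lists are not in the source.
RESERVED_KEYWORDS = {
    "health": {"hospital", "vaccine", "clinic"},
    "transport": {"highway", "metro", "bus"},
    "education": {"school", "university", "exam"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def label_by_keywords(text, reserved=RESERVED_KEYWORDS):
    """Return the set of departments whose reserved keywords occur in the text."""
    tokens = set(tokenize(text))
    return {dept for dept, words in reserved.items() if tokens & words}


class NaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)         # documents per department
        self.word_counts = defaultdict(Counter)   # word frequencies per department
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        """Return the department with the highest log-posterior for the text."""
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            score = math.log(n_docs / self.total_docs)  # log prior
            for tok in tokenize(text):
                score += math.log((counts[tok] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy articles standing in for the manually segregated corpus.
ARTICLES = [
    "New hospital wing opens as vaccine drive expands",
    "Metro line extension delays bus routes downtown",
    "University exam schedule announced for spring",
    "Clinic staff report shortage of beds",
    "Highway repairs to close two lanes",
    "School board approves new curriculum",
]

# Keep only articles that the keyword filter assigns to exactly one department.
texts, labels = [], []
for article in ARTICLES:
    depts = label_by_keywords(article)
    if len(depts) == 1:
        texts.append(article)
        labels.append(depts.pop())

model = NaiveBayes().fit(texts, labels)
print(model.predict("Vaccine clinic opens downtown"))
```

Training the model on keyword-labeled articles, rather than relying on keyword matches alone, lets it also pick up co-occurring words that are not in the reserved lists.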
2. Prediction and Delivery:
- News feeds are passed to the trained ML algorithm for inference.
- The algorithm identifies relevant news items and the departments interested in each one.
- Relevant news is delivered to the official email addresses of the related departments (e.g., secretariat, cabinets, officials).
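Delivery could look roughly like the following sketch, which groups predictions by department and builds one email digest per official address using Python's standard `email.message.EmailMessage`. The addresses, the sender, and the `build_digests` helper are hypothetical. The sketch only constructs the messages; actually sending them (for example with `smtplib.SMTP.send_message`) is left as a comment.

```python
from collections import defaultdict
from email.message import EmailMessage

# Hypothetical department mailboxes; the real official addresses are not in the source.
DEPARTMENT_EMAILS = {
    "health": "health-secretariat@example.gov",
    "transport": "transport-cabinet@example.gov",
}
SENDER = "news-monitor@example.gov"  # hypothetical sender address


def build_digests(predictions, directory=DEPARTMENT_EMAILS, sender=SENDER):
    """Build one EmailMessage per department from (department, headline, url) tuples.

    Departments with no address in `directory` are skipped.
    """
    grouped = defaultdict(list)
    for dept, headline, url in predictions:
        grouped[dept].append((headline, url))

    messages = []
    for dept, items in sorted(grouped.items()):
        if dept not in directory:
            continue
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = directory[dept]
        msg["Subject"] = f"News digest for {dept}: {len(items)} item(s)"
        msg.set_content("\n".join(f"- {title}\n  {url}" for title, url in items))
        messages.append(msg)
    return messages


predictions = [
    ("health", "Vaccine clinic opens", "https://news.example.com/a"),
    ("transport", "Metro fares rise", "https://news.example.com/b"),
    ("health", "Hospital wing expands", "https://news.example.com/c"),
    ("education", "Exam dates announced", "https://news.example.com/d"),
]
messages = build_digests(predictions)
for msg in messages:
    print(msg["To"], "|", msg["Subject"])
# To send (requires a reachable SMTP server):
#   with smtplib.SMTP("smtp.example.gov") as s: s.send_message(msg)
```

Batching items into one digest per department keeps each mailbox from receiving a separate email for every article.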
3. Data Gathering:
- Used pre-built Python libraries for web scraping.
- A crawler collects information from different websites.
- News items are stored in a database for further processing.
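The source does not name the scraping libraries, so the sketch below uses only Python's standard library: `html.parser` to pull headline links from a page, `urllib.request` to fetch it, and `sqlite3` as the database. It assumes, hypothetically, that a site marks headlines as `<h2><a href=...>`. Real editions differ and would each need their own extraction rules. The `news` table schema and the helper names are likewise illustrative.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class HeadlineParser(HTMLParser):
    """Collect (title, absolute_url) pairs for links inside <h2> tags (assumed layout)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.in_h2 = False
        self.href = None   # href of the <a> currently being read, if any
        self.text = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
        elif tag == "a" and self.in_h2:
            self.href = dict(attrs).get("href")
            self.text = []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            title = "".join(self.text).strip()
            if title:
                self.items.append((title, urljoin(self.base_url, self.href)))
            self.href = None
        elif tag == "h2":
            self.in_h2 = False


def fetch(url, timeout=10):
    """Download a page as text (requires network access)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def store(conn, edition, items):
    """Insert scraped items, ignoring URLs that are already in the table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS news ("
        "url TEXT PRIMARY KEY, edition TEXT, title TEXT, "
        "fetched_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO news (url, edition, title) VALUES (?, ?, ?)",
        [(url, edition, title) for title, url in items],
    )
    conn.commit()


# Offline demo; a live run would call parser.feed(fetch(url)) instead.
SAMPLE_HTML = (
    '<h2><a href="/health/1">Vaccine clinic opens</a></h2>'
    "<p>Body text is ignored.</p>"
    '<h2><a href="https://other.example/x">Metro fares rise</a></h2>'
)
parser = HeadlineParser("https://news.example.com/")
parser.feed(SAMPLE_HTML)
conn = sqlite3.connect(":memory:")
store(conn, "Example Daily", parser.items)
store(conn, "Example Daily", parser.items)  # second pass adds no duplicates
print(conn.execute("SELECT COUNT(*) FROM news").fetchone()[0])
```

Using the URL as the primary key together with `INSERT OR IGNORE` means that re-scraping a page does not create duplicate rows.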
4. Data Segregation:
- After collecting sufficient data, traditional machine learning algorithms identify keywords pertinent to various departments.
- Relevant information is shared with the respective departments.
- The system scrapes and delivers the latest information every hour.
- News items are matched against the reserved keywords and sent to the appropriate departments.
- Automation eliminates the need for manual monitoring, saving time and effort for government departments.
- Efficient monitoring of news published about government departments.
- Timely delivery of relevant information to the concerned departments.
- Automation reduces manual effort and increases productivity.
- Improved awareness of, and responsiveness to, news coverage and public perception.
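The hourly cycle described under Data Segregation (scrape, match, deliver only what is new) can be sketched as two small functions. `run_cycle` and `run_hourly` are hypothetical names, not the project's code. The fetch, classify, and deliver steps are passed in as functions, so they could wrap the scraping, keyword matching, and email delivery steps described above, or anything else with the same shape.

```python
import time
from collections import defaultdict


def run_cycle(fetch_items, classify, deliver, seen_urls):
    """One scrape, segregate, and deliver pass.

    fetch_items() returns (title, url) pairs; classify(title) returns a set of
    departments (empty if none match); deliver(dept, items) sends one batch.
    URLs already in seen_urls are skipped, so departments only receive new items.
    Returns the number of new items delivered per department.
    """
    batches = defaultdict(list)
    for title, url in fetch_items():
        if url in seen_urls:
            continue
        seen_urls.add(url)
        for dept in classify(title):
            batches[dept].append((title, url))
    for dept, items in batches.items():
        deliver(dept, items)
    return {dept: len(items) for dept, items in batches.items()}


def run_hourly(cycle, interval_s=3600, max_runs=None):
    """Call cycle() every interval_s seconds, timed from each run's scheduled start.

    Runs forever when max_runs is None; otherwise returns after max_runs calls.
    """
    runs = 0
    next_start = time.monotonic()
    while True:
        cycle()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            return runs
        next_start += interval_s
        time.sleep(max(0.0, next_start - time.monotonic()))
```

Scheduling from each run's planned start with `time.monotonic` keeps the interval from drifting as scrape time varies. In production, an external scheduler such as cron would be a common alternative to a long-running loop.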