Project Details
- Machine Learning Solutions
- August 12, 2021
Automated Data Extraction and Segregation for Government Departments
Client:
Government departments that need to monitor news published about them across multiple news feeds.
Challenge:
Automate the scraping and extraction of data from multiple news websites, categorize that data by department, and deliver the relevant news to each government department.
Solution:
1. Data Collection:
- Collected URLs and edition names of news sources.
- Obtained permission from private editions.
- Manually segregated the collected data by department.
- Applied filters to identify reserved keywords.
- Trained an ML algorithm using the keywords.
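The keyword-filter step above can be illustrated with a minimal sketch. It assumes each department has a list of reserved keywords and tags a piece of text with every department whose keywords appear in it. The department names, the keywords, and the helper names `compile_patterns` and `tag_departments` are hypothetical; the source does not list the actual keywords or filters used.

```python
import re

# Hypothetical reserved keywords per department (illustrative only).
RESERVED_KEYWORDS = {
    "health": ["hospital", "vaccination", "public health"],
    "transport": ["highway", "bus service", "railway"],
}


def compile_patterns(keywords_by_dept):
    """Build one case-insensitive, whole-word pattern per department."""
    patterns = {}
    for dept, keywords in keywords_by_dept.items():
        alternatives = "|".join(re.escape(k) for k in keywords)
        patterns[dept] = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return patterns


def tag_departments(text, patterns):
    """Return the sorted list of departments whose keywords occur in text."""
    return sorted(dept for dept, pattern in patterns.items() if pattern.search(text))


PATTERNS = compile_patterns(RESERVED_KEYWORDS)
headline = "New vaccination drive announced along the national highway"
print(tag_departments(headline, PATTERNS))
```

Tags produced this way could serve as labels for the manually segregated data before a model is trained on it.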
2. Prediction and Delivery:
- The trained ML algorithm identifies relevant news items and the departments interested in them.
- Incoming news feeds are passed to the algorithm for inference.
- The algorithm tags each news item with the departments it relates to.
- Relevant news is delivered to the official email IDs of the related departments (e.g., secretariat, cabinets, officials).
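As a rough sketch of the delivery step, the snippet below builds one plain-text digest email per department using Python's standard `email` package. The mailbox addresses, the sender address, and the `build_digest` helper are assumptions made for illustration; the source does not describe how the emails are composed or sent.

```python
from email.message import EmailMessage

# Hypothetical department mailboxes; real addresses would come from configuration.
DEPARTMENT_EMAILS = {
    "health": "health-desk@example.org",
    "transport": "transport-desk@example.org",
}


def build_digest(department, items, sender="news-monitor@example.org"):
    """Create one plain-text digest email for a department.

    `items` is a list of (title, url) tuples already assigned to the department.
    """
    message = EmailMessage()
    message["From"] = sender
    message["To"] = DEPARTMENT_EMAILS[department]
    message["Subject"] = f"News digest for {department}: {len(items)} item(s)"
    body = "\n".join(f"- {title}\n  {url}" for title, url in items)
    message.set_content(body)
    return message


digest = build_digest(
    "health",
    [("Vaccination drive announced", "https://news.example.org/a1")],
)
print(digest["To"], "|", digest["Subject"])
```

The resulting message could then be handed to `smtplib.SMTP.send_message` against whatever mail server the deployment uses; that part is omitted here because it depends on infrastructure the source does not describe.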
3. Data Gathering:
- Used pre-built Python libraries for web scraping.
- A crawler collects information from the different websites.
- News items are stored in a database for further processing.
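The gathering and storage steps might look like the following minimal sketch. The source does not name the scraping libraries or the database, so this example uses only the standard library (`html.parser` and `sqlite3`) and parses a static sample page instead of fetching a live site. The `class="headline"` markup, the table layout, and the names `HeadlineParser` and `store_items` are all assumptions.

```python
import sqlite3
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect (title, url) pairs from <a class="headline"> links (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "headline":
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.items.append(("".join(self._text).strip(), self._href))
            self._href = None


def store_items(conn, edition, items):
    """Insert news items, skipping URLs that are already stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS news ("
        "url TEXT PRIMARY KEY, title TEXT NOT NULL, edition TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO news (url, title, edition) VALUES (?, ?, ?)",
        [(url, title, edition) for title, url in items],
    )
    conn.commit()


# Static sample page standing in for a fetched news site.
SAMPLE_HTML = """
<ul>
  <li><a class="headline" href="/news/1">Hospital wing opens</a></li>
  <li><a class="headline" href="/news/2">Highway repairs begin</a></li>
  <li><a href="/about">About us</a></li>
</ul>
"""

parser = HeadlineParser()
parser.feed(SAMPLE_HTML)

conn = sqlite3.connect(":memory:")
store_items(conn, "Sample Edition", parser.items)
store_items(conn, "Sample Edition", parser.items)  # second run adds no duplicates
row_count = conn.execute("SELECT COUNT(*) FROM news").fetchone()[0]
print(row_count)
```

Using the URL as the primary key means repeated crawls of the same page do not store the same article twice.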
4. Data Segregation:
- After collecting sufficient data, traditional machine learning algorithms identify keywords pertinent to various departments.
- Relevant information is shared with the respective departments.
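One simple way to surface department-specific keywords from labelled news items is sketched below. It ranks each term by a smoothed ratio of how often it appears inside a department's items versus in all other items. This scoring rule is a stand-in chosen for illustration, not the algorithm used in the project; the sample documents, the stopword list, and the function names are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "in", "for", "of", "to", "and", "new"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_keywords(labeled_docs, k=2):
    """Rank terms per department by a smoothed in/out-of-department count ratio."""
    counts = {}
    for dept, text in labeled_docs:
        counts.setdefault(dept, Counter()).update(tokenize(text))

    total = Counter()
    for dept_counts in counts.values():
        total.update(dept_counts)

    result = {}
    for dept, dept_counts in counts.items():
        # +1 smoothing avoids division by zero for terms unique to one department.
        scored = {t: (n + 1) / (total[t] - n + 1) for t, n in dept_counts.items()}
        # Highest score first; ties are broken alphabetically for stable output.
        ranked = sorted(scored, key=lambda t: (-scored[t], t))
        result[dept] = ranked[:k]
    return result


# Hypothetical manually labelled news items.
LABELED_DOCS = [
    ("health", "Hospital opens new vaccination ward"),
    ("health", "Vaccination drive in city hospital"),
    ("transport", "Highway repairs delay city bus service"),
    ("transport", "New highway toll for bus operators"),
]

keywords = top_keywords(LABELED_DOCS)
print(keywords)
```

A term such as "city" that appears under both departments scores low for each, which is the behaviour wanted from a keyword meant to discriminate between departments.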
Results:
- The system scrapes and delivers the latest information every hour.
- News items are matched against reserved keywords and sent to the appropriate departments.
- Automation eliminates the need for manual monitoring, saving time and effort for government departments.
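The hourly cadence could be driven by a small timing helper like the one sketched here. Aligning runs to the top of each hour is an assumption (the source only says the system runs every hour), and `next_hourly_run` and `seconds_until_next_run` are hypothetical names.

```python
from datetime import datetime, timedelta


def next_hourly_run(now):
    """Return the next top-of-the-hour timestamp strictly after `now`."""
    return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)


def seconds_until_next_run(now):
    """Return how long to wait, in seconds, before the next hourly run."""
    return (next_hourly_run(now) - now).total_seconds()


sample_time = datetime(2021, 8, 12, 9, 42, 30)
print(next_hourly_run(sample_time), seconds_until_next_run(sample_time))
```

A long-running process could sleep for `seconds_until_next_run(datetime.now())` and then run the scrape, classify, and deliver steps; an external scheduler such as cron would be an equally reasonable choice.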
Benefits:
- Efficient monitoring of news published about government departments.
- Timely delivery of relevant information to the concerned departments.
- Automation reduces manual effort and increases productivity.
- Improved awareness and responsiveness to news and public perception.