What I built
I built a web scraper using Python (specifically the Scrapy framework and the Beautiful Soup library), with MongoDB Atlas as the database, Atlas Charts for data visualization, and Africa's Talking as the SMS API. The web scraper scrapes the NSE website for the latest stock prices and stores them in a MongoDB Atlas database. The data is then retrieved from the database and sent to the user via SMS.
Additionally, I added CI/CD to the project using GitHub Actions. The CI/CD pipeline lints the code and runs the tests. It tests the project against Python versions 3.8, 3.9, and 3.10, as well as MongoDB versions 4.4, 5.0, and 6.0. This ensures compatibility across a range of Python and MongoDB versions.
Category Submission
Choose Your Own Adventure
App Link
Screenshots
Description
NSE Stock Scraper is a web scraper that scrapes the NSE website for the latest stock prices and stores them in a MongoDB Atlas database. It is meant to be a complete data collection tool built on open source tools.
Link to Source Code
Permissive License
Background
I have recently been learning more about financial markets and data analysis. I wanted to build something that would help me learn more about the stock market and also help me practice my web scraping skills. I also wanted to learn more about MongoDB Atlas and how to use it to store data. Another goal was to use Africa's Talking to send SMS notifications.
How I built it
I built the web scraper using Python and the Scrapy framework, with Beautiful Soup to parse the HTML data. MongoDB Atlas stores the data, Atlas Charts visualizes it, and Africa's Talking sends it to the user via SMS.
Scrapy as a framework enforces certain ways to structure your code, which makes it easier to maintain and extend. It also has a lot of built-in features that make it easy to do things like pagination, following links and storing data. Using MongoDB Atlas was easy as it has a free tier that allows you to store up to 512 MB of data. I used Atlas Charts to visualize the data collected in MongoDB Atlas.
In this case, we created a simple spider for the afx website. The spider is responsible for crawling the website and extracting data from it. The spider is also responsible for following links and crawling other pages. Once data is extracted from the website, it is stored in MongoDB Atlas.
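The storage step described above could be handled by a Scrapy item pipeline. The following is a rough sketch under stated assumptions, not the project's code: the `MONGODB_URI` setting name, the `nse` database, the `prices` collection, and the `ticker`/`date` fields are all hypothetical.

```python
import datetime


class MongoPricePipeline:
    """Scrapy item pipeline that upserts one document per ticker per day."""

    def __init__(self, mongo_uri, db_name="nse", collection_name="prices"):
        # db_name and collection_name are placeholder defaults.
        self.mongo_uri = mongo_uri
        self.db_name = db_name
        self.collection_name = collection_name
        self.client = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        # MONGODB_URI is an assumed setting holding the Atlas connection string.
        return cls(mongo_uri=crawler.settings.get("MONGODB_URI"))

    def open_spider(self, spider):
        # Imported here so the module loads even without the driver installed.
        from pymongo import MongoClient

        self.client = MongoClient(self.mongo_uri)
        self.collection = self.client[self.db_name][self.collection_name]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        doc = dict(item)
        # Stamp the item with today's date unless the spider already set one.
        doc.setdefault("date", datetime.date.today().isoformat())
        # Upsert so re-running the spider on the same day updates the
        # existing document instead of inserting a duplicate.
        self.collection.update_one(
            {"ticker": doc["ticker"], "date": doc["date"]},
            {"$set": doc},
            upsert=True,
        )
        return item
```

A pipeline like this would be enabled through Scrapy's `ITEM_PIPELINES` setting. The upsert on `ticker` and `date` is a design choice that makes repeated runs idempotent.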
We then use the data in Atlas Charts to generate different types of visualizations and store them in a dashboard, as shown below:
The Atlas dashboard provides a lot of useful real-time metrics about the currently running database(s).
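The last step of the workflow, sending stored prices to the user via SMS, might look roughly like the sketch below. The `format_sms` and `send_prices` helpers and the `ticker`/`price` record fields are assumptions for illustration; the only SDK calls used are `africastalking.initialize` and `africastalking.SMS.send` from the Africa's Talking Python SDK.

```python
def format_sms(records, max_len=160):
    """Build a one-line price summary, trimmed to a single SMS segment."""
    parts = [f"{r['ticker']} {r['price']:.2f}" for r in records]
    message = "Prices: " + ", ".join(parts)
    if len(message) > max_len:
        # Leave room for the ellipsis so the result never exceeds max_len.
        message = message[: max_len - 3] + "..."
    return message


def send_prices(records, recipients, username, api_key):
    """Send the formatted summary to a list of phone numbers."""
    # Imported here so format_sms can be used and tested without the SDK.
    import africastalking

    africastalking.initialize(username, api_key)
    return africastalking.SMS.send(format_sms(records), recipients)
```

For example, `format_sms([{"ticker": "ABC", "price": 10.5}])` returns `"Prices: ABC 10.50"`. Credentials would normally come from environment variables and not be hard-coded.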
Additional Resources/Info
What's next for the Project?
Fix bugs and set up a cron job to send the data to the user at a specific time of day.
Deploy the project to a cloud hosting platform, e.g. Google Cloud or Azure.
Add more features to the project such as sending the data to the user via email.
Containerize the project using Docker.
Let me know what you think of the project. I would love to hear your feedback. Thanks! Feel free to reach out to me on Twitter or LinkedIn.