Walmart-Product-Search-Engine

Project demo video:

A search engine to look up products listed in.

Inspired by the search engine challenge and wanting to try our hand at a real-world dataset, we took on the Walmart Datathon challenge.

Task I: We had to crawl product information from the Walmart website to build our product database. We used a Selenium-based crawler with a headless browser (PhantomJS) for crawling, and the BeautifulSoup library to post-process the raw HTML responses. This was one of the most laborious and frustrating tasks, as we ran into many Forbidden errors and CAPTCHAs. It was also a great learning step for us: we came across many workarounds and saw the kinds of problems that can come up while scraping and processing data.
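
The BeautifulSoup post-processing step might look roughly like the sketch below. It is only an illustration: the sample markup, the CSS class names (`product`, `title`, `price`) and the `parse_product` helper are assumptions made for this example, not the actual Walmart page structure or the code we used.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a raw HTML response from the crawler.
# The real page structure is not described in this post.
SAMPLE_HTML = """
<html><body>
  <div class="product">
    <h1 class="title"> Example Blender </h1>
    <span class="price">$29.99</span>
  </div>
</body></html>
"""


def parse_product(raw_html):
    """Extract a title and price from raw HTML (hypothetical selectors)."""
    soup = BeautifulSoup(raw_html, "html.parser")
    title_tag = soup.select_one("div.product .title")
    price_tag = soup.select_one("div.product .price")
    # A blocked or CAPTCHA page will not contain these tags, so fall back to None.
    return {
        "title": title_tag.get_text(strip=True) if title_tag else None,
        "price": price_tag.get_text(strip=True) if price_tag else None,
    }


product = parse_product(SAMPLE_HTML)
print(product)
```

Returning `None` for missing fields makes it easy to spot responses that were blocked or served a CAPTCHA instead of a product page, so they can be retried later.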