
Inventory protection Idea that doesn't work

@Alexander Lau

I don't think dealers will have the time to chase these sites, but the point is that if enough companies win these types of cases in court, the precedent may make it easier to file a quick injunction to close a scraping site, making it harder for these guys to pop up and harass small businesses.
Agree. The issue is being able to determine who is using scraping techniques (there are some sneaky groups out there). Scraping can be used effectively (legally, depending on website policy) and it can be used fraudulently. If you want to talk about innovation, therein lies a business model to be had or created. I've not seen a scraping police model, YET.
 
PhantomJS combined with CasperJS is pretty fantastic: it runs a full, headless copy of a WebKit browser, so it can operate against a real DOM, execute JavaScript properly, and even grab fully rendered screenshots of areas of the page, yet it is still easy to automate.

Using a headless browser is actually very expensive in terms of system resources compared to the standard way of scraping. In my experience, 99% of sources can be scraped without JS execution. You just need to find out how the page works, what it loads, and how the data is rendered. Moreover, a headless browser puts a serious load not only on your own server/computer but also on the source server, as a single page call may cause hundreds of calls for images, JavaScript files, CSS files, etc. Another con of a headless browser is that you are executing the Google Analytics script, so your call gets into the stats, most likely as a bounced visit (unless you sit on the page for some time). In my opinion, if you are scraping someone, you should do it as gently as you can so that no harm is caused to the source.
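To make the comparison concrete, here is a rough Python sketch of the "standard" approach: one plain HTTP request per page, parsed without running any JavaScript. It is only an illustration under stated assumptions, not anyone's actual tool. The names `PageScan`, `scan_html`, `fetch_gently`, the 2-second delay and the `example-scraper/0.1` user agent are all made up for the example. The parser also counts the images, scripts and stylesheets referenced by the page, i.e. the extra calls a headless browser would make and a plain fetch skips (and since no script runs, no analytics hit is recorded).

```python
import time
import urllib.request
from html.parser import HTMLParser


class PageScan(HTMLParser):
    """Collect link targets and count sub-resources a real browser would also fetch."""

    def __init__(self):
        super().__init__()
        self.links = []        # href values of <a> tags (the data we want here)
        self.subresources = 0  # images, scripts, stylesheets we never download

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag in ("img", "script") and attrs.get("src"):
            self.subresources += 1
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.subresources += 1


def scan_html(html):
    """Parse an HTML string; return (links, number of skipped sub-resources)."""
    parser = PageScan()
    parser.feed(html)
    parser.close()
    return parser.links, parser.subresources


def fetch_gently(urls, delay_seconds=2.0, user_agent="example-scraper/0.1"):
    """Fetch pages one at a time, pausing between requests (hypothetical defaults)."""
    pages = []
    for i, url in enumerate(urls):
        if i:
            time.sleep(delay_seconds)  # be gentle: never hammer the source server
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(request, timeout=30) as response:
            pages.append(response.read().decode("utf-8", errors="replace"))
    return pages
```

Each page costs the source server exactly one request here. Content that the page loads later via JavaScript will not show up in the HTML, so you would have to find the underlying data call and request that directly.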
 
Sweet, I like your tool, A LOT!
https://www.diggernaut.com