Welcome to openkapow

Many sites are implementing Anti-Scraper/Anti-Mashup techniques

Last post 09-30-2008, 2:29 PM by Klaus_Kapowtech. 1 replies.
  •  09-25-2008, 11:50 AM 21330

    Many sites are implementing Anti-Scraper/Anti-Mashup techniques

    Kapow works fine on Digg and Craigslist, but I have discovered that many sites, such as Guess.com and RalphLauren.com, behave differently when loaded through Kapow.

    For instance, a simple page load from Guess.com such as:

    http://shop.guess.com/ProductListing.aspx?page=LIST&browse=1&root_category|46=Women&category|46=Women&rpt=Department.aspx&pt=NewArrivals.aspx&max_days_verified=-14&answers_per_page=200&sort_option=New%20Arrivals

    instantly gives the page load error "invalid query". They are using the vertical bar "|", which is an unsafe character in a URL. The browser accepts it, but openkapow does not pass it along properly.

    www.RalphLauren.com, for instance, has some sort of measure that prevents images from loading when using Kapow.

    For example:
    http://www.ralphlauren.com/family/index.jsp?categoryId=2047536&cp=2048081&ab=ln_baby_cs2_layettegirl

    Are the OpenKapow support guys aware of this?

  •  09-30-2008, 2:29 PM 21369 in reply to 21330

    Re: Many sites are implementing Anti-Scraper/Anti-Mashup techniques

    In general, sites should not be able to detect scraping except from the request pattern. It is true that the pipe character "|" is not allowed in a URL, but IE and Firefox are lenient and will probably just percent-encode it ("|" becomes %7C), which should have been done by the host. In Kapow it is usually considered a bug if a page doesn't load correctly (although some Ajax and JavaScript is not supported yet). Since OpenKapow users don't have access to the Enterprise support channel, you often have to find a workaround. In most cases you can work around a URL error by doing the encoding inside the robot: either extract the URL and replace the invalid characters before loading the page, or use a Page Change inside the page loader.
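
To illustrate the "replace the invalid chars before loading the page" workaround outside of Kapow, here is a minimal Python sketch. It is not Kapow robot code; the function name `encode_unsafe_chars` is made up for this example, and it assumes that any "%" already in the URL belongs to a valid escape such as %20, so those are left alone rather than encoded a second time.

```python
from urllib.parse import quote

# Characters that are legal URL delimiters, plus "%" so that escapes
# already present in the URL (e.g. %20) are not encoded twice.
# Assumption: every "%" in the input starts a valid escape sequence.
URL_SAFE_CHARS = ":/?#[]@!$&'()*+,;=%"


def encode_unsafe_chars(url: str) -> str:
    """Percent-encode characters such as "|" that are not allowed in a URL.

    Hypothetical helper for illustration; letters, digits, "-._~" and the
    delimiters listed in URL_SAFE_CHARS are passed through unchanged.
    """
    return quote(url, safe=URL_SAFE_CHARS)


if __name__ == "__main__":
    original = (
        "http://shop.guess.com/ProductListing.aspx?page=LIST&browse=1"
        "&root_category|46=Women&category|46=Women&rpt=Department.aspx"
        "&pt=NewArrivals.aspx&max_days_verified=-14&answers_per_page=200"
        "&sort_option=New%20Arrivals"
    )
    # Each "|" becomes %7C; the existing %20 is kept as-is.
    print(encode_unsafe_chars(original))
```

With the Guess.com URL from the first post, `root_category|46=Women` comes out as `root_category%7C46=Women`, and `New%20Arrivals` is unchanged. Whether the site then accepts the request is a separate question; this only removes the invalid character from the URL.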

Copyright 2006, 2007 KapowTech.com All Rights Reserved