Deprecating our AJAX crawling scheme

Deprecating our AJAX crawling scheme (googlewebmastercentral.blogspot.com)

144 points by antichaos 10 years ago | 77 comments

m0th87 10 years ago |

Don't believe the hype. Google has been saying that they can execute javascript for years. Meanwhile, as far as I can see, most non-trivial applications still aren't being crawled successfully, including my company's.

We recently got rid of prerender because of the promise in Google's last article saying the same thing [1]. It didn't work.

1: http://googlewebmastercentral.blogspot.com/2014/05/understan...

thoop 10 years ago | |

Todd from Prerender.io here. We've seen the same thing with people switching to AngularJS assuming it will work and then coming to us after they had the same issue.

[1] This image is from 2014, when Google previously announced they were crawling JavaScript websites, showing our customer's switch to an AngularJS app in September. Google basically stopped crawling their website when Google was required to execute the JavaScript. Once that customer implemented Prerender.io in October, everything went back to normal.

Another customer recently (June 2015) did a test for their housing website. They tested the use of Prerender.io on a portion of their site against Google rendering the JS of another portion of their site. Here are the results they sent to me:

Suburb A was prerendered and Google asked for 4,827 page impressions over 9 days.

Suburb B was not prerendered and Google asked for 188 page impressions over 9 days.
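To put those two figures in proportion, here is a quick back-of-the-envelope calculation. It uses only the numbers quoted above; the variable names are made up for illustration.

```python
# Figures quoted above: Googlebot requests over the same 9-day window.
prerendered_requests = 4827   # Suburb A (served via prerendering)
js_rendered_requests = 188    # Suburb B (left for Google to render the JS)
days = 9

# How many times more often the prerendered section was requested.
ratio = prerendered_requests / js_rendered_requests

# Average requests per day for each section.
per_day_prerendered = prerendered_requests / days
per_day_js = js_rendered_requests / days

print(f"ratio: {ratio:.1f}x")
print(f"per day: {per_day_prerendered:.0f} vs {per_day_js:.0f}")
```

That works out to roughly 25 to 26 times more requests for the prerendered section, though a single 9-day test on one site says nothing about causes.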

We've actually talked to Google about this issue to see if they could improve their crawl speed for JavaScript websites, since we believe it's a good thing for Google to be able to crawl JavaScript websites correctly. For now, though, it looks like any website with a large number of pages still needs to be sceptical about getting all of its pages into Google's index correctly.

1: https://s3.amazonaws.com/prerender-static/gwt_crawl_stats.pn...

grey-area 10 years ago | | |

Perhaps this could be down to response times too? They might crawl much quicker if given static HTML very quickly.

What were the page render times for the two types of page?

walshemj 10 years ago | | |

Interesting, I'll have to reinvestigate this.

mixonic 10 years ago | |

I noticed our company's Ember.js-based SPA site was not being indexed well until I added a sitemap. Then it quickly appeared in the rankings.

Historically Google has been using some fork of Chrome 10 when indexing. I'm unsure what impact that is having on the reliability of app rendering, but I also trust the Google search team has done reasonable checks ensuring common sites and frameworks render correctly.

I strongly suggest using a sitemap for JS rendered sites, based on my own experience.
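For anyone who hasn't built one, a minimal sketch of generating a sitemap for a JS-rendered app might look like the following. The `urlset`/`url`/`loc` elements and the namespace come from the sitemaps.org protocol; the `build_sitemap` helper, the route list, and `example.com` are placeholders I made up, not anything from my actual site.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return sitemap XML (as a str) with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url  # text is XML-escaped on output
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Hypothetical client-side routes; example.com is a placeholder domain.
routes = ["/", "/about", "/products/42"]
sitemap_xml = build_sitemap("https://example.com" + r for r in routes)
print(sitemap_xml)
```

The output would be saved as `sitemap.xml` and submitted through the webmaster tools; listing the routes explicitly means the crawler does not have to discover them by executing the app's router.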

tracker1 10 years ago | | |

It's also worth noting that even when they do get JS delivered content, it updates much less frequently...

Then again, I really like react+redux+koa (r3k) for client-server rendering.... Hoping to do something more serious with it in the next few months at work.

cotillion 10 years ago |

So they're actually evaluating all js and css Googlebot is consuming. That's insane.

Can we forget about any new competitors in search engine land now? Not only do you have to match Google in relevance you'll actually have to implement your own BrowserBot just to download the pages.

rdoherty 10 years ago |

Wow, I built a project that rendered JS built webpages for search engines via NodeJS and PhantomJS. Rendering webpages is extremely CPU intensive, I'm amazed at the amount of processing power Google must have to do this at Internet scale.

I really hope this works, lots of JS libraries expect things like viewport and window size information, I wonder how Google is achieving that.

a2tech 10 years ago |

This is good: one of my current projects for a customer is entirely AJAX/JS rendered and we were worried that Googlebot would have a fit with it.

greglindahl 10 years ago | |

You should still be worried. Just because Googlebot expensively evaluates JS for some websites doesn't mean it will evaluate JS for your brand-new website. You might get crawled a lot less deeply than if you had good content in your static pages.

chc 10 years ago | | |

By abandoning their AJAX crawling scheme as described in the OP, they are essentially saying that they will evaluate JS for all sites. Do you have some reason to doubt that?

devNoise 10 years ago | |

About a year ago I wrote a post [1] about how I couldn't get Google to index my AngularJS app. My main problem was the interaction between Googlebot and the S3 server. I'll have to go back and test whether the crawler now renders the correct content.

1 - https://medium.com/@devNoise/seo-fail-figuring-out-why-i-can...

tracker1 10 years ago | | |

Do you have a sitemap.xml for common routes? Also, is your Angular app actually doing routing (hash-based or pushState)?

iwilliams 10 years ago | |

We recently built a site for a customer in Ember and their SEO guys were concerned about indexing. I wasn't sure how it was going to work out, but in the end Google has been able to index every page no problem.

a2tech 10 years ago | | |

Do you know if they sent Google a sitemap? Our client is insisting on a sitemap that has pointers to every-single-product. Something on the order of 2MM+ product pages. It seems like a bit much to me.
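For scale: the sitemaps.org protocol caps a single sitemap file at 50,000 URLs, so a catalogue that size has to be split across many files tied together by a sitemap index. A rough sketch of the splitting, assuming exactly 2,000,000 products and a made-up URL pattern and file naming scheme (`example.com`, `sitemap-N.xml`):

```python
from itertools import islice

MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit in the sitemaps.org protocol

def chunked(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

def product_url(product_id):
    # Hypothetical URL pattern; example.com is a placeholder.
    return f"https://example.com/products/{product_id}"

# Assumed catalogue size, taken from the "2MM+" figure above.
product_ids = range(2_000_000)

index_entries = []
for n, ids in enumerate(chunked(product_ids, MAX_URLS_PER_SITEMAP)):
    urls = [product_url(i) for i in ids]  # these would go into sitemap-{n}.xml
    index_entries.append(f"https://example.com/sitemap-{n}.xml")

print(len(index_entries), "sitemap files listed in the index")
```

At 2,000,000 URLs that comes to 40 files, which is a lot of entries but not an unusual thing to generate from a product database.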

hanniabu 10 years ago | | |

Sorry if this is a stupid question as this is outside my field of work, but how can you tell if your page has been successfully indexed or not?

gildas 10 years ago | | |

How many pages does the site have?

espeed 10 years ago |

This was the missing piece for Polymer elements / custom web components. Now that Google has confirmed it's indexing JavaScript, web-component adoption should take off.

tracker1 10 years ago | |

I want to like polymer/web-components... I just find that it kind of flips around the application controls that redux+react offers. I'm not sure that I like it better in practice.

eurokc98 10 years ago |

Gary Illyes @goog said this was happening Q1 this year, and like others mentioned lots of other direct/indirect signals have pointed this way.

http://searchengineland.com/google-may-discontinue-ajax-craw...

March 5th: Gary said you may see a blog post at the Google Webmaster Blog as soon as next week announcing the decommissioning of these guidelines.

Pure speculation but interesting... The timing may have something to do with Wix, a Google Domains partner, who is having difficulty with their customer sites being indexed. The support thread shows a lot of talk around "we are following Google's Ajax guidelines so this must be a problem with Google". John Mueller is active in that thread so it's not out of the realm of possibility someone was asked to make a stronger public statement.

http://searchengineland.com/google-working-on-fixing-problem...

nostrademons 10 years ago | |

I'm betting that they finally solved the scalability problems with headless WebKit. Google's been able to index JS since about 2010, but when I left in 2014, you couldn't rely on this for anything but the extreme head of the site distribution because they could only run WebKit/V8 on a limited subset of sites with the resources they had available. Either they got a whole bunch more machines devoted to indexing or they figured out how to speed it up significantly.

tracker1 10 years ago | | |

I'd say both are pretty likely.. another round of lower-power servers with potentially more cores... more infrastructure... Combined with improvements in headless rendering pipelines. I haven't looked into it in well over a year now, but last I checked dynamic updates took about 2-3 days to get discovered vs. server-delivered being hours for a relatively popular site.

I'm guessing they've likely cut this time in half through a combination of additional resources and performance improvements. Wondering if they'd be willing to push this out as something better than PhantomJS... probably not, as it's a pretty big competitive advantage.

I know MS has been doing JS rendering for a few years, they show up in analytics traffic (big time if you change your routing scheme on a site with lots of routes, will throw off your numbers).

nailer 10 years ago |

Currently I use prerender.io and this meta tag:

    <meta name="fragment" content="!">

I don't actually use #! URLs (or pushState, though I might use pushState in the future), but without both of these Google can't see anything JS-generated. I checked using Google Webmaster Tools.
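For context, the scheme being deprecated has the crawler rewrite the URL before fetching it: a `#!` fragment, or a page carrying the meta tag above, turns into an `_escaped_fragment_` query parameter, and the server (or prerender.io) answers that request with static HTML. A simplified sketch of the rewrite, with helper names of my own invention and the escaping rules as I understand them from Google's spec:

```python
from urllib.parse import urlsplit, urlunsplit

def escape_fragment(fragment: str) -> str:
    """Percent-escape the bytes the scheme asks crawlers to escape
    (%00..20, %23, %25..26, %2B, %7F..FF); a simplified reading of the spec."""
    out = []
    for b in fragment.encode("utf-8"):
        if b <= 0x20 or b in (0x23, 0x25, 0x26, 0x2B) or b >= 0x7F:
            out.append(f"%{b:02X}")
        else:
            out.append(chr(b))
    return "".join(out)

def to_escaped_fragment_url(url: str, has_meta_fragment: bool = False) -> str:
    """Map a 'pretty' URL to the URL a crawler following the scheme requests."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    if fragment.startswith("!"):
        token = escape_fragment(fragment[1:])   # #!state -> escaped state
    elif has_meta_fragment and not fragment:
        token = ""                              # meta tag opt-in, no hash at all
    else:
        return url                              # not covered by the scheme
    param = "_escaped_fragment_=" + token
    query = f"{query}&{param}" if query else param
    return urlunsplit((scheme, netloc, path, query, ""))

# example.com is a placeholder domain.
print(to_escaped_fragment_url("http://example.com/#!key=value"))
print(to_escaped_fragment_url("http://example.com/page", has_meta_fragment=True))
```

My setup is the second case: no hash in the URL, just the meta tag, so the crawler asks for the same path with an empty `_escaped_fragment_=` parameter.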

Does this announcement mean I can remove the <meta> tag and stop using prerender.io now?

thoop 10 years ago | |

If Google Webmaster Tools is unable to render your website correctly, then that's a good indicator that Googlebot won't be able to render the pages correctly either. If you remove the fragment meta tag, then Google will need to render your javascript to see the page. Let us know how that goes if you try it! todd@prerender.io

rgbrgb 10 years ago | |

We have a similar setup and were wondering the same thing (though we use pushState). Today we were actually trying to figure out a workaround for 502s and 504s that Google's crawler was seeing from prerender. We just took the plunge and removed the meta tag because over 99% of our organic search traffic is from Google. Fingers crossed!

thoop 10 years ago | | |

I'd love to help here if I can. I'd also love to hear the results of you removing the meta tag! todd@prerender.io

shostack 10 years ago |

Any idea how related this might be to Wix sites getting de-indexed? [1]

1: http://searchengineland.com/google-working-on-fixing-problem...

rcconf 10 years ago |

This might be obvious to anyone who has done SEO, but can Googlebot index React/Angular websites accurately? I was always under the impression that the isomorphic aspect of React helped with SEO (not just load times).

vbezhenar 10 years ago | |

If a modern browser can render your site accurately, then Google can index it.

tracker1 10 years ago | | |

It's always lagged in my experience... I'm hoping this announcement means that lag is under a day instead of the 2-3 it was a bit over a year ago.

jwr 10 years ago |

Finally. It was obvious we would have to get to that point eventually; it just wasn't clear when.