Scraping reviews from Google Play using JavaScript

April 10

What I'm trying to do:

  1. Get user input for a Google Play app page URL.

    e.g. https://play.google.com/store/apps/details?id=jp.scn.android

  2. Scrape 100 reviews from that app's Google Play page and organize them into an array.

My JavaScript function:

function test() {
    var urlAdd = document.getElementById('input').value;
    var urlEnglish = urlAdd + '&hl=en';

    var query = {
        url: urlEnglish,
        type: 'html',
        selector: '[class=review-body]',
        extract: 'text'
    },
    request;

    request = 'http://example.noodlejs.com/?q=' +
        encodeURIComponent(JSON.stringify(query)) +
        '&callback=?';

    jQuery.getJSON(request, function (data) {
        document.getElementById('output').innerHTML = '<pre>' +
            JSON.stringify(data, null, 4) + '</pre>';
    })
};
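Concatenating '&hl=en' only works when the pasted URL already contains a query string such as ?id=.... A minimal sketch using the standard URL API (built into browsers and Node.js) handles both cases. The helper name buildNoodleRequest is hypothetical; the noodle endpoint and query fields are copied from the function above.

```javascript
// Sketch: build the noodle request URL without fragile string concatenation.
// buildNoodleRequest is a hypothetical helper; endpoint and fields are copied
// from the original test() function.
function buildNoodleRequest(appUrl) {
  // new URL() throws on malformed input, so a bad paste fails early.
  const page = new URL(appUrl);
  // set() adds hl=en (or overwrites an existing hl), whether or not
  // the URL already had a query string.
  page.searchParams.set('hl', 'en');

  const query = {
    url: page.toString(),
    type: 'html',
    selector: '[class=review-body]',
    extract: 'text'
  };

  // encodeURIComponent escapes the JSON; callback=? is appended unescaped
  // so jQuery can still detect the JSONP placeholder.
  return 'http://example.noodlejs.com/?q=' +
    encodeURIComponent(JSON.stringify(query)) + '&callback=?';
}

console.log(buildNoodleRequest('https://play.google.com/store/apps/details?id=jp.scn.android'));
```

In test(), this would replace the lines that build urlEnglish, query and request.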


Problems:

  1. This function feels bulky.
  2. It only returns 20 results. My guess is that this has something to do with how the Google Play page loads reviews.


Questions:

  1. How do I streamline/improve my scraper function?
  2. How do I get 100 results or more?


My apologies if the solution is simple. I only started learning JavaScript, piecemeal, in February.


Use scraping as a last resort. I suggest you look for an available API instead; it's more robust and easier to work with. One issue with scraping is that it only fetches the static HTML returned for the URL, so you can't get anything that is loaded afterwards via JavaScript. There are ways around that, but they add too much complexity.

As for working around it in code, check how Google Play loads the rest of the reviews. It should be an AJAX call, so watch the Network tab of your browser's developer tools while more reviews load. Then examine that request's URL and its response, and adapt them to your needs.
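Once that request is found, getting 100 or more reviews comes down to asking for successive pages until enough have been collected. The sketch below assumes a hypothetical fetchPage(pageNumber) that you would write around the request you find; its name, the 0-based page numbering and the maxPages limit are illustrative assumptions, not Google Play's actual API.

```javascript
// Sketch: request successive pages until `target` reviews are collected.
// fetchPage(page) is HYPOTHETICAL: wrap the AJAX request you find in the
// Network tab so that it resolves to an array of review strings, or to an
// empty array once there are no more pages.
async function collectReviews(fetchPage, target = 100, maxPages = 50) {
  const reviews = [];
  // maxPages is a safety limit so a misbehaving endpoint can't loop forever.
  for (let page = 0; page < maxPages && reviews.length < target; page++) {
    const batch = await fetchPage(page);
    if (batch.length === 0) break; // no more reviews available
    reviews.push(...batch);
  }
  return reviews.slice(0, target);
}

// Usage with a fake fetcher that serves 20 reviews per page, 7 pages in total.
const fakeFetchPage = async (page) =>
  page < 7 ? Array.from({ length: 20 }, (_, i) => `review ${page * 20 + i}`) : [];

collectReviews(fakeFetchPage).then((r) => console.log(r.length)); // logs 100
```

Stopping on an empty batch means you get every available review when the app has fewer than target.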

As for your script: you're using jQuery, so use it all the way!

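// Cache the elements (after the DOM is ready) and put configurables here
var input = $('#input');
var output = $('#output');
var noodleUrl = 'http://example.noodlejs.com/?';

function test() {
    // Read the input when the function runs, not when the page loads.
    var query = {
        // Check whether the URL already has parameters (e.g. ?id=...).
        // Otherwise appending '&hl=en' produces a broken URL.
        url: input.val() + '&hl=en',
        type: 'html',
        selector: '[class=review-body]',
        extract: 'text'
    };

    // $.param builds and encodes the query string. Append callback=? by hand:
    // $.param would encode the '?' as '%3F', and jQuery would then no longer
    // recognize the request as JSONP.
    var request = noodleUrl + $.param({ q: JSON.stringify(query) }) + '&callback=?';

    $.getJSON(request, function (data) {
        output.html('<pre>' + JSON.stringify(data, null, 4) + '</pre>');
    });
}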
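
For step 2 of the question (organizing the reviews into an array), the data passed to the callback still has to be flattened. A minimal sketch, assuming noodle returns an array of objects whose results property holds the extracted strings; that shape is an assumption, so compare it with the JSON printed in #output before relying on it. toReviewArray is a hypothetical name.

```javascript
// Sketch: flatten a noodle-style response into a clean array of review strings.
// ASSUMPTION: data looks like [{ results: ['text', ...] }, ...]; adjust the
// property name if the JSON printed in #output differs.
function toReviewArray(data) {
  const texts = [];
  // Accept either an array of result objects or a single object.
  for (const entry of Array.isArray(data) ? data : [data]) {
    for (const text of (entry && entry.results) || []) {
      // Collapse runs of whitespace/newlines and drop empty entries.
      const trimmed = String(text).replace(/\s+/g, ' ').trim();
      if (trimmed) texts.push(trimmed);
    }
  }
  return texts;
}

// Example with a made-up response shaped like the assumption above.
const sample = [{ results: ['  Great app!\n', '', 'Crashes on start.  '] }];
console.log(toReviewArray(sample)); // [ 'Great app!', 'Crashes on start.' ]
```

Inside the $.getJSON callback you would then write var reviews = toReviewArray(data); before rendering or counting them.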

