Close

Browse by...

Dates click to expand

2008
3
1
2

Jquery History Plugin With Regular Expressions

Over the past couple months, I have delved into the world of jquery, which, for the uninitiated, is the framework for developing javascript in today's world of web development. Jquery shortens or eliminates many of the more tedious aspects of HTML's dom framework, while allowing elegant access to the fancier effects of javascript. One of the other coolest parts of this framework is the growing number of plugins built by the community.

The plugin that I want to bring up is the history plugin by Klaus Hartl, the original of which can be found here. Javascript usually has problems with the back button, but here it is integrated smoothly, and without forcing the page to do an ugly reload. As an added bonus, the plugin also allows the creation of links that are tied to an internal javascript state. Basically, javascript would now have the abilities of a full-blown 90s style html-only webpage, with the smoothness and efficiency of not requiring any page reloads. A good example of a similar javascript history module in action is Gmail, where the back button works smoothly, and you are able to copy and paste the url to a different window.

My slightly modified version of the script can be found here. This potentially interesting enhancement allows the use of a regular expression when defining the javascript internals for the history plugin. It is useful when the site has additional user-specific variables, such as an id or an object name or number, and the programmer has no desire to explicitly code every possibly imaginable scenario. Regular expressions are very powerful tools.

To use the modified version of the module, use a snippet like this one, substituting a regex for where the history module would normally require a string:

    $.ui.history('add', '(\\d+)/analyze/queries', function(url, pg) {
        set_phrasegroup(pg);
        //do something something
    });

Note the slight irregularity where you need to escape each slash with another slash.

The first parameter, url as passed into the function is the entire string that the user inputed. It would look something like 3/analyze/queries, while pg, any other further passed parameters refer to any of the matched groups within the regex. In this case, we only have one, (\d+), a number.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. All source code is released under a BSD License unless otherwise specified.

1 comment


December 29, 2008
Vimal Shyamji said:

Hi Sal, I found your blog from the Juice Analytics website and wanted to reach out to you because of your obvious passion for jquery. Another client of mine called Blank Slate is using jquery as well (www.widgetthing.me), and the CTO there is also really into it. The purpose of my reaching out to you and Juice was this....I place tech consultants with companies ranging from really small startup's through mid-size companies and Juice looks like the type of company we're usually successful at helping. Anything you can share with me about whether or not Juice ever brings people in for projects and who would handle that process if that did come up?

Thanks in advance....Vimal (617-880-3216)http://www.linkedin.com/myprofile?trk=hb_side_pro

Your name

Email (optional, will not be shared)

Type the word "juice" (required to confuse the spammers)

Your comment


Add a comment





Programmatic Google Trends API

Yesterday, Google released an update to their popular Google Trends tool. There are improvements over the previous version, but the biggest new feature is a new shiny button that lets you download all your data in the format of a CSV. This is a very cool enhancement. Where Google Trends was a geeky toy, it now takes the leap to integrate into analysts' reports and with that, edge its way onto managerial desks.

This python module is a quasi-API to make it easier to authenticate into Google Trends for those who want to squeeze the extra level of functionality out of their data. The advantage of programmatic access is that the data can be automatically trended and merged. It can be snuck into a 9:00 AM daily email to the VP of Marketing so that she knows to ramp up Google Adwords campaigns for some specific keyword. Also, by programatically pulling multiple reports, it is possible to create a wealth of data not visible in a single report. Using one keyword as a benchmark to merge multiple reports, we can do a meaningful comparison on tens or hundreds of relevant keywords.

To use the pyGTrends, the quasi-Google-Trends-API, download the file from our server.

Here is an example of the most basic basic report that you can pull down from Google Trends. The connector function needs authentication info, and download_report needs to be passed a list of keywords.

from pyGTrends import pyGTrends

connector = pyGTrends('google username','google password')
connector.download_report(('keyword1', 'keyword2'))
print connector.csv()

You can, however, use pyGTrends to get any slice of data that you can pull down from Google Trends. To see the exact parameters that you should use, go to Google Trends, and navigate to the specific sufficiently-narrow report that you are interested in. Then, right-click on the CSV download, and save the link location. The different parameters should be discernible from the link. The following code downloads a report for banana, bread, and bakery keywords from April 2008, originating from the magnificent nation of Austria, and scaled using fixed scaling (aka the second download link).

connector.download_report(('banana', 'bread', 'bakery'), 
                          date='2008-4', 
                          geo='AT', 
                          scale=1)

By default, the csv() function downloads the main part of the report, but there are a few additional parts stuck to the bottom of the CSV file. If you are interested in those, pass the section parameter to the csv() function. If you do not want column headers on your data, you can also pass the column_headers parameter as false. The following will return the Language section without any headers.

print connector.csv(section='Language', column_headers=False)

Here is a snapshot from the new Google Trends to add some eye-candy to the post: Google Trends Eye-Candy

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. All source code is released under a BSD License unless otherwise specified.

10 comments | Show all comments only the last 5 are shown


July 21, 2008
Arjun said:

I have been using the pyGTrends module and have encountered problems when using keywords with more than one word. For instance, "air express" was one of the keywords. It has a search history--when I manually download the data from Google Trends, the historical data shows up fine. However, when I use the pyGTrends module, the data comes out as all zeroes.

The same problem occurs with all keywords/phrases that contain more than one word. Is pyGTrends compatible with only one-word keywords, and if not, how do I fix this problem?

If someone could email me, arjunrmodi at gmail dot com, that would be great. Thanks.


July 21, 2008
A said:

Great module.


July 21, 2008
Sal Uryasev said:

Arjun: It should work now. Thanks for pointing it out!


November 6, 2008
CurtD said:

Hi,

I am wondering if you Google Gurus know of how to attack a project I am trying to accomplish. I want to be able to run a script, query, software that can give me the answer to something like the following.."I want to know what search terms have been searched 1500 a day on average and have less than 200,000 results". The 1,500 and 200,000 are variables that can be changed. It would be cool to be able to have a drop down menu to say "last 30 days" "last 90 days" etc. I have posted this project on Elance and Guru and I have mostly gotten "never heard of it, can't be done." Some talked of building a library through API and then running queries on it. Does this query exist or how can I go about getting it done?


November 6, 2008
Sal Uryasev said:

Hey Curt,

The problem lies with the fact that Google Trends does not have any absolute numbers. You can only get results as they are relative to other results.

It appears to be an intentional omission. The only way to get at that data could be to get hired as a Google employee.

Your name

Email (optional, will not be shared)

Type the word "juice" (required to confuse the spammers)

Your comment


Add a comment





How Did We Mash Data into Google Analytics?

This post is the code behind how we mashed external data into Google Analytics.

The first step is to yank reference data from the Google Analytics site to reference against Kampyle's data. We specifically want to gather individual names of websites (index.html, /index2.html), and the current selected daterange. The cell references to the website names in the table can be found using a neat Javascript Shell popular among Greasemonkey and Javascript developers. I will not go into detail about the Javascript Shell, but by checking out the various child nodes for the table object we can track down that document.getElementById('f_table_data').childNodes[3].rows[1].cells[1].textContent points at the text in the first cell of the first row. While the syntax looks long, it is just nested HTML in a more elegant programmatic fashion.

For the date, Google Analytics uses a slightly peculiar hybrid system where the date is drawn initially from the URL, but if the date is modified with the java date tool in the upper right hand corner, it uses that instead. From our end, document.getElementById('f_primaryBegin').value and document.getElementById('f_primaryEnd').value are the java date tool values that only start existing if the date tool is used. Pull these two values if they exist, and simply parse the date from the URL otherwise.

The clickable tab we created is essentially the equivalent of a little Greasemonkey button with a few frills that can be created in the standard Greasemonkey fashion. Wherever possible, I use Google-defined layouts for consistency with the site.

Next, we want to send out our reference data to some external server. Greasemonkey has good functionality for pulling data from other sites and servers through the use of the GM_xmlhttpRequest command. A server-end PHP or Django service might be easiest to implement. In this specific example, Kampyle wanted to use the SOAP protocol. While there is an excellent overall SOAP client for javascript by Matteo Casati, this client does not work in a plug and play fashion with Greasemonkey, and needed some modification. For any devoted SOAPers who want to try Greasemonkey, the revised javascript-soap-client code can be found in the attached file. We use the SHA256 encryption function written by Angel Marin and Paul Johnston, but that is accomplished by just copying and pasting the function into our code.

The result comes back in the form of an xml object describing each row in the table, which we parse using native Javascript/Greasemonkey methods, and pop back into the table in the way that we extracted the individual website names. A neat trick here is to call each individual row individually, and not to wait for the data to come back before calling the next row from the server. Separate listeners can wait and insert the data at their leisure. This allows our page to load up faster, and in case there is an error with one data element, it could potentially allow the rest of the rows to load in peace.

You can play around with my code here. This code is released under the BSD License. You won't be able to run the code verbatim without Kampyle's compliance, since they have changed the API calls on their server. However, much of it should be very portable to other data sources.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. All source code is released under a BSD License unless otherwise specified.

2 comments


September 13, 2008
Gotham said:

This post have great information but I needed a few clarifications. For one of my clients I have usual web analytics info displayed in GA. Additionally the client has call tracking data in its own database. Can I pull that info into GA in a new tab? Your mashup indicates that you added a new tab called "Kampyle", are the names of the table which shows up configurable? (e.g URL, avg grade)


September 16, 2008
Sal Uryasev said:

Hey Gotham,

Yes - as long as you have easy access to the data, you can push any data that you want into Google Analytics. If the data is completely static, you can even add it to the script. Alternatively, you could have a hosted file somewhere. In our case, the data was very dynamic, so we used a server with another web service to fetch the data.

If you click on the picture above, it'll show you the entire table, including column names that I changed around. Essentially, you have the power to change any text that you can select by a mouse. It is just a matter of knowing where to point your code.

Your name

Email (optional, will not be shared)

Type the word "juice" (required to confuse the spammers)

Your comment


Add a comment





Juiced Google Analytics Python API

It is not official. It is not from Google. It is, however, very functional and very here. I present to you pyGAPI, the Juiced Google Analytics Python API. This module allows you to pull information from your incarnation of Google Analytics and employ it programatically into your reporting code.

Let us use iPython to peek through some code using pyGAPI.

In [3]: from datetime import date
In [4]: import pyGAPI
In [5]: connector = pyGAPI.pyGAPI(username, password, website_id="1234567")

Here we create a pyGAPI object. Behind the scenes, pyGAPI logs into Google Analytics, and downloads an identifier cookie. website_id is optional. If omitted, pyGAPI accesses the first website on the account's list. To get a list of all the site IDs to which your site has access, run the function connector.list_sites().

In [6]: connector.download_report('KeywordsReport', (date(2008,3,10), date(2008,3,31)), limit=5)

Download a report into your pyGAPI object. KeywordsReport is the name of the report. It is followed by a tuple containing the start and end dates in python date format. limit is an optional parameter that specifies the number of entries that pyGAPI should pull down. By default, it will pull in all the entries up to a maximum of 10000. Lowering this number will certainly improve performance. The entries returned are ranked by Visits, so you should get the most significant values of the bunch.

In [7]: print connector.csv()
Keyword,Visits,Pages/Visit,Avg. Time on Site,% New Visits,Bounce Rate,Visits,Subscribe,Solutions,Goal Conversion Rate,Per Visit Goal Value
juice analytics,356,5.935393258426966,314.061797752809,0.38764044642448425,0.29494380950927734,356,1.0,0.16292135417461395,1.1629213094711304,0.0
excel training,142,1.971830985915493,98.0774647887324,0.908450722694397,0.6901408433914185,142,1.0,0.0211267601698637,1.0211267471313477,0.0
excel charts,77,1.7922077922077921,95.0,0.9090909361839294,0.7792207598686218,77,1.0,0.03896103799343109,1.0389610528945923,0.0
excel skills,72,1.6527777777777777,75.29166666666667,0.9444444179534912,0.7083333134651184,72,1.0,0.0,1.0,0.0
colbert bump,70,1.3142857142857143,113.77142857142857,0.6428571343421936,0.8428571224212646,70,1.0,0.0,1.0,0.0

This function displays your report in a nice excel-ready CSV format.

In [8]: print connector.parse_csv_as_dicts(convert_numbers=True)
[{'Avg. Time on Site': 314.06179775280901, 'Per Visit Goal Value': 0.0, 'Bounce Rate': 0.29494380950927734, 'Keyword': 'juice analytics', 'Visits': 356.0, 'Pages/Visit': 5.9353932584269664, 'Subscribe': 1.0, 'Solutions': 0.16292135417461395, '% New Visits': 0.38764044642448425, 'Goal Conversion Rate': 1.1629213094711304}, {'Avg. Time on Site': 98.077464788732399, 'Per Visit Goal Value': 0.0, 'Bounce Rate': 0.69014084339141846, 'Keyword': 'excel training', 'Visits': 142.0, 'Pages/Visit': 1.971830985915493, 'Subscribe': 1.0, 'Solutions': 0.021126760169863701, '% New Visits': 0.90845072269439697, 'Goal Conversion Rate': 1.0211267471313477}, {'Avg. Time on Site': 95.0, 'Per Visit Goal Value': 0.0, 'Bounce Rate': 0.77922075986862183, 'Keyword': 'excel charts', 'Visits': 77.0, 'Pages/Visit': 1.7922077922077921, 'Subscribe': 1.0, 'Solutions': 0.038961037993431091, '% New Visits': 0.90909093618392944, 'Goal Conversion Rate': 1.0389610528945923}, {'Avg. Time on Site': 75.291666666666671, 'Per Visit Goal Value': 0.0, 'Bounce Rate': 0.70833331346511841, 'Keyword': 'excel skills', 'Visits': 72.0, 'Pages/Visit': 1.6527777777777777, 'Subscribe': 1.0, 'Solutions': 0.0, '% New Visits': 0.94444441795349121, 'Goal Conversion Rate': 1.0}, {'Avg. Time on Site': 113.77142857142857, 'Per Visit Goal Value': 0.0, 'Bounce Rate': 0.84285712242126465, 'Keyword': 'colbert bump', 'Visits': 70.0, 'Pages/Visit': 1.3142857142857143, 'Subscribe': 1.0, 'Solutions': 0.0, '% New Visits': 0.6428571343421936, 'Goal Conversion Rate': 1.0}]

This function goes the extra step and converts the CSV into a dictionary for easier programmatic use. By default, all entries will be returned as python strings. Setting convert_numbers to True, as we did here, will additionally parse the dictionary to turn all numbers into float values.

In [9]: print connector.list_reports()
('ReferringSourcesReport', 'SearchEnginesReport', 'AllSourcesReport', 'KeywordsReport', 'CampaignsReport', 'AdVersionsReport', 'TopContentReport', 'ContentByTitleReport', 'ContentDrilldownReport', 'EntrancesReport', 'ExitsReport', 'GeoMapReport', 'LanguagesReport', 'HostnamesReport', 'SpeedsReport')

This gets a list of all the reports that I have successfully tested thus far. All site-specific reports should work. A couple site-section specific reports should be included in the next update of pyGAPI.

Google is great and will release a real API soon, but until then you can download pyGAPI.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. All source code is released under a BSD License unless otherwise specified.

11 comments | Show all comments only the last 5 are shown


June 21, 2008
Matt Webb said:

This is awesome work. Do you think this python script could work in conjunction with superkaramba on Linux?


June 27, 2008
Rodrigo said:

This is great. I put this together with a Samurize desktop to display Analytics data on my desktop.
Thanks!


August 7, 2008
Ludovic said:

Very nice work. Very useful to, let's say get your most visited pages without having to maintain parallel accounting. May I ask you to licence it to an OSS licence and put it on Google Code ? Would be great.


August 20, 2008
Sebastian said:

Hello,

it work well! Great.
How can i pull the "keyword" or "country" report for a specific URL?
(use segmention)

Thanks


September 5, 2008
Thierry said:

Awesome work !

Your name

Email (optional, will not be shared)

Type the word "juice" (required to confuse the spammers)

Your comment


Add a comment





Setting DJANGO_SETTINGS_MODULE

Here's a bash function I use for Django development to quickly set DJANGO_SETTINGS_MODULE.

function setdsm() { 
    # add the current directory and the parent directory to PYTHONPATH
    # sets DJANGO_SETTINGS_MODULE
    export PYTHONPATH=$PYTHONPATH:$PWD/..
    export PYTHONPATH=$PYTHONPATH:$PWD
    if [ -z "$1" ]; then 
        x=${PWD/\/[^\/]*\/}               
        export DJANGO_SETTINGS_MODULE=$x.settings
    else    
        export DJANGO_SETTINGS_MODULE=$1 
    fi

    echo "DJANGO_SETTINGS_MODULE set to $DJANGO_SETTINGS_MODULE"
}

I put this in my .bash_profile, then a quick setdsm sets the DJANGO_SETTINGS_MODULE to the settings.py in the current directory and add the current directory and it's parent to PYTHONPATH.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. All source code is released under a BSD License unless otherwise specified.

3 comments


October 16, 2008
Ro said:

Hi Chris,

1. i put the script into .bash_profile, which i had to create first
2. then i opened the konsole, entered into my projects dir which is on a windows drive (sth like /media/sda2/Document .../..),
3. called `django-admin.py runserver`

but i got the error "Settings cannot be imported because environment variable DJANGO_SETTINGS_MODULE is undefined"

any suggestion what might be the problem?


November 20, 2008
Boffo Bob said:

"windows drive" is the problem.

wtflol


November 21, 2008
Not Boffo Bob said:

Works - excellent.

Your name

Email (optional, will not be shared)

Type the word "juice" (required to confuse the spammers)

Your comment


Add a comment





Earlier writing