Setting the initial view of a motion chart in R

Following on from my article about accessing and plotting World Bank data with R I want to talk about how to change the initial view of a motion chart.

Over the last couple of weeks I have been asked a few times how to do this. For instance, Stephen O'Grady wanted to create a motion chart that initially shows a line chart rather than a bubble chart.

Changing the initial settings of a motion chart is actually quite easy, once you know how. The trick is to use the state argument in the list of options of gvisMotionChart.

As a case study I will use the World Bank data set and try to do some homework given by Duncan Temple Lang in his introductory course on statistical computing. Duncan asked his students to query the World Bank database to create a line chart showing the number of internet users per 1000 in Africa over time. Further, he would like to see a legend next to the chart to identify which country is which, and tooltips for each curve to identify the country.

A motion chart, displayed as a line chart, would do the trick.

Okay, getting the data is easy, thanks to the WDI package or a direct download, and so is creating a motion chart with bubbles. Interactively I can change the bubble chart into a line chart, select some countries and change the y-axis to log scale. However, when I reload the page I am back to square one: a bubble chart. So the idea is to pass the changed chart settings on to the initial plot. I find the settings of the current view as a string in the advanced tab of the settings window, which I open by clicking on the wrench symbol in the bottom right-hand corner of a motion chart.

Screenshot of the settings window of a motion chart

Next I copy this string and paste it into the state argument of the options list. Note the line break at the beginning and at the end of the state string in the example. Alternatively I can add \n to both sides of the state string.
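To make the mechanics concrete, here is a minimal sketch of passing a state string to gvisMotionChart. The state string is a shortened, illustrative stand-in (an assumption on my part) for the full string you would copy from the advanced tab, and Fruits is the demo data set that ships with googleVis rather than the World Bank data.

```r
# Illustrative, shortened state string; in practice paste the full string
# copied from the advanced tab of the settings window.
# Note the line breaks at the beginning and at the end.
myState <- '
{"iconType":"LINE"}
'

# Equivalent form: a one-line string wrapped in "\n" on both sides
myState2 <- paste0("\n", '{"iconType":"LINE"}', "\n")

if (requireNamespace("googleVis", quietly = TRUE)) {
  library(googleVis)
  # Fruits is a demo data set shipped with googleVis
  M <- gvisMotionChart(Fruits, idvar = "Fruit", timevar = "Year",
                       options = list(state = myState))
  if (interactive()) plot(M)  # opens the chart in a browser
}
```

Both ways of writing the string yield the same value, so it is purely a matter of taste which one you use.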

Here is an example, where I pre-selected Sierra Leone and Seychelles (countries with the lowest and highest number of internet users) together with Africa, North Africa and Sub-Saharan Africa (all income levels). You find the R code below to replicate the plot.

What does the data tell you? Play around with the graph, e.g. change it to a column graph, deselect all countries and change the y-axis to linear again, and hit the play button. How could we improve the plot?

Accessing and plotting World Bank data with R

Over the past couple of days I played around with the data sets of the World Bank, and I have to admit that I am blown away by them. It is amazing to see what is available on their web site, and it is worth visiting their Data Visualisation Tools page. It is fantastic that they provide an API to their data. They have used it to build an iPhone App which is pretty cool. You can have the world's data in your pocket.

In this post I will show you how we can access data from the World Bank in R. As an example we create a motion chart in the Hans Rosling style, as you find it on the Google Public Data Explorer site, which also uses data from the World Bank. Doing this should give us the confidence that we understand the World Bank's interface. You can find this example as demo WorldBank in the googleVis package from version 0.2.10 onwards.

So let's try to replicate the initial plot of the Google Public Data Explorer, which shows fertility rate against life expectancy for each country from 1960 to today, whereby the countries are represented as bubbles, with the size reflecting the population and the colour the region.
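A rough sketch of how this could look follows. Note the assumptions: it uses the WDI package instead of querying the World Bank API directly, the end year 2010 is arbitrary, and it relies on WDI returning a region column (with the value "Aggregates" for regional totals) when extra = TRUE. The three indicator codes are the World Bank's codes for fertility rate, life expectancy at birth and total population.

```r
# World Bank indicator codes, named with the column labels we want
indicators <- c(fertility       = "SP.DYN.TFRT.IN",
                life.expectancy = "SP.DYN.LE00.IN",
                population      = "SP.POP.TOTL")

# The download needs an internet connection, so only run it interactively
if (interactive() &&
    requireNamespace("WDI", quietly = TRUE) &&
    requireNamespace("googleVis", quietly = TRUE)) {
  # extra = TRUE adds country metadata such as the region
  wb <- WDI::WDI(country = "all", indicator = unname(indicators),
                 start = 1960, end = 2010, extra = TRUE)
  # Rename the indicator-code columns to readable names
  names(wb)[match(indicators, names(wb))] <- names(indicators)
  # Drop regional aggregates, keep countries only
  wb <- subset(wb, region != "Aggregates")
  M <- googleVis::gvisMotionChart(wb, idvar = "country", timevar = "year",
                                  xvar = "life.expectancy", yvar = "fertility",
                                  sizevar = "population", colorvar = "region")
  plot(M)
}
```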

R in the insurance industry

Let's talk about R in the insurance industry today. David Smith's blog entry reminded me about our poster at the R user conference in Warwick in August 2011:
Using R in Insurance
We presented examples of how R can be used in the insurance industry. We had a lot of fun presenting our poster. By accident we had printed the poster with quite a bit of excess white space to the right. So we asked everyone who came along to sign it, and by the end of the evening we had over 100 signatures!


For the historians among the readers, here is my five-year-old poster from GIRO in Vienna 2006.


Poster session at useR! 2011 in Warwick, UK
Yesterday Wayne Zhang, with whom I collaborate on the ChainLadder package, released the first version of his new cplm package on CRAN. The name cplm is short for compound Poisson linear models. The cplm package fits Tweedie compound Poisson linear models using the Monte Carlo EM algorithm. The models handled by the package are generalized linear models, mixed-effect models and Bayesian models. For non-Bayesian models, maximum likelihood estimates are obtained for all parameters in the model, in particular the index parameter. Estimation for the Bayesian model is performed by Markov Chain Monte Carlo simulations. These models find their application in actuarial science, see also his paper.
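To get a feel for what a compound Poisson response looks like, here is a small base R simulation. It is only a sketch and does not use cplm: the helper name rcompois and all parameter values are made up for illustration. It draws a Poisson number of claims and sums gamma-distributed claim sizes, which gives the mix of exact zeros and positive, skewed amounts that these models are meant for.

```r
# Hypothetical helper: n draws from a compound Poisson-gamma distribution
rcompois <- function(n, lambda, shape, scale) {
  counts <- rpois(n, lambda)        # number of claims per policy
  vapply(counts,
         function(k) sum(rgamma(k, shape = shape, scale = scale)),
         numeric(1))                # total loss; 0 when there are no claims
}

set.seed(1)
loss <- rcompois(1000, lambda = 0.5, shape = 2, scale = 100)
mean(loss == 0)   # share of exact zeros, should be near exp(-0.5)
mean(loss)        # should be near lambda * shape * scale = 100
```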

Here are a few more insurance related packages:
  • ChainLadder - Reserving methods in R. The package provides Mack-, Munich-, Bootstrap, and Multivariate-chain-ladder methods, as well as the LDF Curve Fitting methods of Dave Clark and GLM-based reserving models.
  • cplm - Monte Carlo EM algorithms and Bayesian methods for fitting Tweedie compound Poisson linear models.
  • lossDev - A Bayesian time series loss development model. Features include skewed-t distribution with time-varying scale parameter, Reversible Jump MCMC for determining the functional form of the consumption path, and a structural break in this path.
  • actuar - Loss distributions modelling, risk theory (including ruin theory), simulation of compound hierarchical models and credibility theory.
  • fitdistrplus - Helps to fit a parametric distribution to non-censored or censored data.
  • favir - Formatted Actuarial Vignettes in R. FAViR lowers the learning curve of the R environment. It is a series of peer-reviewed Sweave papers that use a consistent style.
  • mondate - R package to keep track of dates in terms of months.
  • lifecontingencies - Package to perform actuarial evaluation of life contingencies.
Other useful documents:
Help! There is a special interest group for R in insurance:

Shell Game 2.0

It is important to remember that everything works in theory. Communism, in its purest form, is just as effective as Capitalism, in theory. Reality is a much different story. The Russians and Chinese are now embracing many of the benefits of Capitalism, just as the U.S. faced down the Robber Barons and put an end to such practices as child labor a hundred years ago.

Reality has a habit of rudely poking holes in theories. My favorite piece of Swiss cheese is the Patient Protection and Affordable Care Act (PPACA). Today we are going to revisit the Preexisting Condition Insurance Plan, the stop-gap measure to provide access to coverage to the long-time uninsured.

Eighteen months and millions of dollars later, it might be difficult to recall that the main justification for completely remaking our health care system was to provide coverage for the uninsured. Remember the uninsured? They were of real concern two years ago. The PPACA was supposed to cure this problem.

Last June, in a post entitled The Shell Game, I discussed the five billion dollars the federal government had allocated to the Preexisting Condition Insurance Plan. Of more local relevance, $152,000,000 was given to Ohio for the four year interim program. Even though Ohio had about 17,000 chronically uninsured, state officials were thrilled that $152,000,000 would help 5,000 people get insurance. I felt that they were a touch optimistic.

Theory, meet Reality.

How’s the program working? Initial projections from the Office of the Actuary of the Centers for Medicare and Medicaid had as many as 375,000 uninsured Americans rushing the states and jumping at the opportunity to acquire heavily discounted coverage. As of April that crush was only 21,454. Ohio, with almost 1800 enrollees, is one of the most successful programs. Don’t worry. We may not insure that many people, certainly nowhere near the government’s rosy projections, but all of the money will be spent.

Sunday’s Cleveland Plain Dealer detailed how Ohio and Medical Mutual of Ohio, the state’s contractor, are having difficulty raising prices and limiting access. The biggest problem was that no one was prepared for the shocking reality that really sick people rack up big claims.

“Now we’re paying actual claims and those claims have come in much higher – the loss ratio is much higher – than had been projected,” said Carrie Haughawout, assistant director for health policy for the Ohio Department of Insurance.


The claims for 1800 people were more than what they thought 5,000 unhealthy people would incur? That is hard to imagine. The simple math in last year’s blog post showed that premiums for a 60 year old male would need to be around $800, with the subsidy, to have a chance of covering the cost of care. The Ohio High Risk Pool is charging between $416 and $458 for a 60 year old non-smoker! That isn’t even close.

The PPACA does not include any meaningful cost containment. There is also no underwriting and no exclusions for preexisting conditions in the PPACA’s planned future which begins in 2014. So, as theory invades reality, one day all of these incredibly unhealthy individuals will be moved into the common risk pool. How will this impact the premiums you or your employer pays for health insurance?

The theory is that the unhealthy will disappear in the sea of doctor-avoiding, health-obsessed, average Americans who will hardly notice the difference of adding a couple hundred thousand chronically ill individuals into the mix. And besides, now they will be paying premiums instead of just invading the E/R and counting on the kindness of strangers to pay their bills. Yeah, right.

The High Risk Pools, the Preexisting Condition Insurance Plan, was a dry run for the future of the PPACA. No real planning. Not nearly enough honest, transparent public discussion. An idea that meant well, but was underfunded and was neither properly explained nor promoted. The Preexisting Condition Insurance Plans were projected to do so much at what may have almost seemed like a reasonable amount of money. Instead, we have another program that has fallen tragically short.

Reality, meet Theory.

DAVE

www.bcandb.com

Just a reminder, this post also now appears in the WordPress format on my website. It appears that more people are reading it there and that is where most of the comments are posted.

LondonR, 7 September 2011

On 7 September 2011 I attended the London R user group meeting. It was a very good turnout, with about 50 attendees at the Shooting Star, a pub close to Liverpool Street Station. The session started at 18:00 with four presentations, followed by drinks sponsored by Mango Solutions. The slides of the presentations are available on londonr.org.

The first presentation was given by Lisa Wainer from the UCL Department of Security and Crime Science, about crime data analysis using R. Lisa presented a project with Merseyside police, for which she had built software in R with the gWidgets package. The software, called the Hot Products Early Warning System, is used to help understand and characterise the acquisitive crime problem in Merseyside on an ongoing basis, detecting emerging trends in hot products.

Chris Wood gave an insightful talk about his research on sediment biogeochemical modelling in the North Sea. His model uses a set of differential equations with over 20 parameters. Chris is able to analyse and fit his model to data he gathered on an expedition in the North Sea, using R, the deSolve package and the super-computer at the University of Southampton. How cool is this?

Jean-Robert Avettand-Fenoel talked about the Rook package and how R and Rook have helped him to roll out new applications to his colleagues faster than using Excel, VBA and C++ or RExcel. Rook allows you to build web apps with R. The package is maintained by Jeffrey Horner, who also brought us the brew package. The brew package allows us, in combination with rApache, to mix HTML and R code in the same file. This is quite similar to the approach taken by Sweave for LaTeX and R. However, Rook provides a way to run R web applications on your desktop with the new internal R web server named Rhttpd.

The final presentation was given by me, about the googleVis package and the recent developments in version 0.2.9:

Including googleVis output in a blogger post

It seems that you cannot include Google Visualisation Charts into a blog post directly.
So, I tried to include the output of a googleVis function as a gadget, but also unsuccessfully.
Although you can include gadgets into your site template, it doesn't seem to work with blog posts. So, here is the trick which works for me: the iframe tag.
The following geo map is included as
<iframe width="100%" height="400px" frameborder="0" src="http://dl.dropbox.com/u/7586336/blogger/AndrewGeoMap.html">
</iframe>

As you can see, the chart itself is actually displayed in a page hosted by Dropbox and only inserted into this post via the iframe-tag.
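If you embed several charts this way, the tag can be generated in R as well. The following is only a sketch: iframe_tag is a hypothetical helper written for this post, not part of googleVis, and it simply fills the width, height and source URL into the tag shown above.

```r
# Hypothetical helper: build the iframe tag for a chart page hosted elsewhere
iframe_tag <- function(src, width = "100%", height = "400px") {
  sprintf('<iframe width="%s" height="%s" frameborder="0" src="%s">\n</iframe>',
          width, height, src)
}

tag <- iframe_tag("http://dl.dropbox.com/u/7586336/blogger/AndrewGeoMap.html")
cat(tag, "\n")  # paste the printed tag into the HTML view of the post
```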

For those of you, who would like to replicate the plot of Hurricane Andrew, here is the R code:
library(googleVis)
## Andrew is a demo data set shipped with googleVis
AndrewGeoMap <- gvisGeoMap(Andrew, locationvar='LatLong', numvar='Speed_kt',
                           hovervar='Category',
                           options=list(width=600, height=300,
                                        region='US', dataMode='Markers'))
plot(AndrewGeoMap)
## Write the complete HTML page to the public Dropbox folder
print(AndrewGeoMap, file="~/Dropbox/Public/AndrewGeoMap.html")

Correction (18 October 2011)

I just figured out that we can actually embed a chart into a blogger post directly. You can literally copy and paste the code directly into the post. However, it doesn't seem to be displayed with MS Internet Explorer.

Anyhow, here is the example from above again:

print(AndrewGeoMap, "chart", file="~/Desktop/AndrewGeoMap.js")

Now I copied and pasted the content of that file below:





The Future Is Fine. I'm Concerned About The Present.

I'm sitting outside of Club Isabella waiting for a friend. There are six medical students at a nearby table enjoying food, friendship, and a moment away from their daily stress. What do they talk about? They joke and laugh about doctors and classes, routines and procedures, and their daily grind. They are an interesting group. Two are women. Four appear to be of Asian descent. One, a tall thin white guy with his baseball cap on backwards, appears to have been delivered to us from Central Casting. They wave and shout to their friends walking by. They are incredibly normal.

I find this terribly reassuring. At 56, I am looking at the people who will be caring for me 20 years from now. They are bright, engaged, and sound like they are actually enjoying their work. This is important. If all of this work, time, and effort is just to get a title, a job, and a paycheck, they will never be fulfilled. And they probably won't be very good at the practice of medicine. One can only hope that their discussions of cadavers (over dinner!) are a precursor of great careers.

This concerns me, the general happiness of physicians, because so much is changing in the practice of medicine. Many previously independent doctors are now, in 2011, employees of the major hospitals. Some have adjusted to this change. Some doctors embraced this. Many, however, have not. Being an employee, even a highly compensated one, is not the same as being your own boss. There is a certain freedom in being an independent business owner. And other doctors, like radiologists, have seen their specialty treated like a commodity.

I'm not ready to have my health dependent upon the lowest bidder.

Our young doc-to-be's at the next table have not experienced any of this. There is no transition for them. Medicine will be a corporate enterprise for them, complete with signing bonuses and holiday pay.

How will this impact the way they practice medicine? For one, they will have been initiated, from day one, into a system that allocates a specific number of minutes per patient. They will be instructed in profitability. They will always know the origins of their income. And once you are in this system, how hard is it to change employers? If, or when, the government becomes the major or single payer of health care, would these doctors even notice?

Hard to say.

We face a looming shortage of primary care physicians and gerontologists. I didn't ask any of the future docs what they wanted to practice. I only wonder if their future employers will bother to ask.

DAVE

www.bcandb.com

googleVis 0.2.9

Today we published googleVis 0.2.9 on CRAN. The new version updates the package for the new features of the Google Visualisation API and brings a new in-page editor option.

Here is a simple example, displaying the participants of the R user Conference 2011 in Warwick by country. Notice the 'Edit me' button in the top left corner of the chart, which allows you to change and customise the graph.

library(XML)
url <- "http://www.warwick.ac.uk/statsdept/useR-2011/participant-list.html"
participants <- readHTMLTable(readLines(url), which=1, stringsAsFactors=FALSE)
names(participants) <- c("Name", "Country", "Organisation")
## Correct typo and shortcut
participants$Country <- gsub("Kngdom","Kingdom",participants$Country)
participants$Country <- gsub("USA","United States",participants$Country)
participants$Country <- factor(participants$Country)
partCountry <- as.data.frame(xtabs( ~ Country, data=participants))
library(googleVis)
## Please note the option gvis.editor requires googleVis version >= 0.2.9
G <- gvisGeoChart(partCountry,"Country", "Freq", options=list(gvis.editor="Edit me") )
plot(G)