Predicting the growth of PLoS ONE
Abstract: This first attempt calculates the annual growth of PLoS ONE and applies a seasonal trend analysis on these numbers. Between 2007-2011, 28,898 contributions were published in PLoS ONE, resulting in an annual growth rate of 62.17% for this period. Holt-Winters filtering for seasonal trend analysis predicts 18,284 published PLoS ONE contributions for 2012, and 31,978 for 2013 (compared to 13,797 in 2011). The findings raise the question about the duration of exponential growth of PLoS One publication volume, the transition of scholarly publication models, and, furthermore, the future of institutional Open Access publication funds.
In recent posts , Martin Fenner presents approaches to visualise the performance of contributions published in Public Library of Science (PLoS) journals. Two APIs provided by PLoS were taken as the data source for these exploratory visualisations; one searches the PLoS domain for particular contributions, the other obtains Article Level Metrics (ALM) for each PLoS contribution. The latter is feeding into the ongoing work on Altmetrics (see eg Priem et al arXiv:1203.4745).
Collected in the work-in-progress plosOpenR GitHub repository, a joint collaboration of members from PLoS Article Level Metrics project, Bielefeld University Library and OpenAIRE was initiated.Our incentives for this work is to further enhance the existing R package rplos provided by rOpenSci. It shall allow crosswalks based on common funding information between data coming from the PLoS Journal server, including its collected metrics, and data on documents stored in institutional repositories. In a first step, this will be worked out as part of the FP7 funded OpenAIRE project which sets out to build an Open Access Infrastructure for European research.
In this post, I propose how to a) detect the annual growth rate of PLoS ONE contributions and b) try to forecast the further growth by applying Holt-Winters smoothing which is a time series analysis method to detect seasonal trends originating from economics .
Applying time series analysis on PLoS One is particularly interesting for at least two reasons: Firstly, PLoS ONE publishes each contribution right after acceptance. This forms a publishing model that differs mostly from print journals where accepted submissions are commonly published in issues. Secondly, its multi-disciplinary coverage distinguishes PLoS ONE from most other academic journals.
To act in accordance with the PLoS Search API Terms of Conditions, thereby avoiding API overload, I downloaded the latest dump of 47,430 PLoS contributions from April (available here). After table cleaning in Open Office, the resulting csv file is loaded into the R working space. In the following, I summarized the data by Journal name and publication date.
require(plyr) my.plos <- read.csv("plosalm.csv",header=T,sep=",") tt <- ddply(my.plos,.(Publication.Date,Journal), nrow) # format may differ according to pre-processing routines in OO date <- strptime(tt$Publication.Date,format="%d.%m.%Y") year <- date$year + 1900 my.data <- cbind(tt,date, year)
After this step, a summary of publications frequencies by each year and by PLoS journal can be obtained and exported as html table (see results, Table 1).
#table my.tab <- as.data.frame(tapply(my.data$V1, my.data[,c("Journal","year")],sum)) sum.journal <- rowSums(my.tab, na.rm=T) my.tab <- cbind(my.tab,sum.journal) sum.year <- colSums(my.tab, na.rm=T) my.tab <- rbind(my.tab,sum.year) #export as html table require("xtable") my.tab.x <- xtable(my.tab) digits(my.tab.x) <- 0 print(my.tab.x, type="html", file="summaryPLoS.html")
With regard to the so gathered annual number of contributions, the Compound Annual Growth rate (CAGR) can be obtained. CAGR is used in economics to measure a year-over-year growth of an investment. In our case, we calculate CAGR for the 5 years period from 2007-2011 to describe the growth of PLoS ONE contributions.
In order to predict the future growth of PLoS ONE contributions, the Holt-Winters was applied on the obtained data as this method is sensitive to seasonal trends (see results, Figure 1). In another blog post it is described how to apply Holt-Winters in R. In a first step, the subset the table for PLoS ONE was built. Afterwards, I calculate the number of monthly contributions. The zoo package provides the tools for achieving this task.
require(zoo) #plos one my.plos <- subset(my.data, Journal == "PLoS ONE") #as zoo object to monthly summary z <- zoo(my.plos$V1, my.plos$date) t.z <- aggregate(z, as.yearmon, sum) #time series object ts.q <- ts (t.z, start=c(2006,12), frequency = 12)
A time series object is created for the period beginning Dec 2006, where the first PLoS ONE contributions were published, until the end of March 2012. This forms the basis for calculating both the Holt-Winters distribution and the forecast of PLoS ONE growth until end of Dec 2013 with a confidence level of 0.95.
#Holt-Winter Distribution ts.holt <- HoltWinters(ts.q) forecast <- predict(ts.holt, n.ahead = 21, prediction.interval = T, level = 0.95) plot(ts.holt,forecast, frame.plot=F, xlim=c(2007,2014), ylim=c(0,4500), main="Holt-Winters filtering PLoS ONE contributions")
The PLoS contributions by journal and year show a moderate growth in most journals but a strong growth in PLoS One (see Table 1).
|PLoS Clinical Trials||40||28||68|
|PLoS Computational Biology||72||168||251||287||376||414||418||121||2107|
|PLoS Neglected Tropical Diseases||42||179||224||350||445||126||1366|
On the basis of these data, a Compound Annual Growth Rate for PLoS ONE can be calculated for the 5 years period from 2007 to 2011. As a result, PLoS ONE’s annual growth rate is calculated as being 62.17 %.
Applying the Holt-Winters method, a plot can be generated, which gives first insights into the distribution (see Figure 1). The black lines highlight the observed contributions per month until the end of March 2012. The red line presents the fitted Holt-Winters values, starting in Dec 2007 until end of 2013. The blue lines highlight the upper and lower confidence intervals. The vertical line borders show observed and predicted values.
The exponential smoothing predicts the monthly observations well. However, note the sharp decline between December 2011 and January 2012. Predicting values for 2012 and 2013, following this approach, PLoS ONE will publish 18,284 contributions in 2012 (confidence interval between 15420 – 21149) and 31,978 (confidence interval between 22679 – 41279) contributions are predicted for 2013.
If my attempt is sound, and I really do appreciate any critical comments, then the obtained growth rates will have consequences on the publishing landscape. No where else, such extreme growth rates of the general scientific literature were never reported before . It also raises the question about the share of articles that do not receive any single citation. Known as the scientometric phenomena of “uncitedness”, this may be tackled by future analysis of PLoS ALM data. On a side note, the analysis reveals the seasonal decline between December 2011 and January 2012. This might, prima facie, resemble biases in the submission and selection processes in other journals . This would also require further examination.
In conclusion, if these growths rates can exclusively be reported for PLoS ONE, the implications for the publishing landscape could be strong. Swift, cross-disciplinary publishing platforms could pressure the market leadership of the high impact subject-specific journals. Since PLoS ONE requires author publication fees for most of its contributions, institutional services and likewise funders covering these fees may have to consider whether this growth affects their funding activities to cover author publication fees. At least as part of our local Open Access Publication Funds of Bielefeld University activities we’ve been experiencing the growing importance in the last years, too.
The intial R source code can be found at plosOpenR GitHub repository: https://github.com/articlemetrics/plosOpenR.
I wish to acknowledge helpful comments and suggestions by Wolfram Horstmann.
 C. C. Holt (1957) Forecasting trends and seasonals by exponentially weighted moving averages, ONR Research Memorandum, Carnegie Institute of Technology 52. P. R. Winters (1960) Forecasting sales by exponentially weighted moving averages, Management Science 6, 324–342. Useful Introduction: P. Goodwin (2010) The Holt-Winters Approach to Exponential Smoothing: 50 Years Old and Going Strong. Forecast Spring 2010.
 L. Bormann & H.D. Daniel (2010) Seasonal bias in editorial decisions? A study using data from chemistry, Learned Publishing, 24, 325-328.