The Scientific Review Process Revisited

The scientific review process is probably not well-known to most. I’d like this post to be readable both for scientists and non-scientists, so I’ll briefly sum up the work of a scientist and the scientific review process, at least as they happen in computer science. As I’ve been working with a bunch of people in related sciences (bioinformatics), I’ve also gotten some insights from another world I’d like to share (especially with computer scientists). This also has relevance for open-access proponents.

Much of this is inspired by something an old friend ((Mortal enemy.)) said some 5-7 years ago. He mentioned that he always signed his reviews by name. That kept him honest in his reviews. I liked the idea, but at the time didn’t have the guts to do so. That friend was Mailund, and he has recently(ish) written a blog post about the topic on his blog. If you see him, tease him about his height and maybe toss him like the dwarf he is 🙂

This is also inspired by having to hand in 8 reviews today, so I’ve been thinking about reviews the last week or so.

The Work of a Scientist

A scientist is, paraphrased from the immortal words of Paul Erdős, a device for turning coffee into theorems. Our job is to produce science. Science can be reusable experiences about computer software, it can be mathematical theorems, it can be big humming machines, it can be a cure for a disease, or anything else. Some will claim that social sciences, political sciences, history, and others should be on the list, but let’s save that discussion for another day. ((They most definitely should not; we’re dealing with real science only here!))

Science has to be shared, so others can build on it. We do this predominantly by writing papers about it and publishing them. We publish our results in various venues. From least to most respected, the traditional venues are technical reports, posters, workshops, conferences, and journals.

Technical reports are basically just ourselves writing a paper and putting it on our homepage. There are variants, such as having your department or a small group of people collect them and indiscriminately post them online.

Workshops and conferences are similar, and some venues considered conferences are called workshops (though the reverse is rarely seen in serious venues). The common factor is that people typically show up to these things, present their papers and subsequently talk about them. The distinction between workshop and conference is in how formal it is, i.e., how close to finished the work should be before it is published, with workshops leaning towards “I got this great idea and worked on it for a couple weeks; what do you guys think?” whereas conferences lean towards the “This is a problem, these are the challenges, this is the solution, this is why my solution is better than any previous solution solving a similar problem” side of things.

Both conferences and workshops often have poster sessions associated. Here researchers, often young, present their budding ideas and hope to get comments from other, typically older, researchers. For young PhD students this is a way to get to know some of the people in their field and for others it is a way to spur interest in a new idea, method, or tool.

Journals are basically periodicals sent to researchers. Here the best of the best results are collected. The time from start of work to submission is long. The time from submission to acceptance is long. The time from acceptance to publication is long. Journals just take a long time. They publish better results, though.

The process is often that you get an idea and write one or two workshop papers about it to get input. Then you condense that or those into a conference paper to share with a broader audience and to spur more discussion. You elaborate on the idea for a year or two (or more), write a couple more conference papers. When you feel that an idea has been researched and solved, you write a journal paper, summing up the most important references. By now, you know the problem so well and have presented it so many times that you can write a very easy-to-read description of the problem and the solution.

That is, that is how it works in computer science. Other sciences have their own traditions. For example, in mathematics scientists tend to primarily publish in journals, as it is hard to provide incremental results to a proof; either you have proved a theorem, or you have proved a lemma (simple theorem) that you think may be useful for proving the main theorem.

The Reviewing Process

In ancient times, people couldn’t travel around the world, so publication was done mostly by letter. Later, this was systematized, so you could send your manuscript to a journal, which would then send it out to many scientists. Traveling was expensive, so you could not go to many conferences or workshops. This put a natural limit on the number of results or papers that could be published at a venue. Today, traveling is cheap and anybody can self-publish using their printer or – cheaper – using their homepage. There is still a limit, though, namely the time of researchers. Today, no researcher has the time to go thru every crackpot paper out there to seek out the good ones. Even in a small field, 100s of papers are published every month, and that is just the best ones.

To limit the number of papers without neglecting good ones, reviews are used. Papers are sent to a few (1-5, most often 2-4) reviewers. They read thru the paper, assess it according to whether it is interesting and new, whether it is easy to read, whether it relates to work on similar problems, and sometimes a couple more criteria. Based on these reports, an editor can make a decision about which papers to include in their journal/conference/workshop (posters and technical reports are rarely reviewed).

As some people are bad at accepting critique, reviewers are often anonymous. This makes it harder for authors to take “revenge” by returning a bad review when they have to review a paper from a person who gave them a bad review. Often this is a moot point, though, as you can often read thru a review and have a very good idea of who wrote it; this becomes possible after a few years in the business, and only takes off when you participate in the reviewing process as you see people’s reviews with their names.

In the same way, some venues use a double-blind reviewing process; that means you also cannot see who the author of a paper is. This is to avoid reciprocating bad reviews but more importantly so that even the super-star of a field gets judged by the quality of the paper and not by their name. Again, this is often completely moot as you can see the author name by browsing thru the references: the person with the most cited paper is most likely the author (or the supervisor of the author).

The reviewing process is not there to trash bad papers, it is there to improve papers, both good and bad. In science, the mantra has for a decade or two been “publish or Paris,” meaning you have to publish or Paris Hilton comes after you. Nah, that’s not actually true. The real saying is “publish or perish,” meaning if you do not publish papers, you get fired. Researchers spend time writing papers to avoid that, but writing a paper takes time; you need to do the research, write and edit your paper several times before you send it for review. There are anecdotal exceptions ((My most sought-after paper I wrote in less than a day, and I co-wrote another paper in 2 days which went on to get an invitation for submission to a journal.)) but the general rule is that a lot of work has to be put into a paper. This means that if a paper is not accepted for publication at one venue, the authors will edit it some more, hopefully taking review comments into account, and submit it elsewhere. If a paper is accepted, it will be edited again to take reviewer comments into account before publication. For more prestigious venues, like high-class conferences and journals, papers have to be accepted by the reviewers after revision and sometimes go thru one or more further review/editing rounds.

All in all, this means that all ((Most.)) papers get published at some time, so it is in the best interest of reviewers to make the authors improve the paper as much as possible and as fast as possible. This is done by not only giving grades according to the criteria, but also providing constructive criticism of the content and form. This includes pointing out spelling and grammatical mistakes (for good papers; bad papers have other more important issues to address first). More importantly, this also includes pointing out mistakes, referring to related work overlooked by the authors ((That’s the theory; in practice you namedrop 3 of your papers to get citations of your own work.)), suggesting experiments supporting the claims of the authors, and suggesting extra work that could be interesting ((Or keeping this to yourself so you can publish it and get credited.)).

Faults of the Reviewing Process

The most pressing problem of the reviewing process is also the most obvious. I have reviewed several papers describing software tools. We even have guidelines telling authors that they should specify where we can get the tool, yet half of the papers I had forgot to do so. That means a lot of the reviews contain comments along the lines of “the paper forgets to list a download URL.” When I see such a comment, I know the reviewer has not downloaded and experimented with the tool (or at least looked at the homepage and examples of use). This means the review is not as thorough as it could be; without access to the tool, the reviewer cannot validate that the results are correct and try out their own examples. This could easily be fixed with an e-mail to one of the authors asking for a download. The comment in the review is mostly useless – it takes 2 minutes to add to the paper – but the consequences for the reviewer of not having the URL are severe. The reason the reviewer cannot do this is that they are supposed to be anonymous. By mailing the authors, they reveal their identity. It may also be that the reviewer cannot obtain a cited paper (seen that as well) or that the reviewer did not understand a certain central point. This could also in theory be fixed by an e-mail but cannot be due to the (double-)blind nature of reviews.

The second problem is that scientists are notoriously bad at managing their time. People get surprised if they ask me if I can do something in a month or two, and I respond that no, I am busy then. I do pretty strict time management, so I know if I am busy at any time. That way I avoid taking on too much work, avoid getting behind deadlines and avoid having to do everything at the last minute. As mentioned, most scientists do not do that, so they take on too many reviews or let themselves be blinded by the fact that they get to be in the programme committee of Glorious Conference 2000, and say yes to reviewing 8 papers they don’t want to and don’t have the time for. No matter the reason, somebody writes shoddy reviews. At work, we jokingly call them the “weak accept, high confidence one-liners”. They are reviews where all grades are just above average (but not at the top), the reviewers typically view themselves as experts, but the last part reveals them: they did not actually read the paper in detail and have nothing to say about the paper, so they just restate the title and end with “I sort of liked it” ((Though restated to sound more scientific.)). They do not want to kill the paper (so they do not give negative grades), but neither do they want to fight for the paper (they do not want to give top grades to a paper everybody else gives negative grades) as either would require reading the paper in detail to defend your viewpoints.

Finally, some reviewers do not argue for their opinions. They simply state a matter of taste as a fact without any argument as to why they say so. These get mixed in with valid points they just forgot to write a reason for. I for example like tables formatted in a (non-standard) way (I did that during this round of reviews) and always suggest so (and most people I have suggested that to have taken it up and now suggest others to do so as well). That is a matter of taste. I always state that this is just my opinion but I think it makes it look nicer and more readable. People can then elect whether they agree with my taste or not ((And if they have any sense, reach the conclusion that yes, indeed they do. 🙂 )). If I find that some terminology is unfortunate because it collides with some other terminology (I did that too during this round of reviews), then it is no longer a matter of taste; my opinion is more important because there is a factual reason the terminology is bad. People are no longer free to choose between my and their taste. They can still ignore my suggestion, but as a reviewer in a second round, I’d hold that against them if the reason is good. This can be strengthened to not liking a theorem because I have a counter-example. It is thus important to argue for your proposals; if the author can follow and agrees with your arguments, they have to agree with your conclusion. Even if they don’t, the editor or other reviewers may and end up rejecting the paper if the author doesn’t take a suggestion with a good reason to heart. The problem is that for a reviewer, arguing is much harder than just stating the conclusion of the argument. As you are in a position of power – the authors want a paper accepted and depend on you – you can state your opinion as equally valid as a counter-example, and as the author does not know who you are, they cannot argue your point with you (except via slow-as-molasses review/editing rounds). I have had reviews where I simply disagreed with the reviewer or was sure the reviewer had misunderstood something, but couldn’t do anything about it.

A Solution: Open Reviews

One solution (imperfect and with its own new flaws) is open reviews. If reviewers have to sign off on a review, they have to stand behind it. If I am not anonymous, I can easily mail the authors asking for any clarification or missing information. I can do that without revealing my identity, because it is revealed anyway. Granted, this can also be solved using an anonymous messaging system built into conference management systems, but with open reviews the problem ceases to exist and does not become something we have to solve using technology. Thus, we would get better reviews as minor roadblocks (but roadblocks nonetheless), like a missing URL to a program, could be solved without a time-expensive review/editing round.

The “weak accept, high confidence one-liners” would also disappear; I doubt any serious researcher would like to have their name under a review signalling to all but the most naïve PhD student that “I didn’t even bother reading your paper.” By mild nudging, that would force people to not accept reviewing a paper if they do not have time. That apparent reviewer-deficit would easily be offset by the fact that editors and program committees could start only asking for 2-3 reviews instead of 3-5 – without the risk of getting one or two (or more) “I did not read this”-reviews, you only need a few to fairly assess a paper for lower-profile venues, and for higher-profile venues, extra reviews could be requested if the need arose. This would lessen the burden of reviewers, as they would not have to review as many papers, and improve papers as authors would be guaranteed 2-3 good reviews, instead of as now where I’ve received just one review “fuck you and the effort you put into writing the paper, I definitely didn’t bother reading it” (one week late, with a deadline of two days over a vacation, with a suggestion to cut down the paper to half – I naturally withdrew the paper faster than you can say “never submit anything to the AgilES workshop” ((Which is of course completely unrelated; it’s just something you can say slower than I withdrew the paper.)) ).

An old mantra is “don’t hate the player, hate the game” and another is “don’t shoot the messenger.” While some authors may blame reviewers for a bad review, I hope that most won’t hate the reviewer, if they actually argue for their statements. If I have a reason for some opinion as an author, I am much more inclined to take it seriously; if I know it’s not just the taste of the reviewer, but there is a good reason for using another terminology, I’m very inclined to do so. Even if somebody honestly says that something is a matter of taste, and they prefer it that way, I’ll consider it and compare to my taste. If the reviewer is not in the position that they are anonymous, they have to provide reasons. Any statement without a reason is a valid reason to think the reviewer has misunderstood my genius, but if the reviewer has to stand by their opinion, they know that anything unfounded falls back on them. They will be more inclined to argue for their suggestions, making it easier for authors to see whether this is a matter of taste or there actually is a good reason for it. It also makes it possible to see if the reviewer has misunderstood anything. If the argument doesn’t hold, you can simply point that out (or the other reviewers may catch it and the author will never see it). When I have to argue for opinions, I find that I leave out some comments, simply because I cannot find hard argumentation supporting the claim (I’ve done that during this round of reviews).

Open reviews are of course not the be-all-end-all to the reviewing process. Sometimes you deal with authors who hold a grudge, and it is very daunting to sign a negative review to such an author. I am not in favour of forcing all reviewers to sign reviews, but by forcing myself to do so, I find I write better reviews. I typically read the papers, make my notes, give initial grades and then spend a couple days thinking about the review, the paper, and the problem before actually typing in the review. This means I only remember the most important points, and I have good arguments for them. I can write those down without looking at the paper. After that, I can look at my detailed notes and see if I’ve forgotten anything important and note down the minor comments. By having the papers in the back of my head for a couple days, I can weigh the different arguments, think about how the scores should be given, and what I would do differently. I did not always do that when I didn’t sign my reviews. I’ve typically received compliments for my detailed reviews, but I would typically do them in an afternoon from start of reading to end of writing review.

Beyond Open Reviews

In the young days of the internet (15-20 years ago), to find anything online, you’d typically go to one of the web-indexes. One of the big ones was Yahoo!, where homepage owners went and registered their pages, put them in categories, and an employee would check it and either accept or reject it. This is very similar to the current system of reviewing. You could search in a small selected subset of the entire internet. If it was not on Yahoo! (or not published in any of the big journals), you wouldn’t find it. If your site was not on Yahoo! (or your paper not in a good journal or conference), nobody would read it.

Later came Altavista, which automatically surfed the internet and indexed all pages. This is similar to self-publishing or to publishing sites like arXiv. Everybody can publish, which means it takes approximately 3 minutes before the site is swamped by spammers. In the area of science, this means crackpot theories (I once found a paper “proving” the real numbers were countable on arXiv, funny reading) and wrong papers due to lack of reviewing.

I’ve been doing a bit of work in bioinformatics ((Really, others have done the work, I wrote a bit of code, I got on the paper.)), and got a bit of insight into publishing in a science younger than computer science. Believe what you will, but computer scientists are extremely conservative and swear to dead-wood publishing. In bioinformatics, it is customary to send a paper draft to “everybody” working in the field. This would never happen in computer science for fear that somebody might steal the idea and publish it before the real author. In bioinformatics, the rationale is that if you send it to everybody, nobody can publish it and claim they invented it – everybody has seen you sent it around first. This makes it possible to build on work that will not get published for another year or two. This also means you do not have to publish every epsilon improvement so others can build on it. All in all, this means you can publish fewer but better papers. You get “review” comments during writing and not after, plus you can work on an idea until it is very polished before publishing. You basically take the coffee-break discussion with your colleagues about a problem you are struggling with and make it world-wide.

Bioinformatics is also much more active in scientific blogging. This is basically publishing without reviews, but the reviews pop up afterwards. You get direct comments on what you write. You can see the “review” comments at the bottom of each publication. Granted, the author can remove negative comments, but this is sure to spread. If I make a comment and it gets deleted, I’ll be quick to write a post mentioning this. If anybody writes a false post about me doing so, I can debunk it, and the web of trust can decide who you want to believe. The same goes for positive feedback, where you can recommend good posts by others, and if you become popular basically get your own “journal”, where people want to be listed to get read. With one big change: everybody can become an editor dirt-cheap. If quality starts declining, people will simply ignore you. It is a much more agile approach to publication. I try to do a bit of scientific blogging (I typically tag those posts with “work” but also have more specific tags for things related to CPN Tools, things related to ProM, things related to Declare, and discussions about presentations I’ve done). Most of the things are small anecdotes, but there are a couple ideas that could probably be published at less critical workshops (and things I am still writing papers about).

Scientific blogging is open access but better; not only does everybody have a right to read everything, everybody has a right to publish as well. Using indices we trust we can avoid the crackpots and the gibberish and only look at the good stuff out there. There’s still room for old-fashioned media in this world, but anything that is not open access will not get links from the big aggregators and hence fewer readers. The prestigious open access journals that charge for publication may have a harder time if people get as many readers on their own hosting, though maybe hosting on an open access journal is cheaper in the long run than a privately hosted blog. The beauty is that we do not have to make an explicit step; we just have to start not looking down on scientific blogging, and the rest will sort itself out.

Scientists like to quantify things, and one of the important quantifiers is the number of publications. Scientists are of course quite smart people, so as soon as number of publications became important, they started gaming the system. If number of publications is important, publish a lot of low-quality papers with minimal new content. And thus was born the publon. Measurements got better, and people started counting citations or papers with many citations. This is known as the H (or Hirsch) index, and measures how many publications you have with a given number of citations. If you have 9 papers with each at least 9 citations, your H-index is 9. You can have 20 publications, but if only 5 have 5 or more citations, only they count towards your H-index, and the absolute number of publications becomes less important, and more interesting papers people cite becomes important (hence reviewers prompting authors to cite their own papers). The scientists’ solution to this was self-citing. If you have 10 papers and each cites the 9 others, you have an H-index of 9 without anybody ever hearing about your work. Next is high-impact publications. Now, only the impact factor of the venue of your paper counts, and your H-index and self-citations have less weight. The impact factor is the average number of citations a paper in a journal/conference gets. This is basically a distributed H-index. When this became important, journals started boosting their impact factor artificially, by forcing authors to cite, say, 3 papers from the journal before inclusion. This is harder to circumvent, as citation within a journal in a narrow field is not uncommon, so just disregarding intra-journal citations is not exactly fair.
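To make the H-index examples above concrete, here is a minimal sketch in Python. The function names `h_index` and `self_citing_ring` are my own for illustration, and the citation counts are made-up numbers matching the examples in the text, not real data.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def self_citing_ring(n_papers):
    """Citation counts when each of n_papers cites all the others (hypothetical helper)."""
    return [n_papers - 1] * n_papers


print(h_index([9] * 9))               # nine papers with nine citations each -> 9
print(h_index([5] * 5 + [1] * 15))    # twenty papers, only five with 5+ citations -> 5
print(h_index(self_citing_ring(10)))  # ten papers that only cite each other -> 9
```

The last line is the whole problem in a nutshell: the index cannot tell a ring of self-citations from nine citations earned from other people.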

With scientific blogging it is easy to add a new fair measurement; it would basically be the Google page rank of your blog-post/blog. This is even more distributed and harder to game. It is of course not impossible, as SEO experts and link spammers will know. The difference is that in science, the page rank can be supplemented with a web of trust. This basically means that I trust links of my colleagues more than I trust links of strangers. I trust links of colleagues of colleagues more than strangers but less than colleagues, and so on. No doubt, somebody will find a way to game this system as well, but it is quite a bit more robust than 4 old people hating their life and especially you deciding whether your brilliant idea gets any attention without ever reading what you did.
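As a rough sketch of what such a web of trust could look like, the Python below weighs each link to a post by how far the linker is from me in a graph of colleagues. This is not page rank itself, just a trust-weighted link count; the names, the graph, the halving of trust per hop, and the stranger weight of 0.1 are all assumptions made up for illustration.

```python
from collections import deque


def trust_weights(colleagues, me, decay=0.5, stranger_weight=0.1):
    """Breadth-first walk of the colleague graph starting from `me`.

    Trust shrinks with every hop but always stays above the stranger weight.
    `colleagues` maps a person to the people they consider colleagues.
    """
    distance = {me: 0}
    queue = deque([me])
    while queue:
        person = queue.popleft()
        for other in colleagues.get(person, ()):
            if other not in distance:
                distance[other] = distance[person] + 1
                queue.append(other)
    return {
        person: stranger_weight + (1 - stranger_weight) * decay ** hops
        for person, hops in distance.items()
        if person != me
    }


def trusted_score(linkers, weights, stranger_weight=0.1):
    """Sum the trust of everyone linking to a post; unknown people count as strangers."""
    return sum(weights.get(person, stranger_weight) for person in linkers)


# Hypothetical graph: alice and bob are my colleagues, carol is a colleague of alice.
colleagues = {"me": ["alice", "bob"], "alice": ["carol"]}
weights = trust_weights(colleagues, "me")

score_known = trusted_score(["alice", "carol"], weights)
score_spam = trusted_score(["spammer1", "spammer2", "spammer3"], weights)
print(score_known > score_spam)  # two trusted links outweigh three links from strangers
```

A link spammer can still buy many links, but each one counts for little unless somebody I already trust vouches for the spammer.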

Conclusion

I really hope scientific blogging takes off, so focus is less on writing long papers, but can be on doing and disseminating research. As a scary example, look at the previously prestigious but now common Springer Lecture Notes in Computer Science volume numbers throughout the last 20 years:

Springer LNCS Volumes

Year	# Volumes
2013 *)	601
2012	623
2011	615
2010	580
2009	537
2008	527
2007	526
2006	484
2005	471
2000-2004 **)	298
1995-1999 **)	153
1990-1994 **)	90

Number of Springer LNCS volumes published every year.

*) Values for 2013 are values from February 20 divided by 50/365
**) Values for 2004 and earlier are averages over a 5-year period
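The 2013 estimate is a plain extrapolation: take the count so far and scale it up to a full year, assuming volumes appear at a constant rate. A minimal sketch in Python follows; the function name `annualize` and the count of 100 volumes are hypothetical and not the real LNCS figure. Read backwards, an estimate of 601 corresponds to roughly 82 volumes by February 20.

```python
def annualize(count_so_far, days_elapsed, days_in_year=365):
    """Scale a partial-year count up to a full year, assuming a constant rate."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    return count_so_far * days_in_year / days_elapsed


# Hypothetical count, not the real LNCS figure: 100 volumes in the first 50 days.
print(annualize(100, 50))  # 730.0
```

Conference proceedings are unlikely to be spread evenly over the year, so an estimate based on 50 days should be taken with a grain of salt.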

We see that the number of publications goes nothing but up (except for the estimate for 2013). That is not productive; it means more time is spent writing, reviewing, and reading papers. It would be more interesting to post a notice on your homepage whenever you found something interesting. You could then write a long description (similar to a current journal paper) when work was done.

I am not saying we should all go to scientific blogging, page rank, and web of trust tomorrow, but I think it is a nice successor to the current system. We would of course have to think about archival for the future, but organisations like archive.org already do a good job of this, and I am sure platforms would emerge providing archived hosting for science – either sponsored by universities or private benefactors willing to support research. Publishing finished results, the “journal papers” of the internet, could go to highly respected places, and we could all stop writing 20 crappy papers a year.

Until then, I’m gonna do my own little revolt by signing my “anonymous” reviews and publishing about what I am working on on my blog.