Programs for organizing PDF reprints
A question recently went out on the Apple scitech list to the effect of "What are good programs for organizing PDF reprints?"
I've played with a wide variety of these sorts of databases over the years; here are some impressions (well, I don't -do- impressions, I'm a scientist...).
A few things to note -- I do vision research, I use TeX / BibTeX, many new reprints I use are available from PubMed-indexed sources, I've scanned in and OCR'd a few thousand of my old reprints (with the help of several brave work-study undergrads), and my entire database is 3k or so papers and about 10GB. Thus, all my opinions are colored by these facts.
Yep -
Interesting, but not really geared for scientific papers (ie- no way to associate citation information w/ them easily). Not sure it would handle my entire library. Seems a bit more for the 'here's a bunch of papers I want to digitize' crowd, and for that it looks to be excellent. I found the tagging/metadata a little clumsily implemented. Pros- aesthetically nice. Cons- not good if you want citation information.
Papers -
Pretty, but honestly not even in alpha, let alone release-ready. It crashed chronically and lacks the ability to locate files where I want them (ie- incomplete preference system). One nice thing - love the 'select some text and look it up in PubMed' feature to fill out the citation information. I'd love to see this in other programs (notably BibDesk). Not totally clear how the BibTeX is/will be implemented. Pros- aesthetically nice, PubMed lookup nice. Cons- boom! bang! They're charging for these pre-betas with a timeout on the 'trial' of 30 days. Has potential to be sure... but I need to -see it work- before I'm willing to invest too many papers / $.
DEVONthink -
Does a lot more than just database your PDFs. Sort of kitchen sinkish, it handles your papers, other media, web pages, tables, etc. It handles several thousand papers easily. The Office version comes with an interesting web server to access the database from offsite (the search is wicked-fast), and a really fast OCR (esp compared to the pre-Acrobat 8 OCR. 8's OCR is -finally- reasonably fast). Again, there's no -obviously straightforward- way to tie in the BibTeX / citation information. There are ways to do it, they just feel a little clunky. The 'find similar' and 'classify' features are really the most useful here. Have a paper on synaptic plasticity and want to find other papers in your collection that are lexically similar? You can 'see also' them using their engine (which works pretty well, but not top notch with such a large database). It seems to use something simple to do the matching, maybe just concordance (nothing like LSA, latent semantic analysis, it doesn't look like - when will Apple expose the hooks to the LSA framework in 10.x ? ).
Pros- cool 'similar paper' finder, web server. Cons- showing its age aesthetically, no easy citation information linkage, have to go through acrobatics to get Spotlight to index PDFs when they're embedded in the database (ie- no good Spotlight support).
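To make the 'concordance' guess above concrete, here is a minimal sketch of lexical matching by word counts and cosine similarity. It is only an illustration of the general idea and is not DEVONthink's actual algorithm, which isn't documented here; the function names (term_counts, cosine_similarity, see_also) are made up for this example.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lower-case the text and count word occurrences (a simple concordance)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def see_also(query, library):
    """Rank the titles in `library` (title -> text) by similarity to `query`."""
    q = term_counts(query)
    scored = [(cosine_similarity(q, term_counts(text)), title)
              for title, text in library.items()]
    return [title for score, title in sorted(scored, reverse=True)]
```

A real engine would at least drop common words and weight rare ones more heavily; plain counts like these tend to get noisier as the library grows.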
WorkLife framework-
Mathematica-specific. It isn't really a PDF database as such; it's more of an interesting diary / note system. You -can- use it as a sort of database for your PDFs if you use Mathematica. I've been using a version of MMa that WorkLife isn't quite compatible with yet, but I played around with it this summer using 5.2 and saw some excellent potential there if you're a MMa fiend. I'm sure once the versions get synced up I'll be using it again, as I am a MMa fiend.
Pros- integration into a comprehensive tech computing environment. Cons- same :)
BibDesk -
I've used BibDesk for quite a while to keep my BibTeX in ship shape. The features keep getting better, seems to remain solid.
When I migrate older papers in, I wish there was a better 'grab the information from PubMed' feature like Papers has, and a nifty 'find similar papers' like Think. Still, it's easy enough to work with, though you'll be busy if you have a bunch of papers to add... The scripting feature is also useful - clean up capitalizations, massage citations. Lately I've been making the shift from the Think-world because the citation information combined with Spotlight has proven to be more useful than the 'see also' functionality, overall.
Pros- free and well supported despite this :), features continue to expand, designed for BibTeX (again, useful in my world). Cons- not the prettiest cow at the state fair, but the pros make up for that. Did I mention it's free?
EndNote -
Well, I used it in the mid 90s, back when FrameMaker existed for the Mac (snif). Haven't seen it since....
Summary-
Basically I'd like to mate the innards of BibDesk and DEVONthink and put them in a prettier package like Yep or Papers. Depending on your needs each has something to offer, I'd suggest playing with each to see how it fits into your workflow.



Comments
OCR Software
So Flip,
What software did you (and your trusty undergrads) use to "OCR" your scans? I've tried ReadIris, which is great, but it really screws up the formatting of text, such as tables. I guess that is the trade-off of OCR: you get searchability but sacrifice formatted tables/graphs?
Also, we just got a new copy machine in my department with scanning capabilities. Now if I want a PDF of an article or book chapter I just place it in the feeder or on the scanning bed and in a few minutes a PDF is sent to a "Copier PDFs" folder on my personal webserver space. I hope your poor undergrads aren't stuck scanning page-by-page with a standard flat bed scanner, ouch!
I use Papers now and am hoping v 1.0 fixes the chronic crashing.
DEVONthink
DEVONthink also has great AppleScript support, so you could link to a paper using something like:
tell application "DEVONthink Pro"
    activate
    set theRecord to get record at "/Cancer/cdk.pdf"
    open window for record theRecord
end tell
Their support is also excellent.
One tool is not sufficient
I have yet to find one single tool that satisfies all of my needs for keeping on top of my library of PDF articles.
My current system involves saving all PDF articles to a single folder. Each article file is named like 'Jones_2007_1585.pdf', where Jones is the surname of the first author, 2007 is the publication year, and 1585 is the first page of the article. I'm thinking of revising this scheme such that the corresponding author's surname also appears in the filename (would make quick ad hoc visual scans of the folder more effective). If an article has a PDF with supplementary info (and, sadly, this is a growing trend), I'll save that file with a name identical to the article itself, except the token _SUPP is appended to the end of the filename (before the .pdf extension, of course).
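A naming convention like the one above is easy to build and check in a few lines. The following is a minimal Python sketch, assuming the author, year, and first page are already known; the helper names make_filename and parse_filename are invented for this illustration and are not part of any of the tools mentioned.

```python
import re

# Author_Year_FirstPage.pdf, with an optional _SUPP token before the extension.
FILENAME_RE = re.compile(
    r"^(?P<author>[^_]+)_(?P<year>\d{4})_(?P<page>\d+)(?P<supp>_SUPP)?\.pdf$"
)


def make_filename(author, year, first_page, supplementary=False):
    """Build a name such as 'Jones_2007_1585.pdf' (or '..._SUPP.pdf')."""
    suffix = "_SUPP" if supplementary else ""
    return f"{author}_{year}_{first_page}{suffix}.pdf"


def parse_filename(name):
    """Split a name back into its parts; return None if it doesn't fit the scheme."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    return {
        "author": match.group("author"),
        "year": int(match.group("year")),
        "first_page": int(match.group("page")),
        "supplementary": match.group("supp") is not None,
    }
```

For example, make_filename("Jones", 2007, 1585) gives 'Jones_2007_1585.pdf', and parse_filename can be run over the folder to spot files that were saved under some other name.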
My various tools then link in to this one folder:
I have a DEVONthink Pro database of links to these articles. I use DTPro for lexical analysis only (i.e. to find similar articles). I'm not convinced that DTPro's capabilities in this regard are best-suited for full-length articles (apparently it's best at handling 200-500 word chunks of text). Anyway, it does occasionally pull out useful connections.
I've been using VoodooPad as my tool for taking notes on the articles. I have a single VoodooPad wiki file in which each page of the file is devoted to a single article. I use FlySketch to screen-capture article figures and then embed them into the VP notes. Given the wiki capability of VP, cross-referencing between articles is effortless and essentially automatic.
I use Sente as one of my primary tools for harvesting articles from PubMed. Once an article is downloaded to my articles folder, I'll create a link between the downloaded PDF and the entry in Sente such that I can subsequently call up the article very quickly while browsing within Sente. I colour-code my Sente entries as follows: red=downloaded and requires note taking; blue=article linked and added to my VP notes repository; purple=downloaded but doesn't require immediate attention; orange=potentially interesting but not yet downloaded. I quite like Sente for doing things like this, but so far it has failed in terms of embedding references and generating bibliographies in manuscripts that I'm writing in Word 2004 (this is based on my experiences with an older version -- perhaps I'll try again now that the program has gone through a number of updates).
I'm now using Punakea to add Spotlight-searchable keyword tags to the files in my articles database. Since I use multiple tools, I like the idea of having meaningful metadata that's independent of a single tool -- and Spotlight is supposed to be improving with Leopard, so this tagging strategy may become more important later on.
My latest love is Tinderbox (http://www.eastgate.com/Tinderbox/). I've bought the program (it's worth the price), and am now determining how exactly it will fit in my system. It may supplant VoodooPad as my primary note-taking tool. I can see a very useful synergy between Tbx and DTPro, since I should be able to export reasonably-sized text chunks of my article notes into a DTPro database. This will probably result in more meaningful hits from the DTPro lexical analysis. Tinderbox is so powerful and flexible -- I wouldn't be surprised if it eventually becomes the cornerstone of my system; I think it may even become a good bibliography generator if I program it correctly.
What is still *SORELY* lacking in the Mac world is a program that allows us to overlay notes directly onto the PDFs. The full version of Acrobat will do this, but even Acrobat has its shortcomings here. I'm looking forward to trying out Acrobat 8, however -- perhaps it's gotten better. Preview has some crude annotation capabilities, but it doesn't even come close to Acrobat. This is a shame because I prefer viewing my PDFs in Preview; it's lightweight and easily integrated with other OS X programs. I would love to have a separate program, a 'meta-annotation program' if you will, that would work with Preview to create and overlay notes on top of the PDF; the PDF itself would not be modified (currently in Preview, annotations are integrated into the PDF), and the annotations could be toggled on and off at will. These meta-annotations would ideally be Spotlight-searchable as well.
Anyway, that's how I'm currently dealing with my ever-growing mass of scientific papers. Ironically, I'm still doing my bibliographies/reference lists by hand. :)
Hah- no, we've had those
Hah- no, we've had those copy machine / scanner devices (Canon and now Xerox) in the office for the past several years. I wouldn't torture the undergrads that way. For what it's worth to those out there, the Xerox support in the Mac domain is OK. I have it set to FTP the scanned papers to a folder on my office machine, then there is a background script that waits for something to arrive and ships it to Acrobat for OCR.
As for OCR, we use the Acrobat OCR. The best strategy when OCRing PDFs is to do it such that the original 'image' of the page is maintained and the OCR text is actually 'hidden'. This way you get the best of both worlds- tables, graphs, etc are as original but searchable, at the cost of a few meg here or there.
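The 'background script that waits for something to arrive' can be sketched as a simple polling loop. This is a minimal Python sketch under stated assumptions, not the script actually used here: the names new_pdfs and watch are hypothetical, and the handle callback stands in for whatever hands the file to the OCR program, since the Acrobat hand-off itself isn't described.

```python
import os
import time


def new_pdfs(folder, seen):
    """Return paths of PDFs in `folder` not yet in `seen`, and record them in `seen`."""
    arrived = []
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith(".pdf") and name not in seen:
            seen.add(name)
            arrived.append(os.path.join(folder, name))
    return arrived


def watch(folder, handle, interval=30.0, max_polls=None):
    """Poll `folder` and call `handle(path)` once for each PDF that shows up.

    `handle` is a placeholder for the OCR hand-off; `max_polls=None` runs forever.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in new_pdfs(folder, seen):
            handle(path)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)
```

One caveat with this approach: a scan that is still being uploaded over FTP can be picked up half-written, so a real script would probably wait until the file size stops changing before passing it on.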
Good point, I forgot to
Good point, I forgot to mention this. This is how I've hacked reference before.
pdf naming
My scheme for naming pdfs is to use the author's surname and the minimum information that lets you completely specify a reference: volume, abbreviated journal name, page. So filenames are like:
Jones_37PRL14436.pdf
Meaning Jones is the first author, in Physical Review Letters volume 37, page 14436. Volume-journal-page means I can save space (although I lose readability) without ambiguity by running them all together without spaces or underscores.
Recently, many journals have shifted from useless filenames (like 'GetArticle') to useful filenames (like ApplPhysLett_37_12321.pdf) for the default names from downloading their archives, and I just add the author's name and don't bother to reformat.
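The run-together form stays unambiguous as long as the journal abbreviation contains only letters, because the volume and page are then the digit runs on either side of it. A minimal Python sketch of that idea follows; the names parse_compact and add_author are invented for illustration, and the letters-only abbreviation is an assumption rather than something the scheme above states.

```python
import re

# Author_<volume><JournalAbbrev><page>.pdf, e.g. Jones_37PRL14436.pdf
COMPACT_RE = re.compile(
    r"^(?P<author>[^_]+)_(?P<volume>\d+)(?P<journal>[A-Za-z]+)(?P<page>\d+)\.pdf$"
)


def parse_compact(name):
    """Split a compact name into author, volume, journal and page (None if no match)."""
    match = COMPACT_RE.match(name)
    if match is None:
        return None
    return {
        "author": match.group("author"),
        "volume": int(match.group("volume")),
        "journal": match.group("journal"),
        "page": int(match.group("page")),
    }


def add_author(author, journal_filename):
    """Prefix a journal's own default filename with the first author's surname."""
    return f"{author}_{journal_filename}"
```

So parse_compact("Jones_37PRL14436.pdf") yields volume 37, journal 'PRL', page 14436, while add_author("Jones", "ApplPhysLett_37_12321.pdf") covers the case where the journal's default name is already useful.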
Like iTunes...
What you're describing is exactly like iTunes -- organize the PDF files based on author, journal, etc. I like to do similar things.
If more PDF files included good metadata, this would be easy. But it also seems like the "Mac-like" thing to do for programs like Bookends, Sente, Papers, etc. (I'd like to say EndNote here too, but I won't hold my breath.)
Hmm... it also sounds like something that might work through AppleScript / Automator. I'll take a look sometime.
PDF mark-up and OCR notes
One thing that has baffled me ever since OS X came out is, if PDF is a native graphics format for it, and if the screen display is some version of PDF, then where are all of the native PDF-producing programs? In 1984 when the first Mac came out, we learned that QuickDraw was the native graphics format, and lo and behold, there appeared tons of QuickDraw programs.
Anyway, I have used Create (www.stone.com) to mark up PDFs. There are Save As... options that let you save as PDF with auxiliary Create information embedded. It's not exactly what you ask for but it's close. I now use it almost exclusively to review journal manuscripts. One of the nice features for navigating paper-length files is the window that has thumbnails of maybe 50 pages visible at once.
Just a few days ago, I came upon another fairly effective way to mark up PDFs, but again not in the ideal way. From time to time I spend time reading and annotating patents. After a few years of trying different things, I am now pretty happy with the following, to the point where I will no longer need to keep separate paper versions at all. I download the patent with PatentDownloader, the best-of-class. For a few days, I would request a new feature and the next day it would appear. For example, it does a partial equation-setting of the simpler patent office equations (U.S. anyway. By the way, if anyone knows what math mark-up language the U.S. patent office uses, I would love to know it). It also can trim the margins from the page images (now _there's_ a feature that should be made a standard part of every PDF viewer--I have to do it manually with Preview and not at all most of the rest of the time). PatentDownloader saves as RTFD with the full text in searchable form and the page images as TIFF (I think--maybe PDFs). I then open the RTFD in Pages (!) using a custom template that I've set up that displays the pages the way I want. I can get the page images at 100% height and two-up on my Powerbook. I love having the searchable text in the same file as the images. Now here's the interesting part; you can overlay any graphics that you want on the page images right in Pages. It's a little dicey, but say you want to underline something. Select the page image, then left-arrow. This places the insertion cursor right before the page image that was just selected. Insert whatever you want (in this case, say, a red line two pixels wide) and don't freak out because your page disappeared--it just got temporarily shifted down one notch. Select the Object Placement to be Fixed on Page and to not cause line wrapping. Your red line will appear at the top right of the page image and you can drag it wherever you want. 
Similarly, underlining longer passages can be done by overlaying a translucent rectangle. This doesn't take as much work as it sounds, plus, once you get one line underlined, you can rapidly make more underlines by option-dragging or command-D. There's more: once you have, say, your underline in place, select it and then Insert Comment. The commenting column appears to the left or right of the main viewing panel, and it is connected to your underline! Move or option-drag the underline, and the comment box moves or is duplicated. Furthermore, your comments are searchable by Spotlight.
About the OCR in DEVONthink Pro Office: I really like it, and sometimes I use DTPO just for the OCR function. But it has one annoying feature--it changes the resolution and bit depth of the page image. It sets the resolution to 150 DPI and the bit depth appears to be maybe 8 bits. This sorta sucks, especially if you have a good clean 1-bit scan to start with. But it does re-orient the image on the page so that it is perfectly aligned with the page edges, which is pretty cool. I've worked with the developer and this isn't likely to change, as far as I can tell. They got the OCR technology from another company and so they don't have complete control over what they can do with it, I gather.
Another OCR note: Several years ago, with my Epson scanner I got an OCR program called ABBYY FineReader. It may not be supported any more, but it does a pretty respectable job of converting a scanned image, including PDF, into a Word file, recognizing columns and leaving equations and figures as small graphics images which actually get placed in the correct place on the page. You'll probably need to do some manual touch-up, but the one or two papers that I treated this way looked surprisingly faithful to the original image. I really wish there was something like this that would persist as a product.
Jerry
Zotero
Check out Zotero, an open source Firefox extension:
http://www.zotero.org/
It has an incredible ability to suck citation and abstract info off a large number of scientific and scholarly databases, store the PDF and/or a snapshot of the web page, tag and organize items, insert citations into MS Word via a macro, and export refs to BibTeX, RDF, etc, or as formatted bibliographies -- all with a nice, iTunes-inspired interface. It's cross-platform, and a server-based sync/sharing function is planned for the future.
organize using iTunes?
I noticed this article while reading through one of the countless Mac news sites the other day.
Not sure if it's of any use to anyone. It describes how to use iTunes' PDF support to create a library of PDFs & organise them.
http://lifehacker.com/software/pdf/geek-to-live--organize-your-pdf-library-with-itunes-240447.php
Sente
I also use Sente after a painful falling out with EndNote and Mac support.
I like this program very much and use it to search PubMed, organize my PDFs (it will name them for you as you like, very cool) and as a reference program with MS Word. It has the ability to download and file PDF files with a single keystroke, but with some limitations. You have to have free access (like with PNAS or any of the BMC journals) or be working on a computer at a university with access rights. When it works it is a thing of beauty. Imagine highlighting a dozen references in a window and with a simple keystroke having the program go out and get all the PDFs, link them to their respective references and file them - nice.
Sente still has some limitations, mostly in working with others. However, the developers are putting out regular updates and are being responsive. They have a free trial download, so check it out if you're interested. http://www.thirdstreetsoftware.com
Alfresco - a server-side option
The options described here are all desktop solutions, which may be just fine for many situations, but I want to highlight what can be done if you want a server-side solution available for a whole department or group.
In my line of work I have evaluated quite a few document management solutions, and what always struck me was that they often focused on one or a few types of content: Office documents, images, or video. The second thing was that they almost never could be run on Mac OS X Server; the best one could hope for was official support for Macs as clients. EMC Documentum is one of the few that has that, for instance.
The thing about these server-side, so-called enterprise content management systems is the idea of one consolidated (logical, not physical) repository for all types of content. The repository then has services for handling descriptive metadata, security, versioning, and lifecycles, as well as content transformation (like producing PDFs for certain steps in a process or creating low-resolution images from high-res TIFFs). The repository also supports real workflows, which can be used to capture any existing processes (like a formal review of a paper) within an organization.
A few months ago I found out that the guy who founded Documentum in the 90s has taken a few of his colleagues with him and started up a new professional open source company called Alfresco. They have just released version 2 of their open source ECM platform, and to my surprise Mac OS X Server is a supported platform with installation instructions. The whole thing runs on Java and MySQL (or Oracle if you like).
Now, I also do a lot of work for an NGO here in Sweden called RFSL, and we run Macs with Mac OS X Server. We have been testing Alfresco for a few weeks now and it looks really interesting, so I just wanted to highlight this new option for all of us running Macs in our organizations. To me their approach looks just right and their roadmap looks very promising. It would be interesting to hear about the experiences of other people who have tried Alfresco in a Mac environment.
http://www.alfresco.com/
Screenshots from GUI
http://www.alfresco.com/products/ecm/screenshots/
Download link from SourceForge
http://sourceforge.net/project/showfiles.php?group_id=143373&package_id=157460&release_id=488010
Regards,
Alexandra Larsson
RFSL, http://www.rfsl.se
Stockholm, Sweden
Alfresco
I'd be interested in hearing about your experience with Alfresco, perhaps you could write a review for us?
Alfresco review
I would love to, but I think I need some time first so I can thoroughly configure and test the platform. Do you mean on this site BTW?
iPapers
I have been using iPapers (http://homepage.mac.com/toshihiro_aoyama/iPapers/) and highly recommend it!
As the name implies, it is an iTunes for papers. It depends heavily on the PubMed database, so as long as you are working in biomedical research, it is very useful for organizing PDF reprints. For example, I have 700 papers in my iPapers database.
Pros: free, simple, stable, and PubMed-friendly.
Cons: It only organizes PDF files; it does not manage references when you write papers. Not so many functions.
JabRef -> write Metadata into PDF
BibDesk is really great software for maintaining a BibTeX database.
Unfortunately, if you have to work on a Windows PC, it is not an option. :-(
That's why I'm using JabRef (free, open source, jabref.sourceforge.net) for platform compatibility.
The user interface is not as Mac-like as BibDesk; however, it has an interesting feature: it can write the metadata of the BibTeX entry (all bibliographical information) into the corresponding PDF file in XMP format! So you can add the bibliographical information in a structured way to the PDF files, and if you send a PDF file with that XMP metadata to another user, they can import the metadata directly into JabRef.
Very nice feature!
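To give a feel for what 'BibTeX metadata in XMP format' can look like, here is a minimal Python sketch that turns a BibTeX-style entry into an XMP packet body using the Dublin Core title and creator properties. It is an illustration only: the exact layout JabRef writes isn't shown in this thread, the function name bibtex_to_xmp and the dict-based entry are assumptions, and the sketch produces the XML text without embedding it into a PDF.

```python
import xml.etree.ElementTree as ET

# Namespaces used by XMP packets for RDF and Dublin Core properties.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, tag):
    """Qualified tag name in ElementTree's {namespace}tag notation."""
    return f"{{{NS[prefix]}}}{tag}"


def bibtex_to_xmp(entry):
    """Serialize the 'title' and 'author' fields of a BibTeX-like dict as XMP/RDF."""
    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # dc:title is a language alternative; 'x-default' marks the default text.
    title = ET.SubElement(desc, q("dc", "title"))
    alt = ET.SubElement(title, q("rdf", "Alt"))
    ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"}).text = entry["title"]

    # dc:creator is an ordered list; BibTeX separates authors with ' and '.
    creator = ET.SubElement(desc, q("dc", "creator"))
    seq = ET.SubElement(creator, q("rdf", "Seq"))
    for author in entry["author"].split(" and "):
        ET.SubElement(seq, q("rdf", "li")).text = author.strip()

    return ET.tostring(xmpmeta, encoding="unicode")
```

Calling bibtex_to_xmp({"title": "A paper", "author": "Jones, A. and Smith, B."}) (made-up data) returns an XML string with one title entry and two creator entries; fields such as journal, volume, and pages would need further properties that this sketch leaves out.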
Besides that, I have not found my perfect system yet. I'll surely stay with JabRef for organizing references while writing my dissertation, but for notes about papers and searching contents I'm trying out different solutions: DEVONthink seems very interesting for the "see also" feature and its artificial intelligence. Full text search with Spotlight (or similar technologies on Windows XP) is good for finding a certain paper, and unique filenames for the PDFs are a must.
.. and - I nearly forgot to mention - Zotero rocks!! Especially for information found online on the web but also more and more with each update for offline information on your computer.
XMP and Spotlight
XMP is a great system for adding metadata to PDF files. Some journals are adding metadata to their files already.
Unfortunately, at least under Tiger, Spotlight won't index XMP data. There also doesn't seem to be support for XMP in the PDFKit library Apple provides. (Strange, considering Adobe provides an SDK library for XMP.)
So you can add XMP data using tools like JabRef, but can't currently search for it. Hopefully that will change in Leopard.
Zotero does that too
Zotero is an iTunes-like interface for managing PDFs and their citation information. Now you can also drag and drop PDFs and other files right into the pane. It gets more intuitive all the time.
bookends and skim
I'm surprised no one has mentioned Bookends yet.
Bookends is much like EndNote but much more reliable and speedier. I now have almost a thousand articles linked with PDFs in Bookends. I can color code them (give the colors any meaning I want) and export the citations beautifully when I'm ready for my final draft. Bookends also allows you to make Spotlight-like groupings of articles out of search criteria. You can also search PubMed and download articles from it as well.
Bookends works seamlessly with the word processor Mellel. While I like Mellel, I cannot fully adjust to it, but Bookends does also have plugins for Word.
The second part of my system is the free PDF program Skim. This program allows me to make notes and underline/highlight my PDFs. I do this and then link them to the citation in Bookends, where I write a more thorough synopsis of the article.
PDF paper organization
Consider QUOSA, a commercial app (see http://quosa.com ) that works very well with the .pdf format. It seems to fit well for our researchers at the National Cancer Institute. We have a great demand from our staff for searching and pulling reprints electronically rather than physically. The product also works well to tap PubMed and other library systems for searching and EndNote for bibliography generation.
lang calendars
Does anyone have experience with Mendeley?
Currently, I am using Papers, but it is a frustrating experience. It crashes way too often, I have to rebuild the database and Spotlight index every other day (otherwise search will not find papers that are there), it still does not support all publication types supported by BibTeX (e.g. inproceedings), its BibTeX output contains tons of errors (like publication types that don't exist), and it has a number of annoying user interface quirks. I really like its integration with ISI Web of Knowledge, but still consider it quite a cheek that mekentosj is charging money for software that is in its alpha stage at best.
BibTeX support is paramount for me, so I am considering switching back to good old BibDesk (which seems to be getting better every day, and seems to be able to search ISI Web of Knowledge nowadays).
Still, I decided to do some research on which other software for organizing papers exists, and stumbled across Mendeley (http://www.mendeley.com/) recently. Did anyone in here try it yet? Is it stable? Does it have *working* BibTeX support?