Building Core Data Scientific Apps?
By grandinetti at Sat, Aug 4 2007 1:08pm
Is there anyone out there using Core Data to design scientific applications in OS X? If so, I'd like to hear what kind of apps you're building and what your data model schemas look like. Seems like Core Data would be a great way for the scientific community to build software collaboratively. I imagine that many data model schemas could be easily shared across various applications.



Re: Building Core Data Scientific Apps?
I've been using Core Data for a lot of things lately. Not directly scientific applications, but often for tools I use to manage application repositories or in applications where I quickly want to create a tool that needs to manage information for a scientific project.
In many cases the schemas are pretty straightforward. I know there are other applications that have been developed that use Core Data extensively (I believe Papers does so, Alex (Mek) can probably comment on that).
I'm finding Core Data to be an excellent tool as it requires very little code to make useful tools for managing information. Which means less time working on the tools and more time focusing on actual work.
What type of applications did you have in mind?
Dave
There are many applications
There are many applications that come to mind, particularly in my field - chemistry, spectroscopy, NMR, MRI, ...
This weekend I took a stab at putting together a Core Data app for the Periodic Table. I thought it would be a nice simple model to tackle. I had three entities: (1) Periodic Table, (2) Element, and (3) Isotope. My idea was to develop a model for the Periodic Table that I could use in other applications - spectra simulations and such. Before I start any simulation I usually have to define the sample in terms of elements (more specifically, isotopes) in the Periodic Table. If I had a well-thought-out Periodic Table Model then I could use it in all my other chemistry-related apps. Also, I'm guessing it could be helpful to other scientists (chemist types).
If anyone is interested in what one can build in about a day and a half with Core Data you can download my Xcode Project at
http://www.grandinetti.org/Software/PeriodicTable
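For readers who want a concrete picture of the three-entity model described above, here is a minimal, language-neutral sketch in Python. It is not Core Data and it is not the code in the linked Xcode project; the attribute names (`mass_number`, `abundance`, `spin`, and so on) are assumptions chosen for illustration, and the abundance values are rounded.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Isotope:
    mass_number: int   # protons + neutrons
    abundance: float   # natural abundance as a fraction between 0 and 1
    spin: float        # nuclear spin quantum number


@dataclass
class Element:
    symbol: str
    atomic_number: int
    # Plays the role of a to-many relationship Element -> Isotope.
    isotopes: List[Isotope] = field(default_factory=list)


@dataclass
class PeriodicTable:
    # Plays the role of a to-many relationship PeriodicTable -> Element.
    elements: List[Element] = field(default_factory=list)

    def element(self, symbol: str) -> Optional[Element]:
        """Return the element with the given symbol, or None if absent."""
        for candidate in self.elements:
            if candidate.symbol == symbol:
                return candidate
        return None


# Hypothetical sample data: hydrogen with its two stable isotopes
# (abundances are approximate).
hydrogen = Element("H", 1, [Isotope(1, 0.9999, 0.5), Isotope(2, 0.0001, 1.0)])
table = PeriodicTable([hydrogen])
```

A simulation would then define its sample by looking up isotopes, e.g. `table.element("H").isotopes`, rather than hard-coding nuclear properties.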
What about computational chemistry? Performance issues?
Hi,
I am currently writing some calculation modules for computational chemistry. I considered using CoreData, but decided against it in the end, because I was not sure whether there would be an impact on performance.
The idea here would be to store molecular data, in particular coordinates, atom information and eventually some properties. A few of these structures (like the coordinates) would need to be accessed from the calculation module. The calculation module itself is written in C for performance reasons. Could / should I still use CoreData for accessing the coordinate data?
Thanks for any hints.
Marc
NB: another issue - but in my case a secondary one - is of course portability. If the tool you write is generic, it would be nice if it could be ported to e.g. Linux without too much effort. If I'm not mistaken, CoreData would be a no-go in this case, or are there replacement GNUstep classes in the making?
Yep, Papers uses Core Data.
Yep, Papers uses Core Data. In the beginning it was not always so easy to grasp the whole idea behind it, but once you get over that and start to discover the links and pitfalls it works like a charm.
I seem to remember that Georg Tuparev also uses Core Data for some pretty huge datasets related to astronomy, so it is definitely feasible. Still, keep in mind that the real question is whether it makes sense to have your data in a database compared to other storage options.
Yes and no. 90% of time is spent in ...
Yes and no. We all know 90% of time is often spent in 10% of code. You don't use Core Data for that performance-critical 10%. On the other hand, there is a lot of code we write for setting up the problem and saving the results. That's where Core Data comes in. Regarding portability, a complete Core Data program will not be portable; however, that's not the point. What's important is that the Data Model Schema used in a Core Data program be well designed by the experts in each scientific field. Remember the rule: Design to the interface, not the implementation. If the scientific community could work together to design Data Model schemas with common (agreed upon) interfaces we could pool together and save ourselves a lot of work. Core Data is just one developer environment for designing and implementing these Data Model Schemas. There are other environments for other platforms. The Data Model schema (and interface) will be "portable", and should be shared freely.
UML is another way to specify the Data Model. While the Core Data Model graphs look similar, they are not as detailed in describing the model. Ultimately, I think UML is probably needed to properly share the Data Model. On the other hand, I'm impressed with all the code that Core Data generates for free given only the Data Model graph. Are there any other developer environments like that?
Good Examples or Tutorials
Do you have an example core data program or tutorial you recommend? I'd like to see one that also includes report writing from the data collected. How do you search the data you've collected effectively?
--Bob
Core Data Tutorial Links
Apple has created the CoreRecipes example, but quite frankly it's too complex for a good beginner's tutorial (it does show many aspects of creating an iTunes-like app).
By now there are quite a few examples that should give you a better beginner's overview, just have a go in Google.
The ones from CocoaDevCentral are great:
http://cocoadevcentral.com/articles/000085.php
Then there's the NSPersistentDocument core data tutorial.
And check the ADC movie by Jonathan 'Wolf' Rentzsch, it's really good.
Also make sure to have a look at Malcolm Crawford's bindings examples, because especially when combined with bindings CoreData rocks:
Hope that helps,
Alex
GridStuffer/GridEZ
Hi there,
I am using CoreData in my Xgrid scheduler application GridStuffer. Actually, the whole CoreData code is in the framework GridEZ that takes care of creating the CoreData stack and provides transparent APIs to access the different pieces of Xgrid.
While I am at it, I should also mention that Remote Activity, from Drew McCormack, is also using CoreData.
And all of this code (GridStuffer, GridEZ, Remote Activity) is open source.
charles
Data acquisition and analysis
We're using CoreData heavily in the physiology lab in which I work. I've written a data acquisition program that writes data acquired from A/D converters, video, or digital sources to a CoreData database. We have a matching database query tool that allows us to query data from the entire lab's aggregated data, from CoreData files stored on a server. It's proven to be an immensely powerful tool for writing complex query and visualization tools quickly. With Leopard, we've noticed a significant performance boost too (it's nice to get these boosts for free when Apple updates CoreData). Furthermore, with the schema migration tools in Leopard, we're able to more freely change the database schema as needs change while maintaining backwards compatibility with old data (this was possible in Tiger, but a lot more work).
WRT sharing data using CoreData, the sensitivity to the schema details and version would probably make native CoreData files a bit difficult to use across many applications or users. There are many domain-agnostic data formats such as HDF5/netCDF or even some XML schema that are pretty easy to map to entities in a CoreData schema. I would imagine that settling on one of these common data formats would be easier in the long run.
Core Data Tutorial
I came across this recently
http://www.macgeekery.com/gspot/2005-40/core_data_as_a_cheap_database
physiological data + CoreData
barrywark (or anyone else):
Would you mind providing some details on how you implemented your acquisition program with coredata? I work in a neurophysiology lab that collects similar data, and I am working on organizing a database for our data. It sounds like you have a similar solution already, so if you could fill me in on how you designed the app, I would appreciate it!
Also, has anyone else done anything like this, in coredata or in another database? I am also toying around with implementing our data storage in another database, such as MySQL or PostgreSQL, or even SQLite. Does anyone have experience with a similar situation? In more detail, I record neurological data from human subjects in a series of sessions; so far I have about 25 GB of raw EEG data, not including notes, video, etc. It has gotten to the point where it is difficult to keep it all organized, and a database seems to make sense. Do people store the raw data in a database (e.g., individual samples of EEG, which would easily be over 4 billion entries at this point), or do you just store a pointer to all of the files on disk, a la iTunes?
Thanks!
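To make the "pointer to files on disk, a la iTunes" option concrete, here is a minimal sketch using Python's built-in sqlite3 module. The raw EEG samples stay in flat files; the database only catalogs subjects, sessions, and file paths. The table names, column names, and file paths are all hypothetical, and this is not how any poster in this thread actually built their system.

```python
import sqlite3

# In-memory database for the sketch; pass a file path for a real catalog.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (
    id   INTEGER PRIMARY KEY,
    code TEXT UNIQUE NOT NULL
);
CREATE TABLE session (
    id          INTEGER PRIMARY KEY,
    subject_id  INTEGER NOT NULL REFERENCES subject(id),
    recorded_on TEXT NOT NULL
);
CREATE TABLE recording (
    id             INTEGER PRIMARY KEY,
    session_id     INTEGER NOT NULL REFERENCES session(id),
    kind           TEXT NOT NULL,   -- e.g. 'eeg', 'video', 'notes'
    path           TEXT NOT NULL,   -- pointer to the raw file on disk
    sample_rate_hz REAL
);
""")


def add_session(conn, subject_code, recorded_on):
    """Create the subject if needed, then a session; return the session id."""
    conn.execute("INSERT OR IGNORE INTO subject (code) VALUES (?)", (subject_code,))
    subject_id = conn.execute(
        "SELECT id FROM subject WHERE code = ?", (subject_code,)
    ).fetchone()[0]
    cur = conn.execute(
        "INSERT INTO session (subject_id, recorded_on) VALUES (?, ?)",
        (subject_id, recorded_on),
    )
    return cur.lastrowid


def add_recording(conn, session_id, kind, path, sample_rate_hz=None):
    """Register one raw data file; the samples themselves are not stored."""
    conn.execute(
        "INSERT INTO recording (session_id, kind, path, sample_rate_hz) "
        "VALUES (?, ?, ?, ?)",
        (session_id, kind, path, sample_rate_hz),
    )


def recordings_for(conn, subject_code, kind):
    """Return the file paths of all recordings of one kind for a subject."""
    rows = conn.execute(
        "SELECT r.path FROM recording r "
        "JOIN session s ON s.id = r.session_id "
        "JOIN subject u ON u.id = s.subject_id "
        "WHERE u.code = ? AND r.kind = ? ORDER BY r.id",
        (subject_code, kind),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical usage.
sid = add_session(conn, "S01", "2007-08-04")
add_recording(conn, sid, "eeg", "/data/S01/session1.eeg", 1000.0)
add_recording(conn, sid, "video", "/data/S01/session1.mov")
```

With this layout the database stays small no matter how many samples are recorded, and analysis code opens only the files a query returns.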
RE: physiological data + CoreData
It goes all sorts of ways. I've seen such waveform-type data (raw data) stored in huge Oracle database servers and I've also seen it stored in flat files. I am partial to the simple flat file approach for raw data if it makes sense, but if you need a collaborative app then a server approach is useful, especially if you can take advantage of SQL commands and relational operators. If you have many patients (or subjects) along with many attributes and want to have a nifty app then a good SQL server is a nice idea. I'm partial to MySQL because I just spent a lot of time compiling it as a 64-bit universal binary with custom options and it was pretty fun.
Database schema change in Tiger
Hi!
I got interested in the parenthesis in barrywark's post and was just wondering if anyone could give me a pointer as to how one would tackle a schema migration in Tiger.
Regards,
Fredrik
Schema Migration
I'm afraid schema migration in Tiger is ugly. Conceptually it's not so difficult -- you walk your object tree, duplicating objects in a second managed object context -- but you need quite a bit of code to do it.
The best example I think I have seen is the CoreRecipes sample app from Apple:
http://developer.apple.com/samplecode/CoreRecipes/
Drew
---------------------------
Drew McCormack
http://www.maccoremac.com
http://www.macanics.net
http://www.macresearch.org
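The "walk your object tree, duplicating objects in a second context" idea from the Schema Migration post above can be sketched outside Core Data. The Python below is only an illustration of the pattern, not Core Data code and not what CoreRecipes does: the `Old*` and `New*` classes are hypothetical stand-ins for entities in an old and a new schema, and the dictionary keyed by object identity ensures each old object is duplicated exactly once even if it is reachable along several paths.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OldIsotope:              # hypothetical old-schema entity
    mass_number: int


@dataclass
class OldElement:              # hypothetical old-schema entity
    symbol: str
    isotopes: List[OldIsotope] = field(default_factory=list)


@dataclass
class NewIsotope:              # new schema adds a derived 'label' attribute
    mass_number: int
    label: str


@dataclass
class NewElement:
    symbol: str
    isotopes: List[NewIsotope] = field(default_factory=list)


def migrate(old_roots: List[OldElement]) -> List[NewElement]:
    """Walk the old object graph and build an equivalent new-schema graph."""
    copies: Dict[int, object] = {}   # id(old object) -> its duplicate

    def copy_isotope(old: OldIsotope, symbol: str) -> NewIsotope:
        if id(old) not in copies:
            copies[id(old)] = NewIsotope(old.mass_number, f"{symbol}-{old.mass_number}")
        return copies[id(old)]

    def copy_element(old: OldElement) -> NewElement:
        if id(old) not in copies:
            new = NewElement(old.symbol)
            copies[id(old)] = new    # register before descending into children
            new.isotopes = [copy_isotope(i, old.symbol) for i in old.isotopes]
        return copies[id(old)]

    return [copy_element(element) for element in old_roots]


old_graph = [OldElement("H", [OldIsotope(1), OldIsotope(2)])]
new_graph = migrate(old_graph)
```

In a real migration the second container would be the new managed object context, and each entity needs its own copy routine, which is where the "quite a bit of code" comes from.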
Conceptual question
Just getting started with Cocoa, but thinking about using CoreData from the start.
All of the examples I've seen so far have a single unified data model. How do you do more than one? For example, suppose we have a molecule database like some of those described above. Now, perhaps we want to create a program that does e.g. thermo calculations of some kind with those molecules; there is likely to be a data model for the thermo program that has other entities and relationships, but one of the kinds of entity will be a molecule, and we want to refer to those in the existing core data molecule "library".
How is that supposed to be done?
2 data models
To unitoops:
You can load 2 models when building the core data stack, so that the same database file stores information about the 2 models. The problem is relationships: you can't get relationships unless both entities are in the same model. The only way around that is to use a 'weak' relationship, where you refer to another entity using its identifier. With Tiger, there was a big gotcha. An entity would only have a temporary identifier until you saved, and you could not use that as a persistent link, of course, so you had to save twice and fix the links on the second pass. With Leopard, you can force an object to get its permanent identifier, which you would do in the '-willSave' method of the entity keeping a link to this object. I hope this is clear?
2 data models
Clear enough to give me somewhere to look, thanks.
The one issue is that I think I'd prefer these models to live in different storage places. In other words, one of them is like a library of chemical species (for which I may indeed have a core data-based editor), but the other one might be e.g. a chemically reacting mixture, and I might store the identity of its species and perhaps compositions, T, P etc there. I want to be able to refer to the chemical species in the reaction model, but I don't want to persist to the same place.
Does that make sense?
2 data models, 2 stores
Core Data also supports having several stores all under the control of one managed object context. Again, you can't have direct relationships between the 2, but they will coexist in the same app, and will otherwise behave as being all the same big pool of data.
Another comment on weak linking. The object identifier provided by Core Data automatically is guaranteed to be unique, which is good. However, there are many occasions when this might be too unique, and it results in weak linking being tied to a particular store on a particular computer. For instance, if you copy one molecule from one store to another, or share it between different users / computers / stores, you may want all the objects to have the same ID. Weak linking would then work even when one of the stores is replaced with a different copy (e.g. one of the databases could be erased and replaced with a different version created on a different computer). It is easy to get a unique identifier, there is one API in Cocoa for that. Can't find it right now.
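As an illustration of weak linking with a store-independent ID, here is a small Python sketch using two separate SQLite databases in place of two Core Data stores. Everything in it is an assumption made for the example: the table and column names, the use of Python's uuid module, and the species library itself. On the Cocoa side, Core Foundation's CFUUID may be the API the poster had in mind, though that is a guess.

```python
import sqlite3
import uuid


def make_library(species):
    """Store 1: a library of chemical species, keyed by a portable UUID."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE species (uuid TEXT PRIMARY KEY, name TEXT NOT NULL)")
    db.executemany("INSERT INTO species VALUES (?, ?)", species)
    return db


def make_mixture():
    """Store 2: a mixture whose components refer to species only by UUID."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE component (species_uuid TEXT NOT NULL, mole_fraction REAL NOT NULL)"
    )
    return db


def resolve(library, species_uuid):
    """Follow a weak link: look the UUID up in whichever library is loaded."""
    row = library.execute(
        "SELECT name FROM species WHERE uuid = ?", (species_uuid,)
    ).fetchone()
    return row[0] if row else None   # None means a dangling weak link


# The UUID is generated once and travels with the object between stores.
water_id = str(uuid.uuid4())
library = make_library([(water_id, "water")])

mixture = make_mixture()
mixture.execute("INSERT INTO component VALUES (?, ?)", (water_id, 1.0))

# Replace the library with a copy built elsewhere: the link still resolves
# because it depends on the UUID, not on a store-specific object ID.
library_copy = make_library([(water_id, "water")])
linked_id = mixture.execute("SELECT species_uuid FROM component").fetchone()[0]
resolved_name = resolve(library_copy, linked_id)
```

The trade-off is that nothing enforces the link: code that follows it has to handle the case where `resolve` returns `None`.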