Activity feeds fixed

Part of our server move was prompted by the ever-increasing indexing of our many, many activity feeds.

In addition to the overall history feed on the front page, each vocabulary has a history feed:
Metadata Registry Change History for Element Set: ‘MARC 21 Elements 00X’

Each element or concept also has its own history feed:
Metadata Registry Change History for Element: ‘Biography of Books’ in Element Set: ‘MARC 21 Elements 00X’

Each individual statement has a history feed (yes we are indeed as crazy as you may suspect we are at this point):
Metadata Registry Change History for Property: ‘label’ of Element: ‘Biography of Books’ in Element Set: ‘MARC 21 Elements 00X’

We believe it’s important for vocabulary management systems to track changes (what, who, when) right down to the statement level. While we have no idea whether there’s any utility to a history feed that fine-grained, we do track it, so it seemed reasonable to offer a feed even at that level, in RSS 1.0 (RDF), RSS 2.0, and Atom no less.

This means we have thousands of feeds, and these are being parsed heavily by various search engines, far out of proportion to the resource changes they represent, which generates a lot of bandwidth and processing overhead. We’re working on a number of different solutions, but the first one we tried, implementing app-level managed caching, didn’t do the trick, even after we spent quite a bit of time fiddling with different configuration options. And in tinkering with that we broke the feeds in a way that was very hard to track down, since they passed our integration tests, worked on our staging server, and then broke on the production server… sometimes.

So they’re fixed now, and we’re exploring other caching options.

Password reset fixed

One of the side effects of our server move was the need to reconfigure our locally hosted transactional email services, which were always a little flaky. If you’ve tried to reset your password lately, you will no doubt have noticed that the promised email never arrived.

That’s fixed now. We switched from self-hosted email to Postmark, which works very well and should be far more stable.

Server move in progress

…from Rackspace to DigitalOcean, from one large-ish server to several smaller ones, from a relatively prehistoric version of Red Hat to the latest Ubuntu 14.04 LTS, from Zend Server + Apache to nginx (OpenResty, actually), and from PHP 5.2 to PHP 5.5. It really wasn’t as painful as it sounds like it should have been. As part of the move, we also made some improvements in error and uptime reporting, integrating them into a Slack channel for near real-time monitoring.

Actually the move is pretty much complete, although we’re still tinkering a bit with the caching and load balancing setup — we apologize for any brief outages you may experience while we do that — since we couldn’t afford to completely replicate the entire server stack in a staging environment.

Getting browser/proxy/server caching right is actually taking more time than we anticipated. We’ll let you know here as soon as we turn our attention to other things. In the meantime, if you see something, please say something — we’ve got issues!

Server issues

Normally the Registry just hums along quietly and doesn’t demand too much attention. But the last system update we performed seems to have altered our memory usage pretty dramatically and we’re quite suddenly having out-of-memory issues and some heavy swapping. We’ve expanded the server capacity twice already as a stopgap while we investigate, but before we move to an even larger server we’re testing some alternative configurations.

In the meantime there may continue to be periodic slowdowns, but you should see some improvement in a few days.

The last thing we want is for you to think we’re not seeing, or ignoring, the problem.

Still here, folks

When we first designed the NSDL Registry, one of the requirements was that it be able to run in Lynx (seriously) and be usable without JavaScript in all browsers, including all versions of IE. As you will probably have noticed, the world of web development has changed a bit in the last 8 years.

Lately it has come to our attention that the Open Metadata Registry looks a lot like abandonware. The reality is that we’re hard at work on a long-planned, often delayed, and absolutely necessary update.

Stay tuned.

Whoops

Until recently we’d been pretty happy with our ISP, Dreamhost. But a few months ago, after several during-the-presentation meltdowns of the Registry, we determined that we needed to move to a higher-performing, more reliable server. We could have done the easy thing and moved to a Virtual Private Server at Dreamhost. Instead, we set up an entirely fresh server in the Rackspace Cloud and very carefully, with much testing, created a fresh instance of the Registry with greatly expanded data capacity, some updated code, and considerably more speed. So far, so good.

We had a self-imposed deadline of several weeks before the DCMI 2011 Conference in The Hague and completely missed it. This left us with the choice of waiting until after the conference to redirect our domain to the new server or taking the risky step of switching domains just a few days before the conference. Of course, we didn’t wait. At which point we discovered that we couldn’t simply redirect our main domain to the new server but needed to redirect the subdomains as well, breaking our wiki and blog, which we had a great deal of difficulty restoring while on the road to DCMI.

But everything’s back to normal now, and even updated. We now resume our regular programming.

SPARQL queries

We’re using Benjamin Nowack’s excellent ARC libraries for a tiny bit of our RDF management. It may surprise you to know that we don’t use a triple store as our primary data store, but we do too many things with the data that would be cumbersome at best to manage exclusively in a triple store (a subject for another post someday). Still, last year we started nightly imports of the full Registry into the ARC RDF store and enabled a SPARQL endpoint, thinking that it might be useful.

Lately, we’ve heard a few folks wishing for better searching in the Registry, and since we’re actively building an entirely new version of the Registry in Drupal (a subject for another post someday), we’re loath to spend time doing any serious upgrading of the current Registry. But we have SPARQL!

Yesterday, as part of another conversation, a colleague helped me figure out what I think is a fairly useful search query. If you follow the link, you’ll be taken to the Registry’s SPARQL endpoint, which will display inline a list of all of the skos:Concepts in the RDA vocabularies that have no definitions. Well, 250 of them anyway, since that’s the arbitrary limit we’ve set on the endpoint. They’re not hyperlinked (which would be really useful), but it’s still good info.

The SPARQL query used to create the list:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?s
WHERE {
  GRAPH ?g { ?s ?p ?o . }
  OPTIONAL { ?s skos:definition ?Thing . }
  FILTER (!bound(?Thing))
  FILTER regex(str(?s), "^http://RDVocab.info/termList")
}

…can be used to find any missing property (see the OPTIONAL clause), and the regex used in the filter can be modified to limit the search to any vocabulary, group of vocabularies, or domain. I’m not enough of a SPARQL expert (meaning I’m completely clueless) to know how to filter by attribute, but it should be possible, if not easy, to find skos:Concepts that have an English definition but no German definition (I look forward to your comments).
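For the curious, here’s an untested sketch of that English-but-no-German query. It leans on the standard lang() and bound() filters, and we haven’t verified it against ARC’s SPARQL implementation or the Registry’s endpoint, so treat it as a starting point rather than a recipe:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?s
WHERE {
  # concepts with at least one English definition...
  GRAPH ?g { ?s skos:definition ?en . }
  FILTER (lang(?en) = "en")
  # ...but no German one
  OPTIONAL { ?s skos:definition ?de . FILTER (lang(?de) = "de") }
  FILTER (!bound(?de))
  # same trick as above: limit to the RDA term lists
  FILTER regex(str(?s), "^http://RDVocab.info/termList")
}

If that works, swapping the language tags (or the regex) should cover the other combinations; as noted above, corrections are welcome.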

Check Your Bookmarks!

As of today, the link to the Step-by-Step instructions on the Registry front page has been updated (and not a moment too soon). The new link still goes to the Registry Wiki, but the link itself is different, so those of you eager users who have bookmarked the old instructions might want to take a look at the new ones and update those bookmarks. The old instructions are still there and, at the moment, not necessarily wrong, but they show the old logo, are pretty sparse in places, and haven’t been updated in a while, so they lack a few important functional bits.

The instructions are a work in progress, still being rewritten, expanded, and primped. We plan to continue building them up, and in addition we’re working on some training screencasts (with the valuable help of Tom Johnson), which we’ll link to the instructions as we complete them. We’ll try to keep popping up to let you know when new parts are completed (you can see some empty stubs where we’ll be working). As always, we’re happy to hear your suggestions, complaints, etc., and votes for where we should work next are welcome (though not necessarily something we’re guaranteed to pay attention to).

We’re still planning for some major changes in the fairly near future, at which point the documentation will again be updated (we hope on a more ambitious schedule). As we all know, the work of a documentarian never ends … (can I have some cheese with that w(h)ine?)

Announcing the New Open Metadata Registry

As of this week, the familiar NSDL Registry has a new name, the Open Metadata Registry, and a new logo. The name change reflects the fact that we’re no longer receiving funding from the National Science Foundation on behalf of the National Science Digital Library (NSDL), but it also recognizes that the Registry has become one of the leaders in providing open, stable tools for those building infrastructure for the Semantic Web.

As part of this change, we’re joining our colleagues at JES & Co. as a project under their umbrella, as well as bringing current users and partners together in an Open Metadata Registry Consortium to build a sustainable plan for moving the Open Metadata Registry forward. Please watch for additional announcements and an expansion of the new look for our pages. If you’d like more detail on the Consortium, please contact Diane Hillmann at metadata dot maven at gmail dot com.


Readable URIs

Over the years we’ve been engaged in a number of discussions in which the ‘readability’ of URIs was raised, either as an issue with non-readable URIs or as a requirement in new URI schemes.

At the Registry, we understand and are sensitive to the desire for human readability in URIs. However, embedding a language-specific label in the URIs identifying concepts in multilingual vocabularies has the side effect of locking each concept into the language of its creator. It also unnecessarily formalizes the creator’s particular spelling variant of that language, ‘colour’ vs. ‘color’ for instance.

When creating the URIs for the RDA vocabularies we acceded to requests to make the URIs ‘readable’, specifically to make it easier for programmers to create software that could guess a URI from its prefLabel. We have come to regret that decision as the vocabularies have gained prefLabels in multiple languages. It also creates issues for people extending a vocabulary and adding concepts that have no prefLabel in the vocabulary creator’s chosen language.

That said, the case is much less clear for URIs identifying ‘things’ such as Classes and Properties in RDFS and OWL, since these are less likely to need to be semantically ‘understood’ independently of their label, and are less likely to be labeled and defined in multiple languages. In that case the semantics of the Class or Property are often best communicated by a language-specific, readable URI.

In the end I personally lean heavily toward non-readable identifiers because of the flexibility to alter the label in the future, especially in the fairly common case of someone wishing to change a label even though the semantics have not changed. That becomes much more problematic when the label applied to the thing at a particular point in time has been locked into the URI.

I’m not trying to start a non-readable-URIs campaign, just pointing out that the Registry in particular is designed to support vocabulary development by groups of people who are creating and maintaining multilingual vocabularies, and whose collective agreement on labeling things may change over the course of the development cycle. Our non-literal-label URI default is designed to support the understanding of that environment that we’ve developed over time.