
CLDI IT WG Conference Call

February 14, 2018

3pm EST / noon PST


  1. Introductions, and welcome (temporarily at least) to Dan Scott!
  2. Plans for continuing to integrate with other working groups
  3. Status of technology sandbox at WestGrid
  4. Ideas for data sets we want to play with, research projects we want to undertake
    1. Bilal - best practices for integrating distinct data sets
    2. Request from Digital Projects for URI minting - how would this work? Anyone want to take this on?
    3. Other ideas?


Present: Jenn, Bilal, Peter, Paul, Russell (Ottawa - Canadiana), Gagandeep (McGill), Mutugi (McGill), Dan Scott

1. Introductions: Dan Scott is on the call, joining for a couple of months, and would like to help build momentum. He has worked on the bib extensions and built a proof-of-concept client-side SPARQL fetch of Wikidata info on bands/musicians, displayed as a card.
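Dan's proof of concept ran client-side in the browser. What follows is only a rough Python sketch of the same idea, not his code. It assumes the public Wikidata Query Service endpoint (https://query.wikidata.org/sparql) and picks genre (P136) and inception (P571) as the card fields. The function names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint (assumed; not named in the minutes).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(qid: str) -> str:
    """SPARQL for a hypothetical 'artist card': English label, genres, inception."""
    return f"""
SELECT ?itemLabel ?genreLabel ?inception WHERE {{
  BIND(wd:{qid} AS ?item)
  OPTIONAL {{ ?item wdt:P136 ?genre . }}
  OPTIONAL {{ ?item wdt:P571 ?inception . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""


def build_url(qid: str) -> str:
    """GET URL asking WDQS for SPARQL JSON results."""
    params = {"query": build_query(qid), "format": "json"}
    return WDQS_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_card(results: dict) -> dict:
    """Collapse SPARQL JSON result rows (one per genre) into one card dict."""
    card = {"label": None, "genres": [], "inception": None}
    for row in results["results"]["bindings"]:
        if card["label"] is None and "itemLabel" in row:
            card["label"] = row["itemLabel"]["value"]
        genre = row.get("genreLabel", {}).get("value")
        if genre and genre not in card["genres"]:
            card["genres"].append(genre)
        if card["inception"] is None and "inception" in row:
            # Keep only the YYYY-MM-DD part of the xsd:dateTime value.
            card["inception"] = row["inception"]["value"][:10]
    return card


def fetch_card(qid: str) -> dict:
    """Network call; needs internet access to query.wikidata.org."""
    req = urllib.request.Request(
        build_url(qid),
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "cldi-sandbox-sketch/0.1",  # hypothetical identifier
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_card(json.load(resp))
```

Splitting the fetch from `parse_card` lets the card logic be checked against saved result JSON without hitting the live endpoint.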

2. Integration with other groups

3. Sandbox: not ready yet; will document the process. N.B. on the basic Compute Canada infrastructure there will be a 100,000-statement limit

4. Datasets

  • thesis dataset - want to show ability to add value via linking
  • Jenn: metadata unit has idea for a project (student newspaper full text, named entity extraction), has asked for help
  • Dan: small project, if we can get student help: grey literature collection on mining reclamation; will be doing metadata from scratch for c. 140 reports
  • Bilal: serendipitous browse environment built around Banting and Best, migrated from MySQL db
  • Paul:
    • co-op student did a metadata extraction proof-of-concept with a focus on geo (working in OpenRefine)
    • head tax db: c. 100k entries on Chinese immigrants, with lots of data on place of birth etc. Would like to develop an ontology for immigration, and is also interested in a platform for developing and maintaining an ontology. Would be cool to link the head tax data to the Real Face of White Australia transcription project. Bilal mentioned VitroLib for supporting ontologies; Dan suggested a Wikibase instance
  • classic conversion of MARC dataset to linked data - pick up on Ian Bigelow's BIBFRAME work
  • problem of displaying this kind of data on the web (will be a component of Bilal's project)

  • URI minting - requested by Digital Projects group - need to work out what their needs are: resolvable? content negotiation? etc.
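The minutes leave "resolvable? content negotiation?" as open questions. The sketch below is one possible answer and is not a plan the group adopted. It mints opaque random identifiers under a hypothetical base URI (`https://example.org/id/`). A request for an identifier is answered with a 303 redirect to an HTML, Turtle, or JSON-LD document, chosen from the HTTP `Accept` header. All names here are illustrative assumptions.

```python
import secrets
from typing import List, Optional, Tuple

# Hypothetical namespace; a real service would use a domain the group controls.
BASE = "https://example.org/id/"

# Representations this sketch pretends to serve, mapped to document suffixes.
SUPPORTED = {
    "text/html": "html",
    "text/turtle": "ttl",
    "application/ld+json": "jsonld",
}


def mint_uri(nbytes: int = 6) -> str:
    """Mint an opaque identifier: random hex, no meaning encoded in the path."""
    return BASE + secrets.token_hex(nbytes)


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media type, q) pairs; q defaults to 1.0."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media, q))
    return prefs


def negotiate(accept: str, default: str = "text/html") -> Optional[str]:
    """Return the best supported media type, or None if nothing acceptable."""
    # sorted() is stable, so equal-q entries keep their header order.
    for media, q in sorted(parse_accept(accept or "*/*"), key=lambda p: -p[1]):
        if q <= 0:
            continue
        if media in SUPPORTED:
            return media
        if media == "*/*":
            return default
        if media.endswith("/*"):
            for candidate in SUPPORTED:
                if candidate.startswith(media[:-1]):
                    return candidate
    return None


def resolve(uri: str, accept: str) -> Tuple[int, Optional[str]]:
    """303 See Other to a format-specific document, or 406 Not Acceptable."""
    media = negotiate(accept)
    if media is None:
        return 406, None
    return 303, f"{uri}.{SUPPORTED[media]}"
```

Because the redirect goes to a separate document, the minted URI can keep standing for the thing itself while each format gets its own URL. Whether the group wants this much machinery at all is exactly the question the minutes raise for Digital Projects.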


  1. I'd like someone to explain what they mean by "URI minting" exactly.

    1. Good question! It was actually your co-chair, Sharon, who asked for this at the November 2017 meeting in Montreal, and we agreed we'd have to explore that very question once we're able to work on this. We'll definitely be getting back to your group to discuss your needs. We do note that whatever we do at this point would be experimental and a learning process, and not a production-ready or permanent service. 

  2. On the sandbox: Is the limit related to storage directly attached to the virtual machine? I'm wondering if Compute Canada also offers Swift storage (OpenStack's S3 equivalent), and if any of the triplestore databases we want to explore also support Swift/S3?


    Are there speaking notes/slides/video for the talk that Bilal gave at OLA?

    I see that Ian Bigelow did one as well:

    Chinese Immigration Records:

    I am aware of a few researchers who have had an interest in transforming images such as into data. Some are working on grant proposals. This is one of those areas where OCR doesn't help, as the really interesting stuff isn't the easily OCR'd stuff (the form), but the handwritten additions and the photographs.

  3. Re storage for the sandbox: I don't think they offer Swift, but they do have storage available. The basic allocation for anyone with an account is a 50GB ownCloud allotment. But I don't think this would help for the kind of fast granular access that a triplestore would need. There is a process for requesting more resources, and I think we can look into that when we have a clearer sense of what we need.

    Those immigration records look very similar to what Tim Sherratt is working with (via crowdsourced transcription) in the White Australia project (I pasted a link in the notes). It would be very interesting to see what kind of cross-linking is possible between the two sets.

  4. Hi everyone:

    Here are notes from our talk at OLA:

    (Please let me know if the link above works properly, and that you can view the embedded video on slide 27. I'm trying this OneDrive thing for the first time, so if it fails, I'll move the presentation over to Google.)

  5. I can confirm the link works. Thank you.

  6. Thanks Bilal, the link does work!