May 16th, 2015 Additions: Evangelos Pafilis

In attendance

Group from Crete

Neil met these folks at JGI.


Other Moorea/biocode folks

Berkeley Lab - Lewis Lab

The story so far…

Started with genetic sequencing (biocode), expanding to bio-social understanding throughout time:

Theoretical physicist from Switzerland (Matthias Troyer (ETH) et al. ?), Rich Williams, set up computational ecology group at MS Research, Cambridge.

Pilot project on water models.

20 “nodes” - organizational clusters of people.

On Crete

Realizing the value of good data encoding standards, data exchange and openness.

Setting up platform for data management, incorporating historical collections, building capacity that could be applied to Moorea IDEA. Already working with multiple communities.

Georgos has a government research institute intended to be a bioinformatics hub for Greece, and a slide deck he uses to inspire traditional researchers to join the larger project. This center is the largest in Greece, even larger than the biomedical research centers.

Crete is ~600,000 people in “23 Mooreas” in area. Also much less isolated system than Moorea. E.g., economic crisis impacts the ecology of villages as young people come back from cities. More specifically, there is a shift to organic farming. In the north of Greece, it gets quite cold - folks cut down forests illegally.

Charlotte is interested in the political transformation of Crete. What kinds of political experiments can we observe (or perhaps support)?

A hub for single-point access to Greek biodiversity data, information and knowledge is being set up. The relevant project is the Greek node of the EU LifeWatch infrastructure.

What is the data science side?

Open science project. Are there ways Crete would like to contribute to Moorea (or the IDEA infrastructure more generally)? Is there a community of “early adopters?” What is central organization? Is there a technical framework (or a social one)?

A core is available on biocoding, but Neil suggests this can be improved. Foodwebs are similarly developed, still lots of data science challenges.

See PDF: Workshop: Food Webs for Model Islands, Berkeley Institute for Data Science, Berkeley, California, 27-28 April 2015

Stay focused on local, but want tools to be generally useful. Synthesis of data from different projects is an ideal. This already happens in food web research (genetic data, occurrence data, interaction occurrence, spatio-temporal, lots of siloed research).

Jorrit suggests that we should start with making things useful, then proceed to standards. Find 2, 3 use cases, list data scripts to be used (e.g. rOpenSci).

Karthik points out that things like this happen at LTER and PISCO, but they are very siloed. DataONE has lots of funding, but it’s all behind the scenes. A lean approach could provide tools as they come, to allow immediate leverage.

(Cross-referencing, this is somewhat different than what’s currently happening with ODK. It would be good to see what Matt, Waylon and crew think about the dangers here - folks expect support once things are delivered.)

Concerns - BIDS can’t do much beyond space and connections. We need funds to pay a fraction of a person’s time.

Talking about what BIDS can do - we should fly Tony Fountain up for a talk!

Pilot projects - folks have started building a 4D avatar presented on wall in Zurich, ability to layer data (done by folks in Zurich, also working on Singapore).

A central concern from Neil: keep the physical model (for example) interacting with the social and biological models, to avoid re-creating scientific silos.

What is an avatar? Best representation of the thing (but there are many). Avatar itself is not visible (it’s too complex, humans can’t see all of it).

Jorrit raises idea of two kinds of outputs - rough educational / public-facing materials are easy, but still force debugging and integration of information. Scientific applications can build on this effort to do more rigorous work.

Charlotte’s question about how to use this for climate change education / policy. Dav’s response is that this is too hard, just make something interesting that helps people explore according to their interest. Neil confirms this idea.

Evangelos has example of conveying messages with materials from human anatomy textbooks?

e.g., see the association of the kallikrein-related peptidase 3 (prostate specific antigen) with different tissue types of the human body

Specific proposals

Relatedly - what makes its way to textbooks, and what remains only with experts? There are very few complete food webs (most complete appear to be lake systems, or marine).

We’d like an “ecologically aware” Google search. This is an extension of what Tony Dell and colleagues did with their “One Cubic Foot” project. Example of unjustified conflation between two types of hawkfish. Perhaps could have been prevented with trait bank (e.g., mouth morphology) - work by Chris Meyer (has data on what was in guts, biocoded these contents, then screened against reference database).

Getting concrete

Can we get text mining data from Evangelos into GLoBI? Extend EOL? But his system works better with sequences due to its use of NCBI taxonomy. If we want a species name without knowing the exact DNA / ID, that could be a draft approach. This would be a data source in the GLoBI ecosystem. Note that all species from Moorea will have a sequence in the gene bank!
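One possible reading of the "draft approach", sketched in Python: a text-mined interaction becomes a record that carries taxonomy IDs where a name can be resolved, and is flagged as a draft where only the name is known. The lookup table, the `TAXON:` identifiers, the field names and the `to_record` helper are all hypothetical placeholders for illustration; none of them come from Evangelos's system, NCBI or GLoBI.

```python
# Hypothetical name -> taxonomy ID table (placeholder IDs, not real NCBI IDs).
name_to_taxon_id = {
    "species a": "TAXON:0001",
    "species b": "TAXON:0002",
}

def to_record(source_name, interaction, target_name, lookup):
    """Turn one text-mined interaction into a record, resolving names where possible."""
    source_id = lookup.get(source_name.strip().lower())
    target_id = lookup.get(target_name.strip().lower())
    return {
        "source_name": source_name,
        "source_id": source_id,
        "interaction": interaction,
        "target_name": target_name,
        "target_id": target_id,
        # A record is a draft when either name could not be resolved to an ID.
        "draft": source_id is None or target_id is None,
    }

record = to_record("Species A", "eats", "Species C", name_to_taxon_id)
```

Draft records could then be reviewed by hand, or resolved later once a sequence or ID becomes available.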

Hackathon at iDigBio (in Gainesville, FL) in June or so, looking at doing the above.

Process of annotation of papers might be quite similar to the Nick Anderson / Text Threshr approach to news articles. Note funding from Sloan. Hybrid approaches may help streamline this approach. Talk to Spencer about using this tool?

Chris Mungall knows of literature on text mining protein-protein interactions.

Biodiversity Heritage Library (BHL) may be a good source. BHL, in collaboration with Prof. Ananiadou, Dr Navarro (NACTEM, UK) and other colleagues, is already working on

GLoBI has a nice API where a group of students recently built a species interaction exploration tool. You could start here. Pictures coming in via iNaturalist (curated by EOL).
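As a starting point for that kind of exploration, here is a minimal Python sketch that works on interaction records once they have been fetched. It does not call the GLoBI API; the (source, interaction, target) record shape, the taxon names and both helper functions are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical interaction records: (source taxon, interaction type, target taxon).
records = [
    ("hawkfish A", "eats", "crab X"),
    ("hawkfish B", "eats", "shrimp Y"),
    ("crab X", "eats", "alga Z"),
]

def build_index(records):
    """Group interactions by source taxon for quick lookup."""
    index = defaultdict(list)
    for source, interaction, target in records:
        index[source].append((interaction, target))
    return index

def diet_chain(index, taxon, interaction="eats"):
    """List every taxon reachable from `taxon` by following one interaction type."""
    seen = []
    stack = [taxon]
    while stack:
        current = stack.pop()
        for kind, target in index.get(current, []):
            if kind == interaction and target not in seen:
                seen.append(target)
                stack.append(target)
    return seen

index = build_index(records)
print(diet_chain(index, "hawkfish A"))
```

Keeping the two hawkfish as separate keys is deliberate: their diets stay distinct unless the records themselves link them.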

More todo

“Guts of the Avatar”

List from a few select data sources (papers, experts) from a few key (island) systems (Moorea/Crete/Hawaii) on terms they use:

Map those terms to ontologies
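A minimal Python sketch of the two steps above, assuming a small hand-built lookup table. The source names, the terms and the `EX:` identifiers are hypothetical placeholders, not entries from any real ontology; the actual ontologies would be chosen by the group.

```python
from collections import Counter

# Hypothetical terms collected from a few sources per island system.
terms_by_source = {
    "moorea_paper": ["eats", "preys on", "host of"],
    "crete_expert": ["feeds on", "Preys  on"],
    "hawaii_paper": ["eats", "pollinates"],
}

# Hypothetical lookup: free-text term -> placeholder ontology ID.
term_to_ontology = {
    "eats": "EX:0001",
    "feeds on": "EX:0001",
    "preys on": "EX:0002",
    "pollinates": "EX:0003",
}

def normalize(term):
    """Lower-case a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())

def map_terms(terms_by_source, lookup):
    """Return (ontology ID -> set of terms) and a count of terms with no mapping."""
    mapped = {}
    unmapped = Counter()
    for terms in terms_by_source.values():
        for term in terms:
            key = normalize(term)
            if key in lookup:
                mapped.setdefault(lookup[key], set()).add(key)
            else:
                unmapped[key] += 1
    return mapped, unmapped

mapped, unmapped = map_terms(terms_by_source, term_to_ontology)
```

The `unmapped` counter is the useful by-product here: it shows which terms the communities actually use that have no ontology home yet.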