LIterature coNcept Knowledgebase

Search 30M documents and 600M semantic relations

Browse semantic relations in PubMed

Explore data extracted from the complete set of PubMed abstracts. With LINK you are able to explore how biomedical entities are linked together.

Explore Genes, Diseases, Drugs and more...

LINK focuses on three main entity types: genes, diseases (including phenotypes) and drugs. When a term cannot be identified as one of the three, key concepts are extracted via Natural Language Processing and used to fill the gaps.

Generate new hypotheses

Use LINK to find new hypotheses for how a target affects a disease, to repurpose existing compounds for a different disease, or to find new trends and connections made in the literature about your target or disease of interest.

Getting started

Search for entities or concepts

Start typing in the search box

You'll see a list of entities that either match your query or are very highly related to what you are searching for. For example, a search for "BRAF" will also suggest "metastatic melanoma".

Just select one or more entities from the drop-down list and press GO.

Why select entities?

Selecting an entity from the suggested list optimises the search to recognise all the synonyms we are aware of.

You can still search without selecting any entity if you want. Just type something and press GO.
For example, you could search for the PubMed ID of an article you are interested in and use LINK to show you a map of related entities and concepts.


The technical bit

Where is the data coming from?

LINK is a service built on top of the data generated by the Open Targets Library Project.

Using state-of-the-art tools, we built a serverless pipeline (based on Apache Beam) that processes all the publications released by PubMed in a couple of hours of computation on Google Cloud.

We precompiled a set of dictionaries to recognise entities from:

We analyse each title and abstract with spaCy and extract key concepts and semantic relations in the form of subject-predicate-object triples. This means that every time you see a link between two entities in our graphs, there is at least one sentence in an abstract linking them in a subject-predicate-object structure. For the sake of simplicity, we currently treat the graph as undirected.

You can click on any link in the graph, see the annotated sentences, and go to the original paper.
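As a rough illustration of how triples can be collapsed into an undirected graph whose links keep their supporting sentences, here is a minimal Python sketch. It is not the LINK pipeline: the `Triple` type, the `build_undirected_graph` function, the field names, and the example sentences and identifiers are all made up for illustration, and the triples are assumed to have been extracted already.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object relation plus the sentence it came from (hypothetical shape)."""
    subject: str
    predicate: str
    obj: str
    sentence: str
    pmid: str


def build_undirected_graph(triples):
    """Group triples by unordered entity pair, keeping the evidence for each edge."""
    edges = defaultdict(list)
    for t in triples:
        if t.subject == t.obj:
            continue  # ignore self-loops
        # A frozenset ignores direction: (A, B) and (B, A) map to the same edge.
        key = frozenset((t.subject, t.obj))
        edges[key].append(
            {"predicate": t.predicate, "sentence": t.sentence, "pmid": t.pmid}
        )
    return dict(edges)


# Placeholder triples: the sentences and PMIDs are invented, not real PubMed data.
triples = [
    Triple("BRAF", "drives", "melanoma", "Placeholder sentence one.", "PMID-A"),
    Triple("melanoma", "harbours", "BRAF", "Placeholder sentence two.", "PMID-B"),
    Triple("vemurafenib", "inhibits", "BRAF", "Placeholder sentence three.", "PMID-C"),
]
graph = build_undirected_graph(triples)

for pair, evidence in graph.items():
    print(sorted(pair), [e["pmid"] for e in evidence])
```

Because the key is an unordered pair, the first two triples end up on the same link, each contributing one annotated sentence.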

What technology are you using?

We built a huge graph (more than 500 million edges), but we are not using a graph database to store it. Everything is saved as JSON documents in a standard-size Elasticsearch cluster. It is still possible to get the data and put it into a graph database (more coming).

We make use of the advanced querying capabilities of Elasticsearch to extract the set of nodes relevant to your query. Elasticsearch has a very nice Graph API that is able to return all the edges needed to build the graph that we show. While the Elasticsearch Graph API works great and is lightning fast, it was not fully customisable to our needs, so we implemented a similar strategy in our REST API.
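To make the idea of storing graph edges as JSON documents more concrete, the sketch below serialises a few edges into the newline-delimited JSON layout accepted by the Elasticsearch bulk endpoint (one action line followed by one document line). The index name `link-edges`, the document fields, the id scheme and the `to_bulk_ndjson` helper are assumptions for illustration; the actual LINK document schema is not described here.

```python
import json


def to_bulk_ndjson(edges, index_name="link-edges"):
    """Serialise edges as bulk-style NDJSON (hypothetical index and fields)."""
    lines = []
    for edge in edges:
        # Sort the endpoints so an undirected edge always gets the same id.
        a, b = sorted((edge["source"], edge["target"]))
        doc_id = f"{a}|{b}|{edge['pmid']}"
        # Action line: tells Elasticsearch where to index the document that follows.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Document line: the edge itself, stored as plain JSON.
        lines.append(
            json.dumps({"nodes": [a, b], "predicate": edge["predicate"], "pmid": edge["pmid"]})
        )
    # The bulk format requires a trailing newline.
    return "\n".join(lines) + "\n"


# Placeholder edge, not real data.
payload = to_bulk_ndjson(
    [{"source": "melanoma", "target": "BRAF", "predicate": "harbours", "pmid": "PMID-A"}]
)
print(payload)
```

Keeping both endpoints in a single `nodes` field is one way to let a plain term query find every edge touching an entity without needing a graph database.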

The LINK REST API starts from your query, builds a graph of entities related to the query and then expands it with entities related to that seed graph, giving you a bigger picture of the area you are interested in. It also filters out hot nodes that are not useful in our view. For example, "cancer" would dominate any graph it appears in but, being a very general term, would add very little value to the information provided.

What about the UI? The nice graphs we show are built with linkurious.js, while the rest is a simple HTML5 page powered by Semantic UI.
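The seed-then-expand strategy with hot-node filtering can be sketched in a few lines of Python. This is only an approximation under stated assumptions: the graph is held in memory rather than queried from Elasticsearch, a node is called "hot" when its degree exceeds a hypothetical `max_degree` threshold (how LINK actually detects hot nodes is not specified), and the function names and the toy edge list are invented.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def expand_query(adjacency, query_nodes, max_degree):
    """Return the nodes and edges of a query graph expanded by one extra hop."""

    def is_hot(node):
        # Assumption: very highly connected nodes are too general to be useful.
        return len(adjacency.get(node, ())) > max_degree

    # Step 1: seed graph = query nodes plus their non-hot neighbours.
    seed = set(query_nodes)
    for node in query_nodes:
        seed.update(n for n in adjacency.get(node, ()) if not is_hot(n))

    # Step 2: expand the seed graph by one more hop, again skipping hot nodes.
    expanded = set(seed)
    for node in seed:
        expanded.update(n for n in adjacency.get(node, ()) if not is_hot(n))

    # Keep only edges whose two endpoints both survived.
    kept_edges = {
        frozenset((a, b))
        for a in expanded
        for b in adjacency.get(a, ())
        if b in expanded
    }
    return expanded, kept_edges


# Toy graph: "cancer" is connected to everything and acts as the hot node.
toy_edges = [
    ("BRAF", "melanoma"), ("BRAF", "vemurafenib"), ("melanoma", "dabrafenib"),
    ("cancer", "BRAF"), ("cancer", "melanoma"), ("cancer", "vemurafenib"),
    ("cancer", "dabrafenib"), ("cancer", "aspirin"),
]
nodes, edges = expand_query(build_adjacency(toy_edges), ["BRAF"], max_degree=3)
print(sorted(nodes))
```

In this toy graph "dabrafenib" is reached only through the second hop, while "cancer" is dropped even though it touches every other node.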


Thank you Open Targets


This website is released as a proof of concept. There is no guarantee it will be regularly updated in the future.

Copyright 2018 Open Targets. All rights reserved.