Explore data extracted from the complete set of PubMed abstracts. With LINK you are able to explore how biomedical entities are linked together.
LINK focuses on three main entity types: genes, diseases (including phenotypes) and drugs. When we cannot identify one of the three, key concepts are extracted via Natural Language Processing and used to fill the gaps.
Use LINK to find new hypotheses for how a target affects a disease, repurpose existing compounds for a different disease, or find new trends and connections made in the literature about your target or disease of interest.
Start typing in the search box
You'll see a list of entities that either match your query or are closely related to it. For example, a search for "BRAF" will also suggest "metastatic melanoma".
Just select one or more entities from the drop down and press GO.
Selecting an entity from the suggested list will optimize the search to recognise all the synonyms we are aware of.
You can still search without selecting any entity if you want: just type something and press GO. For example, you could search for the PubMed ID of an article you are interested in and use LINK to show you a map of related entities and concepts.
LINK is a service built on top of the data generated by the Open Targets Library Project.
Using state-of-the-art tools, we built a serverless pipeline (based on Apache Beam) that processes all the publications released by PubMed in a couple of hours of computation on Google Cloud.
We precompiled a set of dictionaries to recognise gene, disease and drug entities.
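As an illustration, such a dictionary can be as simple as a mapping from every known synonym to a canonical entity identifier. The identifiers and lookup logic below are hypothetical, not LINK's actual dictionary format:

```python
# Toy entity dictionary: maps lower-cased synonyms to a canonical id.
# The identifiers and synonyms below are illustrative only.
ENTITY_DICT = {
    "braf": ("GENE", "ENSG00000157764"),
    "b-raf": ("GENE", "ENSG00000157764"),
    "melanoma": ("DISEASE", "EFO_0000756"),
    "vemurafenib": ("DRUG", "CHEMBL12345"),
}

def tag_entities(text):
    """Return (entity_type, entity_id) pairs found in a piece of text."""
    found = []
    for token in text.lower().replace(",", " ").split():
        if token in ENTITY_DICT:
            found.append(ENTITY_DICT[token])
    return found
```

For example, `tag_entities("BRAF mutations in melanoma")` recognises both the gene and the disease mention via the synonym table.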
We analyse each title and abstract with spaCy and extract key concepts and semantic relations in the form of subject-predicate-object triples. This means that every time you see a link between two entities in our graphs, there is at least one sentence in one abstract linking them in a subject-predicate-object structure. For the sake of simplicity, we currently treat the graph as undirected.
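As a rough illustration of the idea (a toy sketch, not LINK's actual pipeline: real sentences go through spaCy's full dependency parser), a subject-predicate-object triple can be read off a dependency parse. Here each token is represented as a (text, dependency label, head index) tuple, mimicking the shape of spaCy's output:

```python
# Each token: (text, dependency label, index of its syntactic head).
# This mimics a dependency parse of "BRAF activates MEK" (toy example;
# labels follow spaCy's conventions).
parsed = [
    ("BRAF", "nsubj", 1),   # nominal subject of "activates"
    ("activates", "ROOT", 1),
    ("MEK", "dobj", 1),     # direct object of "activates"
]

def extract_triples(tokens):
    """Collect (subject, predicate, object) triples from a toy parse."""
    triples = []
    for i, (text, dep, head) in enumerate(tokens):
        if dep == "ROOT":
            subj = next((t for t, d, h in tokens if d == "nsubj" and h == i), None)
            obj = next((t for t, d, h in tokens if d == "dobj" and h == i), None)
            if subj and obj:
                triples.append((subj, text, obj))
    return triples
```

Running `extract_triples(parsed)` yields the triple ("BRAF", "activates", "MEK"), which would become one edge between the two entities.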
You can click on any link in the graph, see the annotated sentences, and go to the original paper.
We built a huge graph (more than 500 million edges), but we are not using a graph database to store it. Everything is saved as JSON documents in a standard-size Elasticsearch cluster. It is still possible to get the data and load it into a graph database (more coming).
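For example, one edge could be stored as a JSON document along these lines. This is a hypothetical shape for illustration; the field names are not LINK's actual schema:

```python
import json

# Hypothetical shape of one edge document as it might be indexed in
# Elasticsearch; field names and values are illustrative, not LINK's schema.
edge = {
    "subject": {"id": "ENSG00000157764", "label": "BRAF", "type": "GENE"},
    "object": {"id": "EFO_0000756", "label": "melanoma", "type": "DISEASE"},
    "predicate": "associated_with",
    "pmids": ["12345678"],          # abstracts supporting this edge
    "sentence_count": 1,
}

doc = json.dumps(edge)  # the serialised form sent to the Elasticsearch index
```

Storing edges as plain documents like this is what lets a standard Elasticsearch cluster hold the whole graph without a dedicated graph database.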
We make use of the advanced querying capabilities of Elasticsearch to extract the set of nodes relevant to your query. Elasticsearch has a very nice Graph API that can return all the edges needed to build the graph we show. While the Elasticsearch Graph API works great and is lightning fast, it was not fully customisable to our needs, so we implemented a similar strategy in our own REST API.
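For reference, a Graph API explore request has roughly this shape. This is a hedged sketch: the index and field names are made up, and the exact body depends on your Elasticsearch version:

```python
# Hypothetical request body for Elasticsearch's _graph/explore endpoint.
# The index name ("publications") and field name ("entity") are made up
# for illustration.
explore_request = {
    "query": {"match": {"entity": "BRAF"}},        # seed the graph from a query
    "vertices": [{"field": "entity", "size": 20}], # terms to use as nodes
    "connections": {                               # one hop out from the seeds
        "vertices": [{"field": "entity", "size": 20}]
    },
}
# POSTing this body to /publications/_graph/explore would return the
# vertices and connections needed to draw the graph.
```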
The LINK REST API starts from your query, builds a graph of entities related to it, and then expands it with entities related to that seed graph, giving you a bigger picture of the area you are interested in. It also filters out hot nodes that are not useful in our view: for example, "cancer" would dominate any graph it appears in but, being a very general term, would add little to the quality of the information provided.
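A minimal sketch of that expand-and-filter strategy, assuming a simple adjacency-list graph (the toy data and the degree threshold stand in for LINK's actual hot-node heuristic):

```python
# Toy adjacency-list graph; "cancer" is a hot node connected to everything.
GRAPH = {
    "BRAF": {"melanoma", "MEK", "cancer"},
    "melanoma": {"BRAF", "vemurafenib", "cancer"},
    "MEK": {"BRAF", "cancer"},
    "vemurafenib": {"melanoma", "cancer"},
    "cancer": {"BRAF", "melanoma", "MEK", "vemurafenib"},
}

def expand(seeds, graph, hops=2, max_degree=3):
    """Grow a seed set hop by hop, skipping over-connected 'hot' nodes."""
    nodes = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        nxt = set()
        for node in frontier:
            for neighbour in graph.get(node, ()):
                # Skip hot nodes: very general terms would dominate the
                # graph without adding much information.
                if len(graph.get(neighbour, ())) > max_degree:
                    continue
                if neighbour not in nodes:
                    nxt.add(neighbour)
        nodes |= nxt
        frontier = nxt
    return nodes
```

Starting from the seed "BRAF", two hops of expansion reach "melanoma", "MEK" and "vemurafenib", while the over-connected "cancer" node is filtered out.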