Knowledge graph completion mainly consists of using some technique to predict the missing edges of an incomplete knowledge graph. Evaluating these techniques involves simulating this incompleteness by taking a knowledge graph and removing some of its edges. It is also typical to generate negative examples (edges that should not be in the graph) for training or testing the techniques. The graph can also be preprocessed, for example to filter out some relations. Once the techniques have been applied, the evaluation metrics must be selected and computed.
Overall, KG completion evaluation involves many decisions that can crucially affect the perceived performance of the techniques. However, papers presenting these evaluations use heterogeneous terminology, consider different aspects when creating evaluation datasets, and use different metrics, which makes it difficult to study what should be taken into account when preparing an evaluation.
Therefore, to ease the evaluation process, we first created AYNEC (All You Need for Evaluating Completion) and then its successor, AYNEXT. AYNEXT is a Python suite aimed at researchers in the field of link prediction in knowledge graphs. It makes it easy to configure and customize the creation of evaluation datasets and the computation of evaluation metrics and statistical significance tests for each pair of link prediction techniques. AYNEXT is composed of the DataGen and ResTest tools, which are implemented as Python scripts. To run one of them, check the parameters at the start of its Python file and run it from the console. The Python files contain documentation for every parameter and function. The next image presents AYNEXT’s workflow:
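To make the setup above concrete, the following is a minimal sketch of hiding some edges as test positives and generating negatives by corrupting the tail of each hidden edge. It is not AYNEXT's code: the function name `split_and_corrupt`, its parameters, and the tail-corruption strategy are assumptions chosen for illustration.

```python
import random


def split_and_corrupt(triples, test_fraction=0.2, negatives_per_positive=1, seed=0):
    """Illustrative only: hide a fraction of edges and corrupt tails to get negatives.

    `triples` is a list of (head, relation, tail) tuples. All names here are
    hypothetical and do not mirror AYNEXT's DataGen parameters.
    """
    rng = random.Random(seed)
    shuffled = list(triples)
    rng.shuffle(shuffled)

    # Simulate incompleteness: the first n_test shuffled edges become test positives.
    n_test = int(len(shuffled) * test_fraction)
    test_positives, train = shuffled[:n_test], shuffled[n_test:]

    known = set(triples)  # a negative must not be any edge of the original graph
    entities = sorted({e for h, _, t in triples for e in (h, t)})

    negatives, seen = [], set()
    for h, r, _ in test_positives:
        made, attempts = 0, 0
        # Bounded retries so that tiny graphs cannot cause an infinite loop.
        while made < negatives_per_positive and attempts < 100:
            attempts += 1
            candidate = (h, r, rng.choice(entities))
            if candidate not in known and candidate not in seen:
                seen.add(candidate)
                negatives.append(candidate)
                made += 1
    return train, test_positives, negatives


if __name__ == "__main__":
    graph = [("a", "r", "b"), ("b", "r", "c"), ("c", "s", "d"),
             ("d", "s", "e"), ("e", "r", "a")]
    train, positives, negatives = split_and_corrupt(graph, test_fraction=0.4)
    print(len(train), len(positives), len(negatives))
```

Corrupting only the tail is one of several possible strategies; corrupting the head, or restricting candidates to entities seen with the same relation, would change how hard the negatives are.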
Finally, AYNEXT includes a set of pre-generated evaluation datasets (AYNEXT-Dataset) that follow popular configurations for out-of-the-box evaluation. These datasets, available on Zenodo, include full and reduced versions of existing popular datasets like Freebase and NELL.
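As an illustration of the metric computation and pairwise comparison of techniques mentioned earlier, the sketch below computes precision, recall and F1 from binary predictions and runs an exact two-sided sign test for every pair of techniques. The sign test is a simple stand-in chosen here so the example only needs the standard library; it is not necessarily one of the tests ResTest applies, and all function names are hypothetical.

```python
from itertools import combinations
from math import comb


def precision_recall_f1(labels, predictions):
    """Binary precision, recall and F1; labels and predictions are 0/1 sequences."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def sign_test_p_value(scores_a, scores_b):
    """Exact two-sided sign test on paired scores; ties are discarded."""
    wins_a = sum(1 for a, b in zip(scores_a, scores_b) if a > b)
    wins_b = sum(1 for a, b in zip(scores_a, scores_b) if a < b)
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    # Probability of at most k wins under a fair coin, doubled for two sides.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def compare_all_pairs(scores_by_technique):
    """Map each pair of technique names to the sign-test p-value of their paired scores.

    `scores_by_technique` maps a technique name to a list of scores, e.g. one
    F1 value per relation, with the same ordering for every technique.
    """
    return {
        (a, b): sign_test_p_value(scores_by_technique[a], scores_by_technique[b])
        for a, b in combinations(sorted(scores_by_technique), 2)
    }
```

Pairing the scores per relation (one score per relation and technique) is an assumption made here to obtain paired samples; other pairings, such as per dataset, are equally possible.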

AYNEXT defines a total of 9 variation points:
- VP1: Fraction of the original graph to keep.
- VP2: Relation frequency threshold.
- VP3: Relation accumulated fraction threshold.
- VP4: Inverse removal.
- VP5: Testing fraction.
- VP6: Testing fraction per relation.
- VP7: Inclusion of training negatives.
- VP8: Negatives per positive.
- VP9: Negatives generation strategy.
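As a rough illustration of the two relation-filtering variation points (VP2 and VP3), the sketch below drops infrequent relations and then keeps the most frequent ones until their accumulated share of edges reaches a threshold. This is one plausible reading of those variation points, not AYNEXT's implementation; the function `filter_relations` and its parameter names are hypothetical.

```python
from collections import Counter


def filter_relations(triples, min_frequency=1, accumulated_fraction=1.0):
    """Illustrative relation filtering over (head, relation, tail) tuples.

    min_frequency: relations with fewer edges than this are dropped (VP2-like).
    accumulated_fraction: relations are taken from most to least frequent until
    their cumulative share of the remaining edges reaches this value (VP3-like).
    """
    counts = Counter(r for _, r, _ in triples)

    # VP2-like step: keep relations that meet the frequency threshold,
    # ordered from most to least frequent.
    frequent = [(r, c) for r, c in counts.most_common() if c >= min_frequency]

    # VP3-like step: accumulate relations until the threshold is reached.
    total = sum(c for _, c in frequent)
    selected, running = set(), 0
    for relation, count in frequent:
        if total and running / total >= accumulated_fraction:
            break
        selected.add(relation)
        running += count

    return [t for t in triples if t[1] in selected]


if __name__ == "__main__":
    graph = ([("h", "r1", f"t{i}") for i in range(5)]
             + [("h", "r2", f"t{i}") for i in range(3)]
             + [("h", "r3", f"t{i}") for i in range(2)]
             + [("h", "r4", "t0")])
    kept = filter_relations(graph, min_frequency=2, accumulated_fraction=0.5)
    print(sorted({r for _, r, _ in kept}))
```

With the default arguments nothing is removed, which mirrors the idea that each variation point can be left switched off.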
The following image shows how these variation points were instantiated for AYNEXT-Datasets:

Related articles
2023
AYNEXT-tools for streamlining the evaluation of link prediction techniques Journal Article
In: SoftwareX, vol. 23, pp. 101474, 2023, ISSN: 2352-7110.
2019
AYNEC: all you need for evaluating completion techniques in knowledge graphs Proceedings Article
In: The Semantic Web: 16th International Conference, ESWC 2019, Portorož, Slovenia, June 2–6, 2019, Proceedings 16, pp. 397–411, Springer 2019.
How to use AYNEXT
You can find the full documentation of AYNEXT in its GitHub repository (link above).