\section{Conclusion and Perspectives}
\label{sec:conclusion}
In this paper, we introduced SHARP, a hybrid SPARQL query relaxation model that integrates both ontology-based and instance-based strategies, enhanced by an original set-based similarity ranking. SHARP effectively relaxes diverse query types involving classes, properties, entities, and literals, and supports RDF-star reification.
Additionally, we introduced a human-judgment-based benchmark consisting of seven queries to evaluate the quality of relaxed query results.
Our experiments highlight the performance of SHARP across multiple quality evaluation criteria.
Compared to two relevant state-of-the-art approaches, our model consistently delivers more relevant top-k results and achieves higher agreement with user-validated rankings, particularly in scenarios involving relaxations over queries containing annotations.
This work opens several perspectives for future research. As our approach supports RDF-star syntax for querying metadata, we plan to extend it to transparently handle reification models. Another important direction is support for property path queries, which introduces challenges related to sequences and hierarchical relationships.
%Finally, we aim to integrate our relaxation framework into an engine to facilitate exploratory querying over RDF knowledge graphs.
\section{SHARP: SPARQL Hybrid Query Relaxation Approach}
\label{sec:contribution}
This section presents our hybrid relaxation model, which integrates ontology-based and entity-based methods to relax query terms, i.e., classes, properties, entities, and literals.
%
In Section \ref{sec:terms2sets}, we abstract all terms in a triple pattern to the set level. We begin by abstracting classes and properties to sets to create a unified representation.
We then extend this abstraction to entities and literals by treating them as elements within these sets.
@@ -76,7 +76,7 @@ Our strategy is based on the following hypothesis:
\end{equation}
\item The order of relaxation between elements follows the output of the mapping function. Ideally, similarity should decrease linearly with the mapping function. However, enforcing linearity directly fails to satisfy the conditions below. Instead, we assume an affine dependency (i.e., a linear relationship with a constant offset) to preserve the ranking order given by the mapping function \( MF \): \begin{equation}\small \exists a,b: \forall j: \text{Sim}(e_1, e_j) = a \cdot MF(e_1, e_j) + b \label{eqn:affine}\end{equation}
\item Suppose that the initial query gives some answers but not enough. In such a case, relaxing $e_1$ with $e_2$ can keep both the answers provided by the initial query and the result of the relaxation, i.e.,
$Sim(e_1,e_2)= Sim(e_1,\{e_1,e_2\})$. In general,
\begin{equation}\small
Sim(e_1,e_j)= Sim(e_1,\{e_1,e_2, \dots, e_j\})
@@ -96,8 +96,7 @@ By applying equations \ref{eqn:limit}, \ref{eqn:affine}, and \ref{eqn:union}, we
\label{eqn:sim_entities}
\end{equation}
By replacing \( e_1 \) and \( e_j \) with \emph{entities} or \emph{literals}, we obtain the similarity functions in Equations~(15) and~(16). For classes and properties, we use standard similarity functions as shown in Equations~(13) and~(14).
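To make the affine calibration concrete, the sketch below (an illustration, not the implementation behind Equations~(15) and~(16)) converts precomputed mapping-function scores into similarity values of the form of Equation~\ref{eqn:affine}; the boundary values \texttt{sim\_top} and \texttt{sim\_floor} are assumptions chosen for the example, and any increasing affine transform preserves the ranking induced by \( MF \).
\begin{lstlisting}[language=Python]
# Illustrative sketch: calibrate Sim(e1, e_j) = a * MF(e1, e_j) + b so that
# the closest candidate gets sim_top and the farthest gets sim_floor.
# The boundary values are assumptions made for this example only.
def affine_similarities(mf_scores, sim_top=1.0, sim_floor=0.0):
    """mf_scores: dict mapping candidate e_j -> MF(e1, e_j), higher = closer."""
    hi, lo = max(mf_scores.values()), min(mf_scores.values())
    if hi == lo:                      # degenerate case: all candidates tie
        return {e: sim_top for e in mf_scores}
    a = (sim_top - sim_floor) / (hi - lo)
    b = sim_top - a * hi              # best candidate maps to sim_top
    return {e: a * s + b for e, s in mf_scores.items()}

# Example with made-up mapping-function scores (e.g., from entity embeddings)
mf = {"dept403": 1.0, "dept402": 0.83, "dept377": 0.79, "dept12": 0.41}
print(affine_similarities(mf))        # the ranking order of MF is preserved
\end{lstlisting}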
\medskip
%\textcolor{red}{Add here two solutions for entities and literals.}
...
@@ -19,16 +19,16 @@ Second, in Section \ref{sec:eval2}, we compare the \textit{top-k} answers produc
To define similarities based on information content, we computed statistics over our dataset.
In addition, we computed mapping functions that map entities and literals to similar ones with similarity scores.
For entities, we computed embeddings using RDF-star2vec\footnote{\url{https://github.com/aistairc/RDF-star2Vec}}.
For literals, we used several distance functions.
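The snippet below sketches such mapping functions; it is illustrative only, since the exact similarity measure over embeddings and the exact literal distances used in our experiments are not fixed here. It ranks candidate entities by cosine similarity over precomputed embedding vectors (e.g., loaded from the RDF-star2vec output) and candidate numeric literals by a normalized absolute difference.
\begin{lstlisting}[language=Python]
# Illustrative sketch (assumptions, not the exact experimental setup):
# (a) entities ranked by cosine similarity over precomputed embeddings,
# (b) numeric literals ranked by 1 - normalized absolute distance.
import numpy as np

def entity_mapping(target, embeddings, top_n=10):
    """embeddings: dict IRI -> numpy vector (e.g., from RDF-star2vec output)."""
    t = embeddings[target]
    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    scores = {e: cos(t, v) for e, v in embeddings.items() if e != target}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

def numeric_literal_mapping(target, candidates):
    """Score candidate literals by 1 - normalized absolute distance."""
    spread = max(abs(c - target) for c in candidates) or 1.0
    scores = {c: 1.0 - abs(c - target) / spread for c in candidates if c != target}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(numeric_literal_mapping(2010, [2008, 2009, 2011, 2015]))
\end{lstlisting}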
%The outputs of these mapping functions were used as inputs for our model.
%apply our proposed hybrid model over the benchmark queries
We first focused on comparing the order of relaxed queries generated by each model with the ranking established from users' feedback.
We used the recall score to measure the extent to which the gold standard queries were generated by each model.
Precision is not useful here because of the large difference in the number of relaxed queries produced by each evaluated model.
We used Rank-Biased Overlap (RBO) to assess how close the models' rankings are to the gold standard. This measure only considers the gold standard queries produced by the models.
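For reference, the sketch below shows the truncated RBO computation we rely on conceptually, restricted to gold standard queries as described above; the persistence parameter \texttt{p} and the absence of tail extrapolation are simplifying assumptions of this illustration rather than details of our evaluation pipeline.
\begin{lstlisting}[language=Python]
# Illustrative sketch: truncated Rank-Biased Overlap between a model's ranking
# of relaxed queries and the gold-standard ranking. p = 0.9 is an assumed
# default; a full implementation would extrapolate the tail, so identical
# rankings score slightly below 1 at finite depth in this simplified version.
def rbo(ranking, gold, p=0.9):
    gold_set = set(gold)
    ranking = [q for q in ranking if q in gold_set]  # keep gold queries only
    depth = max(len(ranking), len(gold))
    if depth == 0:
        return 0.0
    score = 0.0
    for d in range(1, depth + 1):
        overlap = len(set(ranking[:d]) & set(gold[:d]))
        score += (p ** (d - 1)) * overlap / d
    return (1 - p) * score

print(rbo(["Q1a", "Q1c", "Q1b"], ["Q1a", "Q1b", "Q1c"]))
\end{lstlisting}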
Table~\ref{tab:rbo_jaccard} shows that SHARP outperforms the other models in both recall and RBO scores, indicating that the relaxed queries it generates, and their ranking, closely align with the user-validated benchmark.
In some cases, CONNOR and OMBS score better because SHARP does not produce the same gold standard queries.
%For instance, it achieves an RBO of 1.0 for Q7 and scores above 0.7 in several other queries.
In contrast, CONNOR and OMBS show limited ranking similarity, with several queries returning RBO scores close to zero, mainly because they do not relax annotated triples.
@@ -65,8 +65,8 @@ On the other side, CONNOR and OMBS show limited ranking similarity, with several
In the second evaluation, we compared the top-$k$ answers produced by each relaxation model with the benchmark answers ranked by human judgment.
To do this, we varied $k$ from 1 to 60 expected query results.
If a query returned more results than expected, we sampled to obtain $k$ results.
The resulting set was then compared to the top-$k$ results obtained by executing the human-ranked queries in the same order.
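A minimal sketch of this P@$k$ computation is given below; random sampling when a model returns more than $k$ answers and the set-based overlap are assumptions of the illustration that fill in details not spelled out above.
\begin{lstlisting}[language=Python]
# Illustrative sketch of P@k: compare a model's answers against the top-k
# answers of the human-ranked queries. Random sampling when the model returns
# more than k answers is an assumption made for this example.
import random

def precision_at_k(model_answers, benchmark_answers, k, seed=0):
    answers = list(model_answers)
    if len(answers) > k:                      # more results than expected
        answers = random.Random(seed).sample(answers, k)
    relevant = set(benchmark_answers[:k])
    hits = sum(1 for a in answers if a in relevant)
    return hits / k

# curve = [precision_at_k(model_results, human_ranked_results, k)
#          for k in range(1, 61)]
\end{lstlisting}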
% Table~\ref{tab:precision} shows the precision scores (P@k) for the seven queries.
@@ -74,8 +74,8 @@ The resulting set was then compared to the $top-k$ results obtained by executing
% CONNOR displays the weakest performance, with generally low precision and missing results on several queries.
Figure \ref{fig:plots} shows the variation of the precision scores as $k$ varies in each query.
As shown, \textsc{CONNOR} is consistently the first model whose precision begins to decline, highlighting its limited ability to maintain relevant results as more results are retrieved.
\textsc{OMBS} shows behavior similar to our model for Q2 and Q4, where the ontology effectively guides the relaxation process.
However, its performance drops significantly for Q5--Q7, where metadata plays a central role, resulting in near-zero precision.
...
@@ -172,18 +172,26 @@ Leveraging both types of relaxation introduces three main challenges:
% in applying both ontology-based and entity-based relaxations at once:
\begin{itemize}[noitemsep,topsep=0pt]
\item \textit{Designing a unified hybrid similarity ranking.} A unified ranking function is needed to consistently order relaxed queries from both ontology- and instance-based strategies. This requires extending the Information Content (IC) framework to entities and literals, and aligning query elements in a common semantic space while preserving their original ranking order from precomputed mapping functions.
\item \textit{Establishing a reliable evaluation benchmark.} Evaluating the effectiveness of relaxed queries demands a benchmark that reflects human judgment. Designing such a benchmark involves selecting representative queries, generating meaningful relaxations, and collecting user-validated relevance assessments, all while ensuring scalability, fairness, and reproducibility.
\end{itemize}
\begin{comment}
\begin{itemize}[noitemsep,topsep=0pt]
\item \textit{Managing the exponential growth of relaxed queries.} Combining ontology- and instance-based relaxations causes a combinatorial explosion in the number of the relaxed queries, especially with deep ontologies or many similar entities. This exponential growth makes query generation NP-complete.
\item \textit{Defining a ranking function.} A unified ranking function is needed to interlink and score relaxed queries from both ontology- and instance-based strategies. It should consistently reflect their relevance to the original query.
\item \textit{Extending IC-based similarity to similarity of entities.} Traditional IC measures are tailored to classes or properties. Extending IC to entities and literals requires bridging their similarity to the class level, while preserving the ranking order among entities and literals.
\end{itemize}
\end{comment}
The research problem addressed in this paper is as follows: \textit{given a knowledge graph and a failing query, how can we define an efficient hybrid system that combines ontology-based and instance-based relaxations, while extending IC-based similarity from the entity level to the class level?}
%%%%%%% RESEARCH CONTRIBUTION AND OBJECTIVES %%%%%%%
Our contributions are twofold. We propose SHARP, a SPARQL Hybrid Query Relaxation Approach, with a new ranking system that combines both ontology-based and instance-based relaxations.
%By integrating both, we aim to improve query flexibility, allowing for more similar results in different case scenarios.
The ranking system ensures a total order over relaxed queries.
Then, we introduce a benchmark that measures effectiveness, i.e., how well the results match what the user expects, rather than focusing on execution time or the number of queries.
%Our evaluation shifts from comparing relaxed queries to comparing their results, emphasizing effectiveness in retrieving the top-\textit{k}-similar answers.
\begin{comment}
@@ -216,4 +224,4 @@ Section \ref{sec:benchmark} describes the proposed benchmark, which serves as a
%Section \ref{sec:reif} introduces RDF reification as a case study, showcasing how the hybrid query relaxation model can be applied to real-world RDF metadata representation techniques. This section discusses the specific challenges of querying reified triples and evaluates how the proposed model handles them in comparison to existing methods.
Section \ref{sec:experiments} presents the experimental evaluation.
%Section \ref{sec:future-works} discusses potential avenues for future research and extensions to our work.
Section \ref{sec:conclusion} presents our concluding remarks and perspectives.
@@ -8,7 +8,7 @@ These preliminaries are important to show how relaxed queries are generated by o
%—where each term can be a concrete identifier or a variable.
\subsection{Terminology in SPARQL query relaxation} The simplest graph pattern is a \textit{triple pattern}, composed of a subject, predicate, and object, denoted as $(s, p, o) \in (I \cup V \cup B) \times (I \cup V) \times (I \cup V \cup L \cup B)$, where $I$ is the set of IRIs, $V$ is the set of variables, $L$ is the set of literals, and $B$ is the set of blank nodes.
%Each of the terms—subject, predicate, or object—can therefore be either a variable or a fixed value.
A \textit{SPARQL query} $Q$ is composed of a set of triple patterns, typically written as:
$Q = \{tp_1, tp_2, \dots, tp_n\}$.
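As an illustration of this definition (not part of our implementation), the following sketch encodes triple patterns and a query as a set of such patterns; the term types mirror the sets $I$, $V$, $L$, and $B$ above, and the example IRIs are taken from the benchmark queries shown later.
\begin{lstlisting}[language=Python]
# Illustrative sketch: a triple pattern (s, p, o) and a query Q as a set of
# triple patterns, mirroring the sets I (IRIs), V (variables), L (literals),
# and B (blank nodes) of the definition above.
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Literal:
    value: str

@dataclass(frozen=True)
class BNode:
    label: str

Subject = Union[IRI, Var, BNode]           # I, V, B
Predicate = Union[IRI, Var]                # I, V
Object = Union[IRI, Var, Literal, BNode]   # I, V, L, B

@dataclass(frozen=True)
class TriplePattern:
    s: Subject
    p: Predicate
    o: Object

# A query Q = {tp_1, ..., tp_n} with two triple patterns
Q = {
    TriplePattern(Var("prof"), IRI("rdf:type"), IRI("ub:Professor")),
    TriplePattern(Var("prof"), IRI("ub:memberOf"),
                  IRI("http://www.department403.university4.edu")),
}
\end{lstlisting}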
...
@@ -13,12 +13,12 @@ Subsequent works \cite{hurtado2008query, poulovassilis2010combining} extended it
To better match user expectations, query ranking techniques are widely used. In the relaxation lattice, queries are partially ordered by relaxation distance (the number of relaxation steps), but queries at the same level remain incomparable. Several works \cite{huang2010query,huang2012approximating,fokou2016rdf,moreau2021ensuring,bennani2021query} address this by applying information content-based similarity to achieve a total order, complementing earlier distance-based methods \cite{hurtado2006relaxed,hurtado2008query,poulovassilis2010combining}.
All these works also give a relaxed query obtained by replacing an entity or literal by a constant with a zero similarity score.
To improve scalability, several pruning strategies have been proposed, such as MBS, O-MBS, and F-MBS in \cite{fokou2016rdf}, and OBFSR and BR in \cite{huang2012approximating}. Among these, O-MBS \cite{fokou2016rdf} stands out by heuristically detecting failure causes without exhaustively computing all minimal failing subqueries. It significantly reduces the number of relaxed queries and execution time, as shown in experiments on LUBM datasets.
Therefore, we adopt O-MBS as the relaxation strategy in our hybrid model due to its efficiency.
\begin{comment}
To address scalability, \cite{huang2012approximating, fokou2016rdf} proposed pruning strategies to reduce the search space and improve efficiency.
Several pruning techniques were proposed, including MBS, O-MBS, F-MBS, OBFSR, and BR, which aim to reduce the number of relaxed queries either by identifying those unlikely to produce results, optimizing source selection, or using batch execution strategies.
O-MBS stands out by avoiding exhaustive computation of all minimal failing subqueries.
Instead, it detects partial failure causes to heuristically prune the lattice, leading to faster execution with fewer queries.
This was confirmed experimentally in \cite{fokou2016rdf}, where O-MBS was compared against MBS, F-MBS, and a BFS-based baseline. In a series of experiments conducted on LUBM datasets, O-MBS consistently outperformed other strategies in terms of execution time and number of executed queries.
@@ -40,18 +40,18 @@ Hogan et al. \cite{hogan2012towards} proposes an approach that ranks entities an
Similarly, Elbassuoni et al. \cite{elbassuoni2011query} evaluates the closeness of relaxed queries and answers through human judgment, offering a language-model-based framework focused on entity-centric relaxation.
However, it does not compare its approach with other techniques, nor does it handle class or property relaxation.
Among entity-based techniques, the partitioning-based approach by Ferré \cite{ferre2018answers} stands out because it is particularly well-suited for queries with many joins, which are common in RDF graphs.
It uses Formal Concept Analysis (FCA) to group entities into a concept lattice based on shared properties or attributes.
Query relaxation is then achieved by navigating this lattice, allowing one to generalize or specialize entities in a structured and explainable way.
This process effectively captures semantic proximity without requiring an explicit, precomputed similarity function—unlike most instance-based approaches that rely on mapping functions to determine the ranking of alternatives.
%It handles class hierarchies through Formal Concept Analysis, making it more flexible and applicable to a broader range of datasets.
Unlike the EADS industrial use case in \cite{hogan2012towards}, Ferré's approach was not developed for a specific project or dataset, which enhances its generalizability.
Furthermore, it is the only entity-based technique with publicly available source code, which allowed us to use the Java version (called CONNOR) for our experimental evaluation, providing a solid basis for comparison.% in the context of RDF query relaxation.
\subsection{Analysis of the existing benchmarks}
Although various datasets have been used in query relaxation research (see \cite{fakih2024survey}), they often differ in focus.
Datasets like MONDIAL, LUBM, and its reification-based extension LUBM4OBDA support ontology-driven relaxations involving class hierarchies, property generalization, and metadata representation.
Others, such as DBpedia and LibraryThing, emphasize queries over entities and literals, with limited reliance on the ontology.
Some benchmarks are highly domain-specific, like the Vehicles dataset from the EADS project \cite{hogan2012towards}, targeting security-focused applications.
The dataset sizes in existing benchmarks range from 5k to 13M triples (e.g., the LUBM dataset used in the partitioning-based approach).
@@ -61,7 +61,9 @@ However, when it comes to queries, existing benchmarks fall short in capturing t
Most benchmarks provide queries that focus either on class/property hierarchies or on entity/literal values, but not both together.
In particular, there is a lack of query sets that simultaneously involve classes, properties, entities, and literals, especially in the context of metadata.
To address this, we build upon LUBM4OBDA, due to its rich ontology and reification support, by adding more diverse literals and entities within the queries.
We define 7 queries of 2 to 4 triple patterns, each designed to trigger and evaluate specific types of term relaxation.
Among existing works, Elbassuoni et al. \cite{elbassuoni2011query} used human judgment to evaluate the quality of relaxed query results. They pooled the top-10 answers from multiple approaches and asked six evaluators to rate them on a 4-point scale. Additionally, to evaluate the overall quality of relaxed queries, they generated the top-5 relaxed queries per evaluation query, ranked them by their scores, and asked the same evaluators to rate how close each relaxed query was to the original, using a 4-point scale.
\begin{comment}
Elbassuoni et al. \cite{elbassuoni2011query} apply statistical language models from natural language processing and use the Jensen-Shannon divergence to probabilistically model and relax entities and predicates.
...
@@ -14,7 +14,7 @@ In this section, we introduce our benchmark composed of a dataset generated from
Reification is an important aspect of our benchmark design because it provides a way of representing annotations in RDF, which frequently include literals such as time, provenance, or certainty.
These annotations are typically represented using literals like strings, dates, numbers, or intervals.
Several syntaxes exist to express reification, such as standard reification \cite{manola2004rdf}\footnote{\url{https://www.w3.org/TR/rdf-primer/\#reification}}~\footnote{\url{https://www.w3.org/TR/rdf11-schema/\#ch_reificationvocab}}, n-ary relations \cite{noy2006defining}\footnote{\url{http://www.w3.org/TR/swbp-n-aryRelations}}, named graphs \cite{carroll2005named}, and RDF-star~\cite{hartig2017foundations}\footnote{\url{https://w3c.github.io/rdf-star/cg-spec/2021-12-17.html}}.
The RDF 1.2 draft is considering Triple Terms, where an RDF triple can be used as the object of another triple.\footnote{\url{https://www.w3.org/TR/rdf12-concepts/\#section\-triple\-terms}}
Considering annotations aligns with the objective of our work: evaluating a hybrid query relaxation model that considers all query elements—instances, literals, classes, and properties.
@@ -52,9 +52,9 @@ Our dataset is a moderate size dataset (2.5 M triples), where we introduced an i
\end{table}
\subsection{Seven new SPARQL-star query templates}
\label{sec:benchmarkQueries}
We defined seven new SPARQL-star queries over our dataset (Listing \ref{lst:alltriples}).
These queries were manually designed to reflect the different types of relaxations and use cases we aim to address.
They cover a range of structural patterns, including star-shaped, chain-shaped, and reified patterns.
They were selected to trigger relaxations on different types of triple-pattern terms (e.g., classes, properties, entities, literals).
@@ -280,9 +280,9 @@ We consider different shapes of queries (star: $Q_{1}$ - $Q_{3}$, chain: $Q_{4}$
\captionsetup{list=no,name=Listing}
\begin{subfigure}{\linewidth}
\begin{lstlisting}[mathescape, language = SPARQL1.1]
#contains an entity and a class subject to instance/class relaxation
SELECT ?prof
WHERE { ?prof a ub:Professor;
        ub:memberOf <http://www.department403.university4.edu>.
} #Initial results = 15
@@ -292,9 +292,9 @@ We consider different shapes of queries (star: $Q_{1}$ - $Q_{3}$, chain: $Q_{4}$
\end{subfigure} % Reduce space between subfigures
\begin{subfigure}{\linewidth} \vspace{-8pt}
\begin{lstlisting}[mathescape, language = SPARQL1.1]
# triggers sibling class relaxation
SELECT ?student
WHERE { ?student a ub:GraduateStudent;
        ub:takesCourse <http://www.department0.university0.edu/graduateCourse0>;
        ub:undergraduateDegreeFrom <http://www.university103.edu>.
} #Initial results = 1.
@@ -305,9 +305,9 @@ We consider different shapes of queries (star: $Q_{1}$ - $Q_{3}$, chain: $Q_{4}$
\end{subfigure} \vspace{-10pt} % Reduce space between subfigures
\begin{subfigure}{\linewidth} \vspace{-8pt}
\begin{lstlisting}[mathescape, language = SPARQL1.1]
# includes a string literal relaxation
SELECT DISTINCT ?publication
WHERE { ?publication a ub:Publication ;
        ub:name ?publication_name ;
        ub:publicationAuthor ?z .
    ?z ub:researchInterest "Gene Expression Analysis" .
@@ -318,9 +318,9 @@ We consider different shapes of queries (star: $Q_{1}$ - $Q_{3}$, chain: $Q_{4}$
\end{subfigure} \vspace{-10pt} % Reduce space between subfigures
\begin{subfigure}{\linewidth}
\begin{lstlisting}[mathescape, language = SPARQL1.1]
# contains property that triggers property-level relaxation
SELECT ?student
WHERE { ?student ub:advisor ?Y.
    ?Y ub:headOf <http://www.department3.0.university0.edu>.
} #Initial results = 2.
\end{lstlisting} \vspace{-14pt}
@@ -329,9 +329,9 @@ We consider different shapes of queries (star: $Q_{1}$ - $Q_{3}$, chain: $Q_{4}$
\end{subfigure} \vspace{-10pt} % Reduce space between subfigures
\begin{subfigure}{\linewidth}
\begin{lstlisting}[mathescape, language = SPARQL1.1]
# includes numeric literal within a reified triple
SELECT ?student
WHERE { ?student a ub:GraduateStudent ;
        ub:takesCourse <http://www.department0.university0.edu/graduateCourse0> .
    << ?x ub:undergraduateDegreeFrom ?y >> ub:yearOfAward "2010"^^xsd:integer.
} #Initial results = 1.
@@ -340,10 +340,10 @@ We consider different shapes of queries (star: $Q_{1}$ - $Q_{3}$, chain: $Q_{4}$
\label{lst:q5}
\end{subfigure} \vspace{-10pt} % Reduce space between subfigures
\begin{subfigure}{\linewidth}
\begin{lstlisting}[mathescape, language = SPARQL1.1]
# uses string metadata in a reified triple
SELECT DISTINCT ?student
WHERE { ?student ub:name ?student_name.
    <<?student ub:takesCourse <http://www.department0.university0.edu/undergraduateCourse1> >>
        ub:semester "Autumn";
        ub:courseYear "2024"^^xsd:integer.
@@ -354,9 +354,9 @@ We consider different shapes of queries (star: $Q_{1}$ - $Q_{3}$, chain: $Q_{4}$
\end{subfigure} \vspace{-10pt} % Reduce space between subfigures
\begin{subfigure}{\linewidth}
\begin{lstlisting}[mathescape, language = SPARQL1.1]
# features a numeric interval filter for range relaxation
SELECT ?student
WHERE { ?student a ub:GraduateStudent;
        ub:takesCourse <http://www.department0.university0.edu/graduateCourse0> .
    << ?student ub:undergraduateDegreeFrom ?y >> ub:yearOfAward ?year.
    FILTER (?year >= "2008"^^xsd:integer && ?year <= "2010"^^xsd:integer ).
...
@@ -111,6 +111,30 @@ Fokou G, Jean S, Hadjali A.
\newblock In: International Symposium on Methodologies for Intelligent Systems.
Springer; 2014. p. 512-7.
\bibitem{manola2004rdf}
Manola F, Miller E, McBride B, et~al.
\newblock {RDF} primer.
\newblock {W3C} recommendation. 2004;10(1-107):6.
\bibitem{noy2006defining}
Noy N, Rector A, Hayes P, Welty C.
\newblock Defining n-ary relations on the semantic web.
\newblock W3C working group note. 2006.
\newblock Available from: \url{https://www.w3.org/TR/swbp-n-aryRelations/}.
\bibitem{carroll2005named}
Carroll JJ, Bizer C, Hayes P, Stickler P.
\newblock Named graphs.
\newblock Journal of Web Semantics (JWS). 2005;3(4):247-67.
\newblock ISSN 1570-8268.
\bibitem{hartig2017foundations}
Hartig O.
\newblock Foundations of {RDF*} and {SPARQL*}: ({An} alternative approach to
statement-level metadata in {RDF}).
\newblock In: Alberto Mendelzon International Workshop on Foundations of Data
Management and the Web (AMW). vol. 1912. CEUR-WS.org; 2017.
\bibitem{ferre2020formal}
Ferr{\'e} S, Huchard M, Kaytoue M, Kuznetsov SO, Napoli A.
\newblock Formal concept analysis: from knowledge discovery to knowledge
...
@@ -351,6 +351,15 @@
doi={10.1007/11926078\_23}
}
@article{noy2006defining,
title={Defining n-ary relations on the semantic web},
author={Noy, Natasha and Rector, Alan and Hayes, Pat and Welty, Chris},
journal={W3C working group note},
year={2006},
publisher={World Wide Web Consortium Cambridge, MA, USA},
url ={https://www.w3.org/TR/swbp-n-aryRelations/}
}
@article{chaves2020gtfs,
title={GTFS-Madrid-Bench: A benchmark for virtual knowledge graph access in the transport domain},
author={Chaves-Fraga, David and Priyatna, Freddy and Cimmino, Andrea and Toledo, Jhon and Ruckhaus, Edna and Corcho, Oscar},
...