ARQ - Extending Query Execution
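Jena can be extended to use new storage, or to access non-RDF data, by implementing the Graph interface. For a read-only database, overriding the graph's find method is enough.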
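A dependency-free Java 17 sketch of the idea: a read-only graph whose only operation is a pattern-matching find over rows from a non-RDF source. The Triple record, the Graph interface, the RowTableGraph class and the null-as-wildcard convention are hypothetical stand-ins, not Jena's API; in Jena itself the usual route is to subclass GraphBase and implement its graphBaseFind method.

```java
import java.util.ArrayList;
import java.util.List;

public class ReadOnlyGraphDemo {

    // Plain-string triple; a stand-in for Jena's Triple and Node types.
    record Triple(String s, String p, String o) {}

    // Hypothetical minimal graph interface: a null argument to find() means "any".
    interface Graph {
        List<Triple> find(String s, String p, String o);
    }

    // Read-only graph over rows from a non-RDF source (here an in-memory table).
    // Only find() exists; there is no way to add or delete triples.
    static final class RowTableGraph implements Graph {
        private final List<String[]> rows; // each row: {id, name, email}

        RowTableGraph(List<String[]> rows) {
            this.rows = rows;
        }

        @Override
        public List<Triple> find(String s, String p, String o) {
            List<Triple> out = new ArrayList<>();
            for (String[] row : rows) {
                String subject = "urn:person:" + row[0];
                // Each non-key column is exposed as one triple about the row's subject.
                List<Triple> rowTriples = List.of(
                        new Triple(subject, "name", row[1]),
                        new Triple(subject, "email", row[2]));
                for (Triple t : rowTriples) {
                    if (matches(s, t.s()) && matches(p, t.p()) && matches(o, t.o())) {
                        out.add(t);
                    }
                }
            }
            return out;
        }

        private static boolean matches(String wanted, String actual) {
            return wanted == null || wanted.equals(actual);
        }
    }

    public static void main(String[] args) {
        Graph g = new RowTableGraph(List.of(
                new String[] {"1", "Alice", "alice@example.org"},
                new String[] {"2", "Bob", "bob@example.org"}));
        System.out.println(g.find(null, "name", null));
        System.out.println(g.find("urn:person:2", null, null));
    }
}
```

Each row is translated into triples lazily at find time, so the underlying data never has to be copied into an RDF store.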
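ARQ query processing has six steps: parsing, algebra generation, execution building, high-level optimization, low-level optimization, and finally evaluation.
Parsing
Parsing turns the query string into a Query object. The Query class represents a query as an abstract syntax tree (AST) and provides methods for building the AST.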
Algebra generation
Converts the Query object into a SPARQL algebra expression.
High-Level Optimization and Transformations
A series of transformations is applied to the algebra expression (Op). Users can add their own transformations.
A transformation processes the algebra bottom-up.
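To illustrate what "bottom-up" means, here is a toy Java 17 sketch over a hypothetical three-operator algebra (Bgp, Join, Filter; not Jena's Op classes). Children are rewritten before their parent, so the single rule, merging a join of two basic graph patterns into one, sees already-merged inputs and collapses a nested chain in one pass. In Jena the real machinery is Transformer.transform together with a Transform implementation such as a TransformCopy subclass.

```java
import java.util.ArrayList;
import java.util.List;

public class BottomUpTransformDemo {

    // Toy algebra, loosely modelled on SPARQL algebra operators.
    interface Op {}
    record Bgp(List<String> patterns) implements Op {}
    record Join(Op left, Op right) implements Op {}
    record Filter(String expr, Op sub) implements Op {}

    // Bottom-up rewrite: transform the children first, then apply the rule
    // to the rebuilt parent, which therefore only sees rewritten sub-expressions.
    static Op transform(Op op) {
        if (op instanceof Join j) {
            Op left = transform(j.left());
            Op right = transform(j.right());
            // Rule: a join of two basic graph patterns becomes one merged BGP.
            if (left instanceof Bgp l && right instanceof Bgp r) {
                List<String> merged = new ArrayList<>(l.patterns());
                merged.addAll(r.patterns());
                return new Bgp(merged);
            }
            return new Join(left, right);
        }
        if (op instanceof Filter f) {
            return new Filter(f.expr(), transform(f.sub()));
        }
        return op; // Bgp is a leaf: nothing to rewrite
    }

    public static void main(String[] args) {
        Op op = new Filter("?age > 18",
                new Join(new Bgp(List.of("?p :name ?n")),
                         new Join(new Bgp(List.of("?p :age ?age")),
                                  new Bgp(List.of("?p :email ?e")))));
        // prints: Filter[expr=?age > 18, sub=Bgp[patterns=[?p :name ?n, ?p :age ?age, ?p :email ?e]]]
        System.out.println(transform(op));
    }
}
```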
Low-Level Optimization and Evaluation
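One of the responsibilities of low-level optimization is choosing the order in which the query is executed, for example the order in which the triple patterns of a basic graph pattern are evaluated.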
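As a rough illustration of choosing an execution order, the sketch below orders the triple patterns of one basic graph pattern greedily: at each step it picks the pattern with the fewest still-unbound variables, on the assumption that fewer free variables means fewer matches. The heuristic, the "s p o" string format and all names are assumptions for illustration; ARQ's actual reordering is more elaborate.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PatternOrderDemo {

    // Variables of a pattern written as "s p o", where variables start with '?'.
    static List<String> vars(String pattern) {
        List<String> out = new ArrayList<>();
        for (String term : pattern.trim().split("\\s+")) {
            if (term.startsWith("?")) out.add(term);
        }
        return out;
    }

    // Greedy ordering: repeatedly take the pattern with the fewest variables not yet
    // bound by earlier patterns; ties keep the original order.
    static List<String> order(List<String> patterns) {
        List<String> remaining = new ArrayList<>(patterns);
        List<String> ordered = new ArrayList<>();
        Set<String> bound = new HashSet<>();
        while (!remaining.isEmpty()) {
            String best = null;
            long bestFree = Long.MAX_VALUE;
            for (String p : remaining) {
                long free = vars(p).stream().filter(v -> !bound.contains(v)).count();
                if (free < bestFree) {
                    best = p;
                    bestFree = free;
                }
            }
            remaining.remove(best);
            ordered.add(best);
            bound.addAll(vars(best)); // these variables are bound for later patterns
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<String> bgp = List.of(
                "?p :knows ?q",
                "?q :name ?n",
                "?p :name \"Alice\"");
        // prints: [?p :name "Alice", ?p :knows ?q, ?q :name ?n]
        System.out.println(order(bgp));
    }
}
```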
Query Engines and Query Engine Factories
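A query engine follows a top-down execution model. When the query execution factory is given a dataset and a query, it calls the accept method of the registered query engine factories to find one that can execute the query. Once a factory is chosen, its create method builds a Plan object, and the plan provides the QueryIterator for the query.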
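A minimal sketch of the accept/create dispatch, with hypothetical EngineFactory and Plan interfaces standing in for Jena's query engine factory and Plan (queries and datasets are plain strings). In this sketch the first registered factory that accepts wins, so the specialised engine is registered before the catch-all one; that ordering rule is an assumption of the sketch, not a statement about Jena's registry.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class EngineRegistryDemo {

    // Hypothetical stand-ins for Plan and a query engine factory.
    interface Plan {
        Iterator<String> iterator();
    }

    interface EngineFactory {
        boolean accept(String query, String dataset);
        Plan create(String query, String dataset);
    }

    static final List<EngineFactory> REGISTRY = new ArrayList<>();

    // Ask each registered factory in turn; the first that accepts builds the plan.
    static Plan plan(String query, String dataset) {
        for (EngineFactory f : REGISTRY) {
            if (f.accept(query, dataset)) return f.create(query, dataset);
        }
        throw new IllegalStateException("no engine accepts this query");
    }

    // A factory that accepts datasets with a given name prefix and returns a one-row plan.
    static EngineFactory prefixEngine(String name, String datasetPrefix) {
        return new EngineFactory() {
            public boolean accept(String query, String dataset) {
                return dataset.startsWith(datasetPrefix);
            }
            public Plan create(String query, String dataset) {
                return () -> List.of(name + " ran: " + query).iterator();
            }
        };
    }

    public static void main(String[] args) {
        REGISTRY.add(prefixEngine("remote-engine", "remote:"));
        REGISTRY.add(prefixEngine("main-engine", "")); // empty prefix: accepts everything
        plan("SELECT * { ?s ?p ?o }", "remote:db").iterator().forEachRemaining(System.out::println);
        plan("SELECT * { ?s ?p ?o }", "mem:local").iterator().forEachRemaining(System.out::println);
    }
}
```

The catch-all factory plays the role of the main query engine described next: it accepts any query, so a specialised engine only has to accept what it can actually do better.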
Main Query Engine
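The main query engine can execute any query. Once initialization is complete, it calls QC.execute to execute the query. An extension that wants to reuse the main query engine should, from its own OpExecutor, call QC.execute to execute sub-queries. QC.execute creates an OpExecutor object and uses it to execute an algebra operation.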
There are two ways to extend the main query engine:
- Stage generators, which execute basic graph patterns while reusing the rest of the engine.
- OpExecutor, which executes specific operators.
Stage generator
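The advantage of a StageGenerator is that it requires knowing fewer details than an OpExecutor. It is installed by setting it in the context with set(ARQ.stageGenerator, stageGenAlt); the engine then calls StageGenerator.execute(BasicPattern pattern, QueryIterator input, ExecutionContext execCxt) for each basic graph pattern.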
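A common pattern is to wrap the generator already installed in the context and delegate to it for anything the new one cannot handle. The toy sketch below shows only that chaining: StageGen, DEFAULT and SqlStageGen are hypothetical, a pattern is a list of strings, the active graph is identified by a name, and the result is a description instead of a QueryIterator. A Map plays the role of the context, and its "stageGenerator" key stands in for ARQ.stageGenerator.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StageGeneratorDemo {

    // Hypothetical stand-in for StageGenerator.execute(BasicPattern, QueryIterator, ExecutionContext).
    interface StageGen {
        String execute(List<String> pattern, String activeGraph);
    }

    // Plays the role of the generator that was already in the context.
    static final StageGen DEFAULT = (pattern, graph) ->
            "default matcher on " + graph + ": " + pattern;

    // Custom generator: handles only the graphs it understands and passes
    // every other basic graph pattern to the generator it replaced.
    static final class SqlStageGen implements StageGen {
        private final StageGen above;

        SqlStageGen(StageGen above) {
            this.above = above;
        }

        @Override
        public String execute(List<String> pattern, String activeGraph) {
            if (!activeGraph.startsWith("sql:")) {
                return above.execute(pattern, activeGraph);
            }
            return "one SQL query for " + pattern.size() + " patterns on " + activeGraph;
        }
    }

    public static void main(String[] args) {
        Map<String, StageGen> context = new HashMap<>();
        context.put("stageGenerator", DEFAULT);

        // Read the current generator, then install a wrapper around it.
        StageGen orig = context.get("stageGenerator");
        context.put("stageGenerator", new SqlStageGen(orig));

        StageGen gen = context.get("stageGenerator");
        List<String> bgp = List.of("?p :name ?n", "?p :age ?a");
        System.out.println(gen.execute(bgp, "sql:people"));
        System.out.println(gen.execute(bgp, "mem:local"));
    }
}
```

Delegating instead of failing keeps the new generator safe to install globally: queries over ordinary graphs behave exactly as before.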
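OpExecutor
- Subclass the existing OpExecutor and override the execute method for the operator you want to handle; it returns a QueryIterator.
- Register an OpExecutorFactory with QC or in the ExecutionContext.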
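A toy Java 17 sketch of these two steps together with the QC.execute entry point: execute builds an executor from the factory held in a context and runs the operation, the default Executor has one method per operator and re-enters execute for sub-operations, and CountingExecutor overrides only the table operator. Everything here (Op, Table, FilterEq, Context, Executor, CountingExecutor) is hypothetical; the Jena counterparts are OpExecutor, OpExecutorFactory and QC.setFactory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class OpExecutorDemo {

    // Toy algebra over solution rows (variable -> value).
    interface Op {}
    record Table(List<Map<String, String>> rows) implements Op {}
    record FilterEq(String var, String value, Op sub) implements Op {}

    // Stand-in for the execution context: holds the executor factory to use.
    static final class Context {
        Function<Context, Executor> executorFactory = Executor::new;
    }

    // Entry point in the role of QC.execute: create an executor from the
    // context's factory and let it run the operation.
    static List<Map<String, String>> execute(Op op, Context cxt) {
        return cxt.executorFactory.apply(cxt).exec(op);
    }

    // Default executor: dispatches to one method per operator type.
    static class Executor {
        protected final Context cxt;

        Executor(Context cxt) {
            this.cxt = cxt;
        }

        List<Map<String, String>> exec(Op op) {
            if (op instanceof Table t) return execTable(t);
            if (op instanceof FilterEq f) return execFilter(f);
            throw new IllegalArgumentException("unknown op: " + op);
        }

        protected List<Map<String, String>> execTable(Table t) {
            return t.rows();
        }

        protected List<Map<String, String>> execFilter(FilterEq f) {
            List<Map<String, String>> out = new ArrayList<>();
            // The sub-operation goes back through the entry point, so overrides apply to it too.
            for (Map<String, String> row : execute(f.sub(), cxt)) {
                if (f.value().equals(row.get(f.var()))) out.add(row);
            }
            return out;
        }
    }

    // Extension: override one operator; all others keep the inherited behaviour.
    static class CountingExecutor extends Executor {
        static int tablesSeen = 0;

        CountingExecutor(Context cxt) {
            super(cxt);
        }

        @Override
        protected List<Map<String, String>> execTable(Table t) {
            tablesSeen++;
            return super.execTable(t);
        }
    }

    public static void main(String[] args) {
        Context cxt = new Context();
        cxt.executorFactory = CountingExecutor::new; // "register" the custom factory
        Op op = new FilterEq("?name", "Alice", new Table(List.of(
                Map.of("?name", "Alice"), Map.of("?name", "Bob"))));
        System.out.println(execute(op, cxt) + " tables=" + CountingExecutor.tablesSeen);
    }
}
```

Because the filter evaluates its sub-operation through execute, the overridden table method runs even though the filter is the top operator; this is the kind of reuse the main engine is designed for.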
Differences between Dataset, Model, and Graph
Jena is divided into an API, for application developers, and an SPI for systems developers, such as people making storage engines, reasoners etc.
Dataset, Model, Statement, Resource and Literal are API interfaces and provide many conveniences for application developers.
DatasetGraph, Graph, Triple and Node are SPI interfaces. They’re pretty spartan and simple to implement (as you’d hope if you’ve got to implement the things).
- A DataSource is a collection of models (one being the Default Model, any others being Named Models) that you expect will have new triples added to it over time. You can read and write on DataSources.
- A Dataset is like a DataSource, but its triples are static - you don’t expect new ones to be added or existing ones to be deleted. These guys are read-only.
- A Model is a collection of statements; this is what you typically aim your SPARQL queries at. If you query a DataSource or Dataset with SPARQL and don’t use a GRAPH clause (named graphs are declared with FROM NAMED and matched with GRAPH), you’re querying the Default Model.
- A Graph is a collection of triples. Every Model can be turned into a Graph, to provide a somewhat closer representation of the RDF, OWL, and SPARQL standards.
- A DatasetGraph is a container for Graphs, similar to a DataSource (i.e. read/write), that provides the infrastructure for Default and Named Graphs.