Class ForClauseSparkIterator
java.lang.Object
org.rumbledb.runtime.RuntimeTupleIterator
org.rumbledb.runtime.flwor.clauses.ForClauseSparkIterator
- All Implemented Interfaces:
com.esotericsoftware.kryo.KryoSerializable, Serializable, RuntimeTupleIteratorInterface
-
Field Summary
Fields inherited from class org.rumbledb.runtime.RuntimeTupleIterator
child, currentDynamicContext, evaluationDepthLimit, FLOW_EXCEPTION_MESSAGE, hasNext, inputTupleProjection, isOpen, outputTupleProjection
-
Constructor Summary
Constructors
ForClauseSparkIterator(RuntimeTupleIterator child, Name variableName, Name positionalVariableName, boolean allowingEmpty, RuntimeIterator assignmentIterator, RuntimeStaticContext staticContext)
-
Method Summary
Modifier and Type, Method, Description
void close()
boolean containsClause(FLWOR_CLAUSES kind)
- Says whether or not the clause and its descendants include a clause of the specified kind.
generateNativeQuery(NativeClauseContext nativeClauseContext)
- Generates, if possible, a native Spark SQL query that mirrors the inner workings of the iterator.
getDataFrame(DynamicContext context)
- Obtains the dataframe from the child clause.
static FlworDataFrame getDataFrameStartingClause(RuntimeIterator iterator, Name variableName, Name positionalVariableName, boolean allowingEmpty, DynamicContext context, Map<Name, DynamicContext.VariableDependency> outputDependencies)
- Starting clause and the expression is parallelizable.
getDynamicContextVariableDependencies()
- Variable dependencies are variables that MUST be provided by the parent clause in the dynamic context for successful execution of this clause.
getInputTupleVariableDependencies(Map<Name, DynamicContext.VariableDependency> parentProjection)
- Builds the DataFrame projection that this clause needs to receive from its child clause.
getOutputTupleVariableNames()
- Returns the output tuple variable names.
boolean isAllowingEmpty()
boolean isSparkJobNeeded()
- Says whether this expression evaluation triggers a Spark job.
sparksoniq.jsoniq.tuple.FlworTuple next()
void open(DynamicContext context)
void print(StringBuffer buffer, int indent)
static void registerForClauseUDF(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame, RuntimeIterator newVariableExpression, DynamicContext context, org.apache.spark.sql.types.StructType inputSchema, List<FlworDataFrameColumn> UDFcolumns, SequenceType sequenceType)
void reset(DynamicContext context)
static FlworDataFrame tryNativeQuery(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame, Name newVariableName, Name positionalVariableName, boolean allowingEmpty, RuntimeIterator iterator, List<FlworDataFrameColumn> allColumns, org.apache.spark.sql.types.StructType inputSchema, DynamicContext context)
- Tries to generate the native query for the for clause and run it. Returns the resulting dataframe if successful, null otherwise.
Methods inherited from class org.rumbledb.runtime.RuntimeTupleIterator
canSetEvaluationDepthLimit, getChildIterator, getConfiguration, getEvaluationDepthLimit, getHeight, getHighestExecutionMode, getMetadata, getSubtreeBeyondLimit, hasNext, isDataFrame, isOpen, read, setEvaluationDepthLimit, setInputAndOutputTupleVariableDependencies, toString, write
-
Constructor Details
-
ForClauseSparkIterator
public ForClauseSparkIterator(RuntimeTupleIterator child, Name variableName, Name positionalVariableName, boolean allowingEmpty, RuntimeIterator assignmentIterator, RuntimeStaticContext staticContext)
-
-
Method Details
-
getVariableName
-
getPositionalVariableName
-
getAssignmentIterator
-
isAllowingEmpty
public boolean isAllowingEmpty()
-
open
- Specified by:
open in interface RuntimeTupleIteratorInterface
- Overrides:
open in class RuntimeTupleIterator
-
reset
- Specified by:
reset in interface RuntimeTupleIteratorInterface
- Overrides:
reset in class RuntimeTupleIterator
-
next
public sparksoniq.jsoniq.tuple.FlworTuple next()
- Specified by:
next in interface RuntimeTupleIteratorInterface
- Specified by:
next in class RuntimeTupleIterator
-
close
public void close()
- Specified by:
close in interface RuntimeTupleIteratorInterface
- Overrides:
close in class RuntimeTupleIterator
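The open, next, and close methods above, together with the inherited hasNext, suggest the usual pull-style iteration. The following is a minimal sketch of that calling pattern only. TupleIterator, drain, and over are hypothetical stand-ins written for this illustration and are not RumbleDB types; in particular, the real open takes a DynamicContext, which is omitted here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class TupleIteratorLifecycleSketch {
    // Hypothetical stand-in for the open/hasNext/next/close contract.
    // The real open(DynamicContext) takes a context argument, omitted here.
    interface TupleIterator<T> {
        void open();
        boolean hasNext();
        T next();
        void close();
    }

    // Opens the iterator, pulls every tuple, and always closes it afterwards.
    static <T> List<T> drain(TupleIterator<T> iterator) {
        List<T> tuples = new ArrayList<>();
        iterator.open();
        try {
            while (iterator.hasNext()) {
                tuples.add(iterator.next());
            }
        } finally {
            iterator.close();
        }
        return tuples;
    }

    // Toy implementation backed by a list, used only to exercise drain.
    static <T> TupleIterator<T> over(List<T> items) {
        return new TupleIterator<T>() {
            private Iterator<T> cursor;
            private boolean open;

            public void open() {
                cursor = items.iterator();
                open = true;
            }

            public boolean hasNext() {
                return open && cursor.hasNext();
            }

            public T next() {
                if (!hasNext()) {
                    throw new IllegalStateException("no more tuples, or iterator not open");
                }
                return cursor.next();
            }

            public void close() {
                open = false;
            }
        };
    }
}
```

Closing in a finally block keeps the iterator from staying open if next throws partway through.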
-
getDataFrame
Description copied from class: RuntimeTupleIterator
Obtains the dataframe from the child clause. With the second parameter, the caller can specify which variables it needs so that the others are projected away, or that only a count is needed for a specific variable, which allows projecting away the actual items.
- Specified by:
getDataFrame in class RuntimeTupleIterator
- Parameters:
context - the dynamic context in which to evaluate the child clause's dataframe.
- Returns:
- the DataFrame with the tuples returned by the child clause.
-
getDataFrameStartingClause
public static FlworDataFrame getDataFrameStartingClause(RuntimeIterator iterator, Name variableName, Name positionalVariableName, boolean allowingEmpty, DynamicContext context, Map<Name, DynamicContext.VariableDependency> outputDependencies)
Builds the DataFrame for the case where the for clause is the starting clause and its expression is parallelizable.
- Parameters:
iterator - the expression iterator
variableName - the name of the for variable
positionalVariableName - the name of the positional variable (or null if none)
allowingEmpty - whether the allowing empty option is present
context - the dynamic context.
outputDependencies - the desired projection.
- Returns:
- the resulting DataFrame.
-
getDynamicContextVariableDependencies
Description copied from class: RuntimeTupleIterator
Variable dependencies are variables that MUST be provided by the parent clause in the dynamic context for successful execution of this clause. These variables are:
1. All variables that the expression of the clause depends on (recursive call of getVariableDependencies on the expression)
2. Except those variables bound in the current FLWOR (obtained from the auxiliary method getVariablesBoundInCurrentFLWORExpression), because those are provided in the Tuples
3. Plus (recursively calling getVariableDependencies) all the Variable Dependencies of the child clause if it exists.
- Overrides:
getDynamicContextVariableDependencies in class RuntimeTupleIterator
- Returns:
- a map of variable names to dependencies (FULL, COUNT, ...) that this clause needs to obtain from the dynamic context.
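The three rules in the description can be read as set arithmetic over dependency maps. Below is a minimal sketch of that reading, not the RumbleDB implementation: DependencySketch, its Dependency enum, and clauseDependencies are hypothetical names, variable names are plain strings, and the rule that FULL wins when the same variable is requested with two different kinds is an assumption made for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class DependencySketch {
    // Hypothetical stand-in for the dependency kinds (FULL, COUNT, ...).
    enum Dependency { FULL, COUNT }

    static Map<String, Dependency> clauseDependencies(
            Map<String, Dependency> expressionDependencies,
            Set<String> boundInCurrentFlwor,
            Map<String, Dependency> childDependencies) {
        // Rule 1: start from everything the clause expression depends on.
        Map<String, Dependency> result = new LinkedHashMap<>(expressionDependencies);
        // Rule 2: drop variables bound in the current FLWOR, since the tuples provide them.
        result.keySet().removeAll(boundInCurrentFlwor);
        // Rule 3: add the child clause's dependencies.
        // Assumption: if the two sides disagree on the kind, FULL wins.
        for (Map.Entry<String, Dependency> entry : childDependencies.entrySet()) {
            result.merge(entry.getKey(), entry.getValue(),
                (mine, theirs) -> mine == theirs ? mine : Dependency.FULL);
        }
        return result;
    }
}
```

For a clause with no child, passing an empty map as childDependencies reduces the computation to rules 1 and 2.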
-
getOutputTupleVariableNames
Description copied from class: RuntimeTupleIterator
Returns the output tuple variable names. These variables can be removed from the dependencies of expressions in ascendant (subsequent) clauses, because their values are provided in the tuples rather than the dynamic context object.
- Overrides:
getOutputTupleVariableNames in class RuntimeTupleIterator
- Returns:
- the set of variable names that are bound by descendant clauses.
-
print
- Overrides:
print in class RuntimeTupleIterator
-
getInputTupleVariableDependencies
public Map<Name,DynamicContext.VariableDependency> getInputTupleVariableDependencies(Map<Name, DynamicContext.VariableDependency> parentProjection)
Description copied from class: RuntimeTupleIterator
Builds the DataFrame projection that this clause needs to receive from its child clause. The intent is that the result of this method is forwarded to the child clause in getDataFrame() so it can optimize some values away. Invariant: all keys in getInputTupleVariableDependencies(...) MUST be output tuple variables, i.e., appear in this.child.getOutputTupleVariableNames()
- Specified by:
getInputTupleVariableDependencies in class RuntimeTupleIterator
- Parameters:
parentProjection - the projection needed by the parent clause.
- Returns:
- the projection needed by this clause.
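The invariant above can be illustrated with a small sketch. The description states only the purpose and the invariant, so the combination rule used here is an assumption: take the parent's projection, drop what this clause binds itself, add what the clause expression needs, and keep only the child's output tuple variables. ProjectionSketch, Dependency, and inputProjection are hypothetical names, and variable names are plain strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class ProjectionSketch {
    // Hypothetical stand-in for the dependency kinds (FULL, COUNT, ...).
    enum Dependency { FULL, COUNT }

    static Map<String, Dependency> inputProjection(
            Map<String, Dependency> parentProjection,
            Set<String> boundByThisClause,
            Map<String, Dependency> expressionDependencies,
            Set<String> childOutputVariables) {
        // Start from what the parent clause asked for.
        Map<String, Dependency> projection = new LinkedHashMap<>(parentProjection);
        // Variables bound by this clause are produced here, not received from the child.
        projection.keySet().removeAll(boundByThisClause);
        // Add what this clause's own expression needs (assumption: FULL wins on conflict).
        expressionDependencies.forEach((name, dependency) ->
            projection.merge(name, dependency,
                (mine, theirs) -> mine == theirs ? mine : Dependency.FULL));
        // Enforce the invariant: every key must be an output tuple variable of the child.
        projection.keySet().retainAll(childOutputVariables);
        return projection;
    }
}
```

Variables that the expression needs but the child does not output are filtered out by the last step. Under this reading they would have to come from the dynamic context instead.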
-
tryNativeQuery
public static FlworDataFrame tryNativeQuery(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame, Name newVariableName, Name positionalVariableName, boolean allowingEmpty, RuntimeIterator iterator, List<FlworDataFrameColumn> allColumns, org.apache.spark.sql.types.StructType inputSchema, DynamicContext context)
Tries to generate the native query for the for clause and run it. Returns the resulting dataframe if successful, null otherwise.
- Parameters:
dataFrame - input dataframe for the query
newVariableName - name of the newly bound variable
positionalVariableName - name of the positional variable (or null if absent)
allowingEmpty - whether the allowing empty flag is set in the expression
iterator - iterator of the for variable assignment expression
allColumns - other columns required in following clauses
inputSchema - input schema of the dataframe
context - current dynamic context of the dataframe
- Returns:
- resulting dataframe of the for clause if successful, null otherwise
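Because a null return signals that no native query could be produced, a caller needs a fallback path. The sketch below shows that null-check pattern in isolation. NativeFallbackSketch and nativeOrFallback are hypothetical names, and the idea that the fallback is a UDF-based evaluation (as registerForClauseUDF might suggest) is an assumption, not something this page states.

```java
import java.util.function.Supplier;

final class NativeFallbackSketch {
    // Runs the native attempt first; a null result means "not possible",
    // in which case the fallback (assumed here to be a UDF-based path) is used.
    static <T> T nativeOrFallback(Supplier<T> nativeAttempt, Supplier<T> fallback) {
        T result = nativeAttempt.get();
        return result != null ? result : fallback.get();
    }
}
```

Passing suppliers keeps the fallback lazy, so the slower path is evaluated only when the native attempt returns null.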
-
containsClause
Description copied from class: RuntimeTupleIterator
Says whether or not the clause and its descendants include a clause of the specified kind.
- Specified by:
containsClause in class RuntimeTupleIterator
- Parameters:
kind - the kind of clause to test for.
- Returns:
- true if there is one, false otherwise.
-
registerForClauseUDF
public static void registerForClauseUDF(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame, RuntimeIterator newVariableExpression, DynamicContext context, org.apache.spark.sql.types.StructType inputSchema, List<FlworDataFrameColumn> UDFcolumns, SequenceType sequenceType)
-
isSparkJobNeeded
public boolean isSparkJobNeeded()
Says whether this expression evaluation triggers a Spark job.
- Specified by:
isSparkJobNeeded in class RuntimeTupleIterator
- Returns:
- true if the execution triggers a Spark job, false otherwise.
-
generateNativeQuery
Generates, if possible, a native Spark SQL query that mirrors the inner workings of the iterator.
- Overrides:
generateNativeQuery in class RuntimeTupleIterator
- Parameters:
nativeClauseContext - context information to generate the native query
- Returns:
- a native clause context with the native Spark SQL query that produces a result equivalent to the iterator's, or [NativeClauseContext.NoNativeQuery] if it is not possible
-