Class LetClauseSparkIterator
java.lang.Object
org.rumbledb.runtime.RuntimeTupleIterator
org.rumbledb.runtime.flwor.clauses.LetClauseSparkIterator
- All Implemented Interfaces:
com.esotericsoftware.kryo.KryoSerializable, Serializable, RuntimeTupleIteratorInterface
Field Summary
Fields inherited from class org.rumbledb.runtime.RuntimeTupleIterator
child, currentDynamicContext, evaluationDepthLimit, FLOW_EXCEPTION_MESSAGE, hasNext, inputTupleProjection, isOpen, outputTupleProjection
-
Constructor Summary
Constructors
- LetClauseSparkIterator(RuntimeTupleIterator child, Name variableName, SequenceType sequenceType, RuntimeIterator assignmentIterator, RuntimeStaticContext staticContext)
-
Method Summary
- static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> bindLetVariableInDataFrame(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame, Name newVariableName, SequenceType sequenceType, RuntimeIterator newVariableExpression, DynamicContext context, List<Name> variablesInInputTuple, Map<Name, DynamicContext.VariableDependency> outputTupleVariableDependencies, boolean hash, RumbleRuntimeConfiguration conf)
  Extends a DataFrame with a new column obtained from the evaluation of an expression for each tuple.
- void close()
- boolean containsClause(FLWOR_CLAUSES kind)
  Says whether or not the clause and its descendants include a clause of the specified kind.
- NativeClauseContext generateNativeQuery(NativeClauseContext nativeClauseContext)
  Generates (if possible) a native Spark SQL query that maps the inner workings of the iterator.
- FlworDataFrame getDataFrame(DynamicContext context)
  Obtains the dataframe from the child clause.
- FlworDataFrame getDataFrameAsJoin(DynamicContext context, Map<Name, DynamicContext.VariableDependency> parentProjection, FlworDataFrame childDF)
- Map<Name, DynamicContext.VariableDependency> getDynamicContextVariableDependencies()
  Variable dependencies are variables that MUST be provided by the parent clause in the dynamic context for successful execution of this clause.
- Map<Name, DynamicContext.VariableDependency> getInputTupleVariableDependencies(Map<Name, DynamicContext.VariableDependency> parentProjection)
  Builds the DataFrame projection that this clause needs to receive from its child clause.
- Set<Name> getOutputTupleVariableNames()
  Returns the output tuple variable names.
- static boolean isExpressionIndependentFromInputTuple(RuntimeIterator sequenceIterator, RuntimeTupleIterator tupleIterator)
- boolean isSparkJobNeeded()
  Says whether this expression evaluation triggers a Spark job.
- sparksoniq.jsoniq.tuple.FlworTuple next()
- void open(DynamicContext context)
- void print(StringBuffer buffer, int indent)
- static boolean registerLetClauseUDF(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame, RuntimeIterator newVariableExpression, DynamicContext context, org.apache.spark.sql.types.StructType inputSchema, List<FlworDataFrameColumn> UDFcolumns, SequenceType sequenceType)
- void reset(DynamicContext context)
- static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> tryNativeQuery(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame, Name newVariableName, RuntimeIterator iterator, List<FlworDataFrameColumn> allColumns, org.apache.spark.sql.types.StructType inputSchema, DynamicContext context)
  Tries to generate the native query for the let clause and run it; if successful, returns the resulting dataframe, otherwise returns null.
Methods inherited from class org.rumbledb.runtime.RuntimeTupleIterator
canSetEvaluationDepthLimit, getChildIterator, getConfiguration, getEvaluationDepthLimit, getHeight, getHighestExecutionMode, getMetadata, getSubtreeBeyondLimit, hasNext, isDataFrame, isOpen, read, setEvaluationDepthLimit, setInputAndOutputTupleVariableDependencies, toString, write
-
Constructor Details
-
LetClauseSparkIterator
public LetClauseSparkIterator(RuntimeTupleIterator child, Name variableName, SequenceType sequenceType, RuntimeIterator assignmentIterator, RuntimeStaticContext staticContext)
-
-
Method Details
-
open
- Specified by: open in interface RuntimeTupleIteratorInterface
- Overrides: open in class RuntimeTupleIterator
-
reset
- Specified by: reset in interface RuntimeTupleIteratorInterface
- Overrides: reset in class RuntimeTupleIterator
-
next
public sparksoniq.jsoniq.tuple.FlworTuple next()
- Specified by: next in interface RuntimeTupleIteratorInterface
- Specified by: next in class RuntimeTupleIterator
-
close
public void close()
- Specified by: close in interface RuntimeTupleIteratorInterface
- Overrides: close in class RuntimeTupleIterator
-
getDataFrame
Description copied from class: RuntimeTupleIterator
Obtains the dataframe from the child clause. It is possible, with the second parameter, to specify the variables it needs, so as to project the others away, or to indicate that only a count is needed for a specific variable, which allows projecting away the actual items.
- Specified by: getDataFrame in class RuntimeTupleIterator
- Parameters:
  context - the dynamic context in which to evaluate the child clause's dataframe.
- Returns:
  the DataFrame with the tuples returned by the child clause.
-
getDataFrameAsJoin
public FlworDataFrame getDataFrameAsJoin(DynamicContext context, Map<Name, DynamicContext.VariableDependency> parentProjection, FlworDataFrame childDF)
-
isExpressionIndependentFromInputTuple
public static boolean isExpressionIndependentFromInputTuple(RuntimeIterator sequenceIterator, RuntimeTupleIterator tupleIterator)
-
getDynamicContextVariableDependencies
Description copied from class: RuntimeTupleIterator
Variable dependencies are variables that MUST be provided by the parent clause in the dynamic context for successful execution of this clause. These variables are:
1. all variables that the expression of the clause depends on (recursive call of getVariableDependencies on the expression);
2. except those variables bound in the current FLWOR (obtained from the auxiliary method getVariablesBoundInCurrentFLWORExpression), because those are provided in the tuples;
3. plus (recursively calling getVariableDependencies) all the variable dependencies of the child clause, if it exists.
- Overrides: getDynamicContextVariableDependencies in class RuntimeTupleIterator
- Returns:
  a map of variable names to dependencies (FULL, COUNT, ...) that this clause needs to obtain from the dynamic context.
-
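The three-step rule above is essentially set arithmetic over variable names. The following self-contained sketch (using plain strings and a hypothetical Dependency enum in place of Name and DynamicContext.VariableDependency, which are RumbleDB types) illustrates the computation; it is not the actual implementation:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DependencySketch {
    // Hypothetical stand-in for DynamicContext.VariableDependency.
    enum Dependency { FULL, COUNT }

    // Computes: (expression dependencies)
    //   minus (variables bound in the current FLWOR, provided via tuples)
    //   plus  (dependencies of the child clause, if any).
    static Map<String, Dependency> dependencies(
            Map<String, Dependency> expressionDeps,
            Set<String> boundInCurrentFlwor,
            Map<String, Dependency> childDeps) {
        Map<String, Dependency> result = new HashMap<>(expressionDeps); // step 1
        result.keySet().removeAll(boundInCurrentFlwor);                 // step 2
        result.putAll(childDeps);                                       // step 3
        return result;
    }
}
```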
getOutputTupleVariableNames
Description copied from class: RuntimeTupleIterator
Returns the output tuple variable names. These variables can be removed from the dependencies of expressions in ascendant (subsequent) clauses, because their values are provided in the tuples rather than in the dynamic context object.
- Overrides: getOutputTupleVariableNames in class RuntimeTupleIterator
- Returns:
  the set of variable names that are bound by descendant clauses.
-
print
- Overrides:
- Overrides: print in class RuntimeTupleIterator
-
getInputTupleVariableDependencies
public Map<Name,DynamicContext.VariableDependency> getInputTupleVariableDependencies(Map<Name, DynamicContext.VariableDependency> parentProjection)
Description copied from class: RuntimeTupleIterator
Builds the DataFrame projection that this clause needs to receive from its child clause. The intent is that the result of this method is forwarded to the child clause in getDataFrame(), so it can optimize some values away. Invariant: all keys in getInputTupleVariableDependencies(...) MUST be output tuple variables, i.e., appear in this.child.getOutputTupleVariableNames().
- Specified by: getInputTupleVariableDependencies in class RuntimeTupleIterator
- Parameters:
  parentProjection - the projection needed by the parent clause.
- Returns:
  the projection needed by this clause.
-
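The documented invariant amounts to a subset check: every variable requested from the child must be one the child actually outputs. A minimal sketch of such a check, using strings in place of the RumbleDB Name and VariableDependency types (names here are hypothetical):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ProjectionInvariantSketch {
    // Checks that every key of the projection requested from the child
    // clause appears among the child's output tuple variable names.
    static boolean satisfiesInvariant(
            Map<String, String> inputTupleVariableDependencies,
            Set<String> childOutputTupleVariableNames) {
        return childOutputTupleVariableNames
                .containsAll(inputTupleVariableDependencies.keySet());
    }
}
```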
bindLetVariableInDataFrame
public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> bindLetVariableInDataFrame(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame, Name newVariableName, SequenceType sequenceType, RuntimeIterator newVariableExpression, DynamicContext context, List<Name> variablesInInputTuple, Map<Name, DynamicContext.VariableDependency> outputTupleVariableDependencies, boolean hash, RumbleRuntimeConfiguration conf)
Extends a DataFrame with a new column obtained from the evaluation of an expression for each tuple.
- Parameters:
  dataFrame - the DataFrame to extend
  newVariableName - the name of the new column (variable)
  sequenceType - the sequence type of the new bound item; not used in case of hash
  newVariableExpression - the expression to evaluate
  context - the context (in addition to each tuple) in which to evaluate the expression
  variablesInInputTuple - the names of the variables that can be found in the input tuple (as opposed to those in the context)
  outputTupleVariableDependencies - the dependencies to project to (possibly null to keep everything)
  hash - whether or not to compute single-item hashes rather than the actual serialized sequences of items
- Returns:
  the DataFrame with the new column
-
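Conceptually, a let clause extends every input tuple with one additional binding computed from an expression. The sketch below illustrates this operation on plain maps standing in for DataFrame rows; the actual method performs the same per-tuple extension on a Spark DataFrame via a UDF or a native query:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class LetBindingSketch {
    // Extends each tuple with a new column whose value is obtained by
    // evaluating an expression against that tuple. The input tuples are
    // left unmodified, mirroring DataFrame immutability.
    static List<Map<String, Object>> bindLetVariable(
            List<Map<String, Object>> tuples,
            String newVariableName,
            Function<Map<String, Object>, Object> expression) {
        return tuples.stream().map(t -> {
            Map<String, Object> extended = new HashMap<>(t);
            extended.put(newVariableName, expression.apply(t));
            return extended;
        }).collect(Collectors.toList());
    }
}
```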
registerLetClauseUDF
public static boolean registerLetClauseUDF(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame, RuntimeIterator newVariableExpression, DynamicContext context, org.apache.spark.sql.types.StructType inputSchema, List<FlworDataFrameColumn> UDFcolumns, SequenceType sequenceType) -
tryNativeQuery
public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> tryNativeQuery(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame, Name newVariableName, RuntimeIterator iterator, List<FlworDataFrameColumn> allColumns, org.apache.spark.sql.types.StructType inputSchema, DynamicContext context)
Tries to generate the native query for the let clause and run it; if successful, returns the resulting dataframe, otherwise returns null.
- Parameters:
  dataFrame - input dataframe for the query
  newVariableName - name of the new bound variable
  iterator - let variable assignment expression iterator
  allColumns - other columns required in following clauses
  inputSchema - input schema of the dataframe
  context - current dynamic context of the dataframe
- Returns:
  resulting dataframe of the let clause if successful, null otherwise
-
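The null return value enables a try-fast-path-then-fall-back pattern on the caller's side: if the native query cannot be generated, the caller falls back to the general UDF-based evaluation. A generic sketch of this calling pattern (the names are illustrative, not RumbleDB API):

```java
import java.util.function.Supplier;

public class NativeFallbackSketch {
    // Attempts the fast (native) path first; a null result signals that the
    // expression could not be handled natively, so the general (UDF-based)
    // path is taken instead.
    static <T> T computeWithFallback(Supplier<T> nativeAttempt, Supplier<T> udfFallback) {
        T result = nativeAttempt.get();
        return result != null ? result : udfFallback.get();
    }
}
```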
containsClause
Description copied from class: RuntimeTupleIterator
Says whether or not the clause and its descendants include a clause of the specified kind.
- Specified by: containsClause in class RuntimeTupleIterator
- Parameters:
  kind - the kind of clause to test for.
- Returns:
  true if there is one, false otherwise.
-
isSparkJobNeeded
public boolean isSparkJobNeeded()
Says whether this expression evaluation triggers a Spark job.
- Specified by: isSparkJobNeeded in class RuntimeTupleIterator
- Returns:
  true if the execution triggers a Spark job, false otherwise, null if undetermined yet.
-
generateNativeQuery
Description copied from class: RuntimeTupleIterator
This function generates (if possible) a native Spark SQL query that maps the inner workings of the iterator.
- Overrides: generateNativeQuery in class RuntimeTupleIterator
- Parameters:
  nativeClauseContext - context information to generate the native query
- Returns:
  a native clause context with the Spark SQL native query to get an equivalent result of the iterator, or NativeClauseContext.NoNativeQuery if it is not possible
-