java.lang.Object

org.rumbledb.runtime.RuntimeTupleIterator

org.rumbledb.runtime.flwor.clauses.LetClauseSparkIterator

All Implemented Interfaces: com.esotericsoftware.kryo.KryoSerializable, Serializable, RuntimeTupleIteratorInterface

public class LetClauseSparkIterator extends RuntimeTupleIterator

See Also:

Serialized Form

Field Summary

Fields inherited from class org.rumbledb.runtime.RuntimeTupleIterator
child, currentDynamicContext, evaluationDepthLimit, FLOW_EXCEPTION_MESSAGE, hasNext, inputTupleProjection, isOpen, outputTupleProjection
Constructor Summary

Constructors

Constructor

Description

LetClauseSparkIterator(RuntimeTupleIterator child, Name variableName, SequenceType sequenceType, RuntimeIterator assignmentIterator, RuntimeStaticContext staticContext)
Method Summary

Modifier and Type

Method

Description

static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>

bindLetVariableInDataFrame(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame, Name newVariableName, SequenceType sequenceType, RuntimeIterator newVariableExpression, DynamicContext context, List<Name> variablesInInputTuple, Map<Name,DynamicContext.VariableDependency> outputTupleVariableDependencies, boolean hash, RumbleRuntimeConfiguration conf)

Extends a DataFrame with a new column obtained from the evaluation of an expression for each tuple.

void

close()

boolean

containsClause(FLWOR_CLAUSES kind)

Says whether or not the clause and its descendants include a clause of the specified kind.

NativeClauseContext

generateNativeQuery(NativeClauseContext nativeClauseContext)

Generates, if possible, a native Spark SQL query that mirrors the inner workings of the iterator.

FlworDataFrame

getDataFrame(DynamicContext context)

Obtains the dataframe from the child clause.

FlworDataFrame

getDataFrameAsJoin(DynamicContext context, Map<Name,DynamicContext.VariableDependency> parentProjection, FlworDataFrame childDF)

Map<Name,DynamicContext.VariableDependency>

getDynamicContextVariableDependencies()

Variable dependencies are variables that MUST be provided by the parent clause in the dynamic context for successful execution of this clause.

Map<Name,DynamicContext.VariableDependency>

getInputTupleVariableDependencies(Map<Name,DynamicContext.VariableDependency> parentProjection)

Builds the DataFrame projection that this clause needs to receive from its child clause.

Set<Name>

getOutputTupleVariableNames()

Returns the output tuple variable names.

static boolean

isExpressionIndependentFromInputTuple(RuntimeIterator sequenceIterator, RuntimeTupleIterator tupleIterator)

boolean

isSparkJobNeeded()

Says whether this expression evaluation triggers a Spark job.

sparksoniq.jsoniq.tuple.FlworTuple

next()

void

open(DynamicContext context)

void

print(StringBuffer buffer, int indent)

static boolean

registerLetClauseUDF(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame, RuntimeIterator newVariableExpression, DynamicContext context, org.apache.spark.sql.types.StructType inputSchema, List<FlworDataFrameColumn> UDFcolumns, SequenceType sequenceType)

void

reset(DynamicContext context)

static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>

tryNativeQuery(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame, Name newVariableName, RuntimeIterator iterator, List<FlworDataFrameColumn> allColumns, org.apache.spark.sql.types.StructType inputSchema, DynamicContext context)

Tries to generate the native query for the let clause and run it. Returns the resulting dataframe if successful, null otherwise.

Methods inherited from class org.rumbledb.runtime.RuntimeTupleIterator
canSetEvaluationDepthLimit, getChildIterator, getConfiguration, getEvaluationDepthLimit, getHeight, getHighestExecutionMode, getMetadata, getSubtreeBeyondLimit, hasNext, isDataFrame, isOpen, read, setEvaluationDepthLimit, setInputAndOutputTupleVariableDependencies, toString, write

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Constructor Details
- LetClauseSparkIterator
  
  public LetClauseSparkIterator(RuntimeTupleIterator child, Name variableName, SequenceType sequenceType, RuntimeIterator assignmentIterator, RuntimeStaticContext staticContext)
Method Details
- open
  
  public void open(DynamicContext context)
  
  Specified by:
  
  open in interface RuntimeTupleIteratorInterface
  
  Overrides:
  
  open in class RuntimeTupleIterator
- reset
  
  public void reset(DynamicContext context)
  
  Specified by:
  
  reset in interface RuntimeTupleIteratorInterface
  
  Overrides:
  
  reset in class RuntimeTupleIterator
- next
  
  public sparksoniq.jsoniq.tuple.FlworTuple next()
  
  Specified by:
  
  next in interface RuntimeTupleIteratorInterface
  
  Specified by:
  
  next in class RuntimeTupleIterator
- close
  
  public void close()
  
  Specified by:
  
  close in interface RuntimeTupleIteratorInterface
  
  Overrides:
  
  close in class RuntimeTupleIterator
- getDataFrame
  
  public FlworDataFrame getDataFrame(DynamicContext context)
  
  Description copied from class: RuntimeTupleIterator
  
  Obtains the dataframe from the child clause. It is possible, with the second parameter, to specify the variables it needs to project the others away, or that only a count is needed for a specific variable, which allows projecting away the actual items.
  
  Specified by:
  
  getDataFrame in class RuntimeTupleIterator
  
  Parameters:
  
  context - the dynamic context in which to evaluate the child clause's dataframe.
  
  Returns:
  
  the DataFrame with the tuples returned by the child clause.
- getDataFrameAsJoin
  
  public FlworDataFrame getDataFrameAsJoin(DynamicContext context, Map<Name,DynamicContext.VariableDependency> parentProjection, FlworDataFrame childDF)
- isExpressionIndependentFromInputTuple
  
  public static boolean isExpressionIndependentFromInputTuple(RuntimeIterator sequenceIterator, RuntimeTupleIterator tupleIterator)
- getDynamicContextVariableDependencies
  
  public Map<Name,DynamicContext.VariableDependency> getDynamicContextVariableDependencies()
  
  Description copied from class: RuntimeTupleIterator
  
  Variable dependencies are variables that MUST be provided by the parent clause in the dynamic context for successful execution of this clause. These variables are:
  
  1. All variables that the expression of the clause depends on (recursive call of getVariableDependencies on the expression).
  2. Except those variables bound in the current FLWOR (obtained from the auxiliary method getVariablesBoundInCurrentFLWORExpression), because those are provided in the tuples.
  3. Plus (recursively calling getVariableDependencies) all the variable dependencies of the child clause, if it exists.
  
  Overrides:
  
  getDynamicContextVariableDependencies in class RuntimeTupleIterator
  
  Returns:
  
  a map of variable names to dependencies (FULL, COUNT, ...) that this clause needs to obtain from the dynamic context.
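The three rules for getDynamicContextVariableDependencies can be read as plain set arithmetic. The sketch below is only an illustration under simplifying assumptions: it uses sets of strings instead of the Map<Name,DynamicContext.VariableDependency> the real method returns, and the class DependencySketch and its method contextDependencies are hypothetical names, not part of RumbleDB.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for illustration; not RumbleDB's API.
class DependencySketch {

    /**
     * @param expressionDeps variables the clause's expression refers to
     * @param boundInFlwor   variables bound in the current FLWOR (supplied by the tuples)
     * @param childDeps      dependencies already computed for the child clause (empty if none)
     */
    static Set<String> contextDependencies(
            Set<String> expressionDeps,
            Set<String> boundInFlwor,
            Set<String> childDeps) {
        Set<String> result = new LinkedHashSet<>(expressionDeps); // rule 1: expression dependencies
        result.removeAll(boundInFlwor);                           // rule 2: minus tuple-provided variables
        result.addAll(childDeps);                                 // rule 3: plus the child's dependencies
        return result;
    }

    public static void main(String[] args) {
        // Models: for $x in $input let $y := $x + $offset
        // $x comes from the tuple, so only $offset and $input remain.
        Set<String> deps = contextDependencies(
            Set.of("x", "offset"), Set.of("x"), Set.of("input"));
        System.out.println(deps);
    }
}
```

In this model, the let clause needs `offset` and `input` from the dynamic context, while `x` is dropped because the for clause already binds it in each tuple.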
- getOutputTupleVariableNames
  
  public Set<Name> getOutputTupleVariableNames()
  
  Description copied from class: RuntimeTupleIterator
  
  Returns the output tuple variable names. These variables can be removed from the dependencies of expressions in ascendent (subsequent) clauses, because their values are provided in the tuples rather than the dynamic context object.
  
  Overrides:
  
  getOutputTupleVariableNames in class RuntimeTupleIterator
  
  Returns:
  
  the set of variable names that are bound by descendant clauses.
- print
  
  public void print(StringBuffer buffer, int indent)
  
  Overrides:
  
  print in class RuntimeTupleIterator
- getInputTupleVariableDependencies
  
  public Map<Name,DynamicContext.VariableDependency> getInputTupleVariableDependencies(Map<Name,DynamicContext.VariableDependency> parentProjection)
  
  Description copied from class: RuntimeTupleIterator
  
  Builds the DataFrame projection that this clause needs to receive from its child clause. The intent is that the result of this method is forwarded to the child clause in getDataFrame() so it can optimize some values away. Invariant: all keys in getInputTupleVariableDependencies(...) MUST be output tuple variables, i.e., appear in this.child.getOutputTupleVariableNames()
  
  Specified by:
  
  getInputTupleVariableDependencies in class RuntimeTupleIterator
  
  Parameters:
  
  parentProjection - the projection needed by the parent clause.
  
  Returns:
  
  the projection needed by this clause.
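One way a let clause could derive its input projection while respecting the stated invariant is sketched below. This is an assumption about the general approach, not the actual implementation: ProjectionSketch, its Dependency enum, and inputProjection are hypothetical, and variable names are plain strings rather than Name objects.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of a let clause's input projection; not RumbleDB's API.
class ProjectionSketch {

    enum Dependency { FULL, COUNT }

    static Map<String, Dependency> inputProjection(
            Map<String, Dependency> parentProjection,
            String boundVariable,
            Set<String> expressionVariables,
            Set<String> childOutputVariables) {
        // Start from what the parent clause needs.
        Map<String, Dependency> projection = new TreeMap<>(parentProjection);
        // The let clause produces this variable itself, so the child need not supply it.
        projection.remove(boundVariable);
        // The let expression needs the full values of the variables it reads.
        for (String variable : expressionVariables) {
            projection.put(variable, Dependency.FULL);
        }
        // Invariant: every key must be an output tuple variable of the child.
        projection.keySet().retainAll(childOutputVariables);
        return projection;
    }

    public static void main(String[] args) {
        // Models: let $y := $x + $offset, where the child clause only outputs $x.
        Map<String, Dependency> result = inputProjection(
            Map.of("y", Dependency.FULL, "x", Dependency.COUNT),
            "y",
            Set.of("x", "offset"),
            Set.of("x"));
        System.out.println(result); // prints {x=FULL}
    }
}
```

Here `offset` is filtered out because it is not produced by the child clause; in this model it would have to come from the dynamic context instead.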
- bindLetVariableInDataFrame
  
  public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> bindLetVariableInDataFrame(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame, Name newVariableName, SequenceType sequenceType, RuntimeIterator newVariableExpression, DynamicContext context, List<Name> variablesInInputTuple, Map<Name,DynamicContext.VariableDependency> outputTupleVariableDependencies, boolean hash, RumbleRuntimeConfiguration conf)
  
  Extends a DataFrame with a new column obtained from the evaluation of an expression for each tuple.
  
  Parameters:
  
  dataFrame - the DataFrame to extend
  
  newVariableName - the name of the new column (variable)
  
  sequenceType - the sequence type of the new bound item, not used in case of hash
  
  newVariableExpression - the expression to evaluate
  
  context - the context (in addition to each tuple) in which to evaluate the expression
  
  variablesInInputTuple - the names of the variables that can be found in the input tuple (as opposed to those in the context)
  
  outputTupleVariableDependencies - the dependencies to project to (possibly null to keep everything).
  
  hash - whether or not to compute single-item hashes rather than the actual serialized sequences of items.
  
  Returns:
  
  the DataFrame with the new column
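The idea behind bindLetVariableInDataFrame, evaluating an expression once per tuple and appending the result as a new column, can be modeled without Spark. The sketch below is a simplified stand-in under stated assumptions: rows are plain maps rather than Spark Row objects, the expression is a Java Function rather than a RuntimeIterator, and BindLetSketch and bindLet are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical in-memory model of binding a let variable; not RumbleDB's API.
class BindLetSketch {

    static List<Map<String, Object>> bindLet(
            List<Map<String, Object>> rows,
            String newVariableName,
            Function<Map<String, Object>, Object> expression,
            Map<String, Object> context) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            // The expression sees the context plus the tuple; tuple bindings win on conflict.
            Map<String, Object> scope = new LinkedHashMap<>(context);
            scope.putAll(row);
            // Copy the row and append the newly bound variable as an extra column.
            Map<String, Object> extended = new LinkedHashMap<>(row);
            extended.put(newVariableName, expression.apply(scope));
            result.add(extended);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(Map.of("x", 1), Map.of("x", 2));
        Map<String, Object> context = Map.of("offset", 10);
        // Models: let $y := $x + $offset
        List<Map<String, Object>> out = bindLet(
            rows, "y",
            scope -> (Integer) scope.get("x") + (Integer) scope.get("offset"),
            context);
        System.out.println(out); // prints [{x=1, y=11}, {x=2, y=12}]
    }
}
```

The real method additionally handles projection to outputTupleVariableDependencies and the hash option, which this sketch leaves out.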
- registerLetClauseUDF
  
  public static boolean registerLetClauseUDF(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame, RuntimeIterator newVariableExpression, DynamicContext context, org.apache.spark.sql.types.StructType inputSchema, List<FlworDataFrameColumn> UDFcolumns, SequenceType sequenceType)
- tryNativeQuery
  
  public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> tryNativeQuery(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame, Name newVariableName, RuntimeIterator iterator, List<FlworDataFrameColumn> allColumns, org.apache.spark.sql.types.StructType inputSchema, DynamicContext context)
  
  Tries to generate the native query for the let clause and run it. Returns the resulting dataframe if successful, null otherwise.
  
  Parameters:
  
  dataFrame - input dataframe for the query
  
  newVariableName - name of the new bound variable
  
  iterator - let variable assignment expression iterator
  
  allColumns - other columns required in following clauses
  
  inputSchema - input schema of the dataframe
  
  context - current dynamic context of the dataframe
  
  Returns:
  
  resulting dataframe of the let clause if successful, null otherwise
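Because tryNativeQuery signals failure by returning null, a caller has to check the result before using it. The sketch below shows one generic way to express that "native first, otherwise fall back" pattern. It is an assumption about how such a result might be consumed, not a description of RumbleDB's call sites; NativeFallbackSketch and nativeOrFallback are hypothetical names, and the fallback labeled "UDF result" is only a placeholder.

```java
import java.util.function.Supplier;

// Hypothetical helper illustrating a null-signalled fallback; not RumbleDB's API.
class NativeFallbackSketch {

    /** Runs the native attempt; if it yields null, runs the fallback instead. */
    static <T> T nativeOrFallback(Supplier<T> nativeAttempt, Supplier<T> fallback) {
        T result = nativeAttempt.get();
        return result != null ? result : fallback.get();
    }

    public static void main(String[] args) {
        // Native attempt succeeds: the fallback is never evaluated.
        String a = nativeOrFallback(() -> "native result", () -> "UDF result");
        // Native attempt returns null: the fallback supplies the value.
        String b = nativeOrFallback(() -> null, () -> "UDF result");
        System.out.println(a + " / " + b); // prints native result / UDF result
    }
}
```

Passing the fallback as a Supplier keeps it lazy, so the more expensive path only runs when the native attempt did not produce a result.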
- containsClause
  
  public boolean containsClause(FLWOR_CLAUSES kind)
  
  Description copied from class: RuntimeTupleIterator
  
  Says whether or not the clause and its descendants include a clause of the specified kind.
  
  Specified by:
  
  containsClause in class RuntimeTupleIterator
  
  Parameters:
  
  kind - the kind of clause to test for.
  
  Returns:
  
  true if there is one, false otherwise.
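A check over "the clause and its descendants" amounts to walking the chain of child clauses. The following is a minimal sketch of that traversal, assuming each clause has at most one child; ClauseSketch and its Kind enum are hypothetical and do not reproduce the values of FLWOR_CLAUSES.

```java
// Hypothetical clause chain for illustration; not RumbleDB's API.
class ClauseSketch {

    enum Kind { FOR, LET, WHERE, GROUP_BY, ORDER_BY }

    private final Kind kind;
    private final ClauseSketch child; // null for the first clause of the FLWOR

    ClauseSketch(Kind kind, ClauseSketch child) {
        this.kind = kind;
        this.child = child;
    }

    /** True if this clause or any clause below it in the chain has the wanted kind. */
    boolean containsClause(Kind wanted) {
        if (kind == wanted) {
            return true;
        }
        return child != null && child.containsClause(wanted);
    }

    public static void main(String[] args) {
        // Models: for ... where ... let ...
        ClauseSketch forClause = new ClauseSketch(Kind.FOR, null);
        ClauseSketch whereClause = new ClauseSketch(Kind.WHERE, forClause);
        ClauseSketch letClause = new ClauseSketch(Kind.LET, whereClause);
        System.out.println(letClause.containsClause(Kind.WHERE));    // prints true
        System.out.println(letClause.containsClause(Kind.GROUP_BY)); // prints false
    }
}
```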
- isSparkJobNeeded
  
  public boolean isSparkJobNeeded()
  
  Says whether this expression evaluation triggers a Spark job.
  
  Specified by:
  
  isSparkJobNeeded in class RuntimeTupleIterator
  
  Returns:
  
  true if the execution triggers a Spark job, false otherwise.
- generateNativeQuery
  
  public NativeClauseContext generateNativeQuery(NativeClauseContext nativeClauseContext)
  
  Description copied from class: RuntimeTupleIterator
  
  Generates, if possible, a native Spark SQL query that mirrors the inner workings of the iterator.
  
  Overrides:
  
  generateNativeQuery in class RuntimeTupleIterator
  
  Parameters:
  
  nativeClauseContext - context information to generate the native query
  
  Returns:
  
  a native clause context with the spark-sql native query to get an equivalent result of the iterator, or [NativeClauseContext.NoNativeQuery] if it is not possible
