Lit Protocol
Lit is a decentralized key management and compute network. Builders of apps, wallets, protocols, and AI agents use Lit to advance digital ownership with decentralized keys and private, immutable programs.
developer.litprotocol.com/wh…
Spark is a public space for collaboration and open discourse surrounding the development of the decentralized, user-owned web, AKA 'Web3'. Spark was created by Lit Protocol, a distributed key management network for encryption, signing, and compute.
Spark ecosystem
spark.litprotocol.com/ecosys…
SELLING ACCESS TO YOUR NODES
1) SparkPoint Network Node Key Sale
SparkPoint Network Node Key Sale Whitepaper
The Node Key Sale is an opportunity for individuals to acquire Node Keys, granting them the privilege to operate nodes within the network.
Owning a Node Key not only lets you participate in securing the network but also unlocks a range of valuable benefits within the SparkPoint ecosystem.
medium.com/@mativanarquero/s…
Spark Node
Introducing SPARK, originally developed on the SOLANA chain for speed, efficiency, and priority transactions. Now a multi-chain solution, SPARK offers a range of services for navigating the web3 landscape. Discover how SPARK can empower your journey in the decentralized world.
sparknode.xyz/
1) Sparknode
Sparknode is built to make it easier for your server-side (or node-webkit) code to communicate with the spark cloud, so that you can do more with your core with less overhead.
github.com/andrewstuart/Spar…
2) Spark Cloud (Particle Cloud)
The Particle Device Cloud API is a REST API. REST means a lot of things, but first and foremost it means that we use the URL in the way that it's intended: as a "Uniform Resource Locator".
docs.particle.io/reference/c…
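To make the "URL as a Uniform Resource Locator" idea concrete, here is a minimal sketch of how a client might build the address of a device or of one resource on it. The base URL, the device ID, and the `temperature` resource name are illustrative assumptions, not values taken from the linked docs; check the Particle reference before relying on them.

```python
from urllib.parse import quote

# Assumed base URL, for illustration only; confirm it in the Particle docs.
BASE_URL = "https://api.particle.io/v1/"


def device_url(device_id: str, resource: str = "") -> str:
    """Build the URL that locates one device, or one resource on that device.

    Each path segment is percent-encoded so the URL stays valid even if an
    identifier contains spaces or slashes.
    """
    parts = ["devices", quote(device_id, safe="")]
    if resource:
        parts.append(quote(resource, safe=""))
    return BASE_URL + "/".join(parts)


# Hypothetical device ID and resource name.
print(device_url("0123456789abcdef01234567"))
print(device_url("0123456789abcdef01234567", "temperature"))
```

The point of the sketch is only that each thing you can read or act on has its own address; the HTTP verb (GET, POST, and so on) then says what to do with it.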
Spark Cloud API Documentation
docs.spark.io/#/api
pyspark - How is data accessed by worker nodes in a Spark Cluster?
Each task decides which data is "local" and which is not. This article explains pretty well how data locality works.
I have two worker nodes with 4 cores each, and one 1 TB CSV file to read, apply a few transformations to, and run an action on.
This situation differs from the question above: you have only one file, and most likely your worker is the same machine as your data node. The executor(s) running on that worker, however, would read the file piece by piece (one piece per task) and in parallel.
Typically a Spark cluster contains multiple nodes, each with multiple CPUs, a bunch of memory, and storage. Each node holds some chunks of the data, so the nodes are sometimes also referred to as data nodes.
When Spark applications start, they create multiple workers or executors. Those workers/executors take resources (CPU, RAM) from the cluster's nodes described above. In other words, the nodes in a Spark cluster play both roles: data storage and computation.
But as you might have guessed, the data on a node is sometimes incomplete, so workers have to "pull" data across the network to do a partial computation.
The results are then sent back to the driver, which just does the "collection work" and combines them all into the final result.
stackoverflow.com/questions/…
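The flow described in that answer (split the input into pieces, let each task compute a partial result in parallel, have the driver combine the partials) can be sketched in plain Python. This is not Spark code: a thread pool stands in for the executors, and the function names, the two-column CSV layout, and the mean-of-a-column job are all assumptions made for illustration. The 2 workers with 4 cores each come from the question.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, num_partitions):
    """Slice the rows into roughly equal contiguous chunks, one per task."""
    size = max(1, -(-len(rows) // num_partitions))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def task(partition):
    """Partial computation on one chunk: sum and count of the second column."""
    values = [float(line.split(",")[1]) for line in partition]
    return sum(values), len(values)


def driver(rows, num_workers=2, cores_per_worker=4):
    """Run one task per partition in parallel, then combine the partials."""
    slots = num_workers * cores_per_worker
    partitions = split_into_partitions(rows, slots)
    # The thread pool is only a stand-in for Spark executors.
    with ThreadPoolExecutor(max_workers=slots) as pool:
        partials = list(pool.map(task, partitions))
    # "Collection work": merge the small partial results into the final answer.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# Hypothetical input: 100 CSV lines of the form "id<i>,<i>".
rows = [f"id{i},{i}" for i in range(100)]
print(driver(rows))  # mean of 0..99
```

Note that only the `(sum, count)` pairs travel back to the driver, not the rows themselves, which is why the driver can stay small even when the input is large.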
Configuring Spark Connections
🔸Multi-node Spark Cluster
spark.posit.co/guides/connec…
Configuring Spark Nodes
docs.datastax.com/en/dse/6.9…
Spark Cluster with Docker & docker-compose (Kubernetes)
A simple Spark standalone cluster for testing environments
github.com/mvillarrealb/dock…
Apache Spark
spark.apache.org/
A unified analytics engine for large-scale data processing
github.com/apache/spark
Spark Final Project
Spark Simple PoC
github.com/marioscience/spar…
Data Analysis using Spark
github.com/pregismond/data-a…
ENGR 440 - Report of Distributed Streaming Project
github.com/zoltan-nz/kafka-s…
Credit goes to: @SKettenbei73754,
@SherlockHghost, and
@Ryansikorski10 ❤️🙏