Would having two different systems be useful for Distributed Systems?

You don't really need multiple physical systems to do "distributed systems". Depending on the software you're using, you can run everything on the same machine and not even need to spin up VMs or containers.

What is/are the software/apps you plan to use?
 
Yes I realize I can just use docker.

I planned on implementing Paxos / Raft / RPC / fault tolerance. I guess I can do that on the same machine itself.
If you want to implement those, look into DSLabs.

It's an open-source distributed programming simulator and model checker with various labs of increasing difficulty. The second-to-last lab is implementing Paxos (you can also do Raft), while the last lab is a sharded KV store system (~ Google Spanner).

It's in Java and can be run on a single machine.
 
Great, I will look into it, thanks.

Any simulator in Go that you are aware of? I am not that great at Java.
 
Unfortunately no.

Java isn't difficult to pick up at all if you're familiar with any OOP language.

And the Java in DSLabs is fairly straightforward (just look up how to make deep copies of objects, that's as advanced as it'll go).
 
Here's my attempt at kinda hijacking this thread...

Is there any tool that can simulate flaky internet connections (slow speed, latency spikes, packet drops, etc.) between nodes of a distributed system? Say some tool or configuration (maybe with Docker) to achieve this, to test the reliability of your distributed product? Like the chaos-monkey kind of tool that randomly kills nodes of your DS to test its HA/recovery capabilities.

I was actually trying to say you MAY NOT need to use Docker: just run separate processes as nodes in the simulated distributed system :-) then communicate via regular TCP/UDP/whatever.
 
I'm not sure about any such tool, but there are load-testing tools, like JMeter for Java, which can simulate thousands of users trying to access your system (API or endpoint) simultaneously. Maybe that can help?
 
These are some tools I bookmarked, never used myself though.

Toxiproxy: https://github.com/Shopify/toxiproxy
Jepsen and chaosmonkey do fault injection on network and/or processes.
speedbump: https://github.com/kffl/speedbump
Clumsy: https://jagt.github.io/clumsy/
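On Linux, the kernel's built-in tc/netem queueing discipline covers a lot of this without extra tooling. A rough sketch (interface name and numbers are placeholders; needs root):

```shell
# Add 100ms +/- 20ms latency, 1% packet loss, and a 1mbit rate cap
# to eth0 (replace with your interface, or a veth pair between
# network namespaces acting as "nodes").
sudo tc qdisc add dev eth0 root netem delay 100ms 20ms loss 1% rate 1mbit

# Inspect the rule, then remove it when done.
tc qdisc show dev eth0
sudo tc qdisc del dev eth0 root
```

This affects all traffic on the interface, which is why per-connection proxies like Toxiproxy or speedbump are handier when you only want to degrade one link.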
 
The easiest way I know to do Distributed Systems (systems, not programming) simulation and testing is to use Kubernetes and KIND (or miniKube, whichever you prefer).

It simulates Kubernetes on a single machine using docker containers as nodes. Not as lightweight as maybe using processes as nodes, but an order of magnitude easier to set up and run.
Another advantage with this setup is that you can use existing K8s tooling for all sorts of tests and deployments.
 
Cool....

What is the difference between using KIND and native docker images as processes?
 
KIND simulates multi-node Kubernetes on a single machine.
Without it, your k8s setup would be a single-node cluster, which isn't very representative of an actual deployment with multiple nodes for failover and HA.

Without kind, you'd need to install K8s on multiple machines to simulate the same behavior.

You can probably use Docker containers to simulate this, but that would require custom orchestration. KIND/minikube just do it for you, using k8s as the orchestrator.
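As a sketch of what a multi-node kind cluster looks like (cluster name and node counts here are arbitrary; assumes kind, kubectl, and Docker are installed):

```shell
# Describe a 1 control-plane + 3 worker cluster; each node is
# a Docker container on this machine.
cat > kind-config.yaml <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker
EOF

kind create cluster --name ds-test --config kind-config.yaml
kubectl get nodes            # should list 4 nodes

# Simulate a node failure by killing its container.
docker kill ds-test-worker
```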
 
Another question - are there any sample apps / datasets you can run after you set up a distributed system like k8s minikube on your computer?

Like some database or compute engine (Apache Spark comes to mind), but including some ready-made configuration and data, so you can just set it up, hit the RUN button, and watch it go?

(yes, gotta ask AI this too)
 
Lots of options. You can try Elasticsearch, which is a distributed search and analytics engine. Or Kafka: set up ZooKeeper on one node and a broker on another. Then Spark: YARN on one node and executors on another.
The list goes on.
 