This is a story about a personal itch, and about scalability. And like any good tech story, it begins with a shaky software architecture. At Panorays, we help large enterprises measure the security posture of their suppliers. But I'm not going to get into the whole 3rd-party security management extravaganza with you; we came to talk about our architecture and process.

In the beginning, there was bash. And scripts to manage VMs. A lot of scripts. There was a VM instance for each company we assessed, and every VM executed sequential batch jobs that imitate the reconnaissance phase of the hacker's lifecycle. Company-level parallelism was achieved by firing up more VMs. We built an internal orchestration system with Cron & Bash (imagine how fun that was…).

Problems:
- Parallelism was at the company level, not at the job level.
- The process wasn't transparent.
- Server utilization was low.
- Everything was triggered manually.

The Rise of The Transporter

The Transporter is a Dynamic Workflow Engine, built to create workflows and execute them as Kubernetes Jobs. A container-based architecture makes The Transporter both flexible enough to configure jobs separately and efficient enough to scale. It favors parallelism when possible, according to the workflow dependencies, and provides a REST API for a fully automated pipeline.

Standing on the Shoulders of Giants

The Transporter's API is automatically triggered whenever a new company is entered into the platform. The Transporter then deploys the jobs to Kubernetes in parallel or sequentially, according to a predefined workflow.

Overview

As with the original Transporter, The Transporter follows a few simple rules:

The Rules

The Deal Is the Deal

The Transporter's deal: you define the workflow, and The Transporter will make it happen. But in order to define a workflow, we first need to define a job. In our case, a job is the equivalent of running a docker container. A group of these jobs is a phase; a phase can be sequential or parallel. A workflow is a sequence of phases.
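To make the job, phase, and workflow hierarchy concrete, here is a minimal sketch of what a workflow definition could look like. The schema, field names, and job names are illustrative assumptions, not The Transporter's actual API:

```python
# Hypothetical workflow definition: field names and job names are
# illustrative only, not The Transporter's real schema.
workflow = {
    "name": "company-recon",
    "phases": [
        # A parallel phase: these jobs may run at the same time.
        {"name": "discovery", "mode": "parallel", "jobs": ["dns-enum", "port-scan"]},
        # A sequential phase: these jobs run one after another.
        {"name": "analysis", "mode": "sequential", "jobs": ["fingerprint", "vuln-assess"]},
    ],
}

def jobs_in_order(wf):
    """Flatten a workflow into its phases' job lists, in phase order."""
    return [phase["jobs"] for phase in wf["phases"]]

print(jobs_in_order(workflow))
```

Each job maps to a docker container, so a definition like this is all the engine needs to decide what to run and in which order.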
Now we can enjoy parallelism while still following some rules.

Workflow Example

Never Make a Promise You Can't Keep

The Transporter leverages a distributed task queue architecture. In this architecture, tasks get transported to queues, and workers consume the tasks from the queues and perform them. This architecture makes it possible to retry a failed task, set a timeout, set a priority, and schedule tasks for later. We send notifications to alert on workflow start, success, and failure.

Distributed Task Queue Architecture

Under the Hood

Now we are ready to tie it all together. The Transporter provides endpoints to manipulate a workflow. Behind the scenes, the workflow gets translated into tasks: we use Celery chains and Celery groups to set dependencies, and these tasks get transported to queues based on those dependencies. On the other side, Celery workers consume tasks from the queues and deploy the corresponding KubernetesJob. The result: a workflow accomplished according to job dependencies.

We also added endpoints to control workers for convenience. The number of running workers sets the limit for how many jobs can run concurrently.

Never Open the (Package) Container

Our new deployment process has security researchers building docker images and pushing them to Google Cloud Registry. The Transporter transports the corresponding jobs according to the workflow and a ConfigMap which defines the version of each job. Kubernetes is the engine actually executing the underlying docker containers.

Updating Jobs

No Names

At first, we set the KubernetesJob name to the original job name with a unique identifier appended at the end.
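The queue/worker model above can be sketched with nothing but the standard library. This is a simplified stand-in, not Celery and not The Transporter's code, and the job names are made up; it shows how jobs go onto a queue, a fixed pool of workers consumes them, and the worker count caps how many jobs run concurrently:

```python
import queue
import threading

def run_phase(jobs, num_workers=2):
    """Toy queue/worker model: enqueue jobs, let `num_workers` workers
    drain the queue. The worker count bounds concurrency, mirroring the
    'running workers set the concurrency limit' behavior described above."""
    tasks = queue.Queue()
    for job in jobs:
        tasks.put(job)

    done = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            # A real worker would deploy a Kubernetes Job here,
            # with retry / timeout / priority handled by the queue layer.
            with lock:
                done.append(job)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done

print(sorted(run_phase(["dns-enum", "port-scan", "whois"])))
```

In the real system, Celery chains and groups express the sequential/parallel dependencies between phases on top of exactly this kind of queue-and-workers loop.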
This is how we discovered some Kubernetes naming limitations. The first is the regex for validating a job name, which basically allows lowercase alphanumeric characters separated by dashes. We discovered the second limitation, a cap on the maximum number of characters in a name, thanks to our security researchers, who appreciate long, overly-detailed names.

Tip #1: unique identifiers for names. But we still wanted to know the original job name and the company name, which leads me to Tip #2: labels. Labels everywhere.

Frameworks We Considered

Before implementing our own workflow engine, we checked some existing solutions.

Airflow: Airflow is great. It works by rendering Python files into DAGs which represent a workflow. If you have a static workflow, determined pre-runtime, that you want to execute, like an ETL flow, I recommend trying Airflow. Airflow's problem lies in dynamic workflows; check out this "proper way to create dynamic workflows in Airflow" question on Stack Overflow. The reason we decided not to use it was our need to generate dynamic workflows, which change based on our REST API requests.

Google Pub/Sub: Google's pub/sub solution. We didn't use it because it required a massive code change on the side of all of the "jobs".

You can check out this task queue post for more alternatives.

What's Next?

UI: We want to add a UI to make it easy to monitor and troubleshoot active and finished workflows.

Generify: If we make The Transporter a bit more generic, maybe we could release it as an open-source project.

Call to Action

If you have a use case which involves running batch jobs according to certain dependencies (e.g. data acquisition, a web crawling system) and you are interested in scaling with The Transporter, please comment or reach out to let me know.
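Tip #1 and Tip #2 can be illustrated with a small sketch. The helper below is hypothetical, not The Transporter's code, and it assumes the common 63-character Kubernetes name limit and a lowercase-alphanumeric-plus-dashes name regex; it builds a unique, length-safe Job name while preserving the human-readable originals as labels:

```python
import re
import uuid

MAX_NAME_LEN = 63  # common Kubernetes length limit for names and label values

def k8s_job_name(job, company):
    """Build a unique, Kubernetes-safe Job name (Tip #1) and keep the
    original job and company names as labels (Tip #2).
    Hypothetical helper for illustration only."""
    uid = uuid.uuid4().hex[:8]  # unique identifier appended at the end
    # Lowercase and replace anything outside [a-z0-9-] with a dash.
    base = re.sub(r"[^a-z0-9-]+", "-", f"{job}-{company}".lower()).strip("-")
    # Truncate so that base + "-" + uid still fits the length limit.
    name = f"{base[:MAX_NAME_LEN - len(uid) - 1]}-{uid}"
    labels = {"original-job": job, "company": company}
    return name, labels

name, labels = k8s_job_name("subdomain_enumeration", "acme")
print(name, labels)
```

The labels survive the sanitizing and truncation, so the original job and company are still queryable even when the Job name itself is mangled.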
If you enjoyed this post, feel free to hold down the clap button 👏🏽, and if you're interested in posts to come, make sure to follow me:

Medium: https://medium.com/@talperetz24
Twitter: https://twitter.com/talperetz24
LinkedIn: https://www.linkedin.com/in/tal-per/