DevOps Practices
DevOps methodologies, tools and practices
A collection of inspiring lists, manuals, memos, blogs, hacking tools, one-liners, CLI/Web tools, etc. This project brings together a variety of interesting and practical technical resources aimed at providing inspiration and knowledge for tech enthusiasts.
Netdata - Netdata can perform distributed, real-time performance and health monitoring for systems or applications
An open source website monitoring tool, similar to "Uptime Robot", that can be used to monitor the current status of a website.
Grafana - A tool for monitoring, metric analysis and dashboards for Graphite, InfluxDB and Prometheus, etc.
A project dedicated to large-scale system design, which gathers the patterns and best practices of scalable, reliable and high-performance systems. It provides developers with rich resources and references to help them design and implement efficient large-scale systems.
Prometheus - A CNCF project for monitoring other systems and services. It collects metrics from configured targets at a given interval, evaluates them against rules, displays the results, and can trigger alerts when certain conditions are met
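The collect, evaluate, alert cycle described for Prometheus can be sketched in a few lines of Python. This is only an illustration of the pull-based idea, not Prometheus code: `Rule`, `run_cycle`, the metric names and the threshold are all hypothetical, and real Prometheus rules are written in PromQL rather than as simple thresholds.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types for illustration; this is not the Prometheus API.
Sample = Dict[str, float]


@dataclass
class Rule:
    name: str
    metric: str
    threshold: float

    def fires(self, sample: Sample) -> bool:
        # The rule fires when the metric is present and above the threshold.
        return sample.get(self.metric, 0.0) > self.threshold


def run_cycle(targets: Dict[str, Callable[[], Sample]], rules: List[Rule]) -> List[str]:
    """Scrape every target once, evaluate all rules, and return alert messages."""
    alerts = []
    for target_name, scrape in targets.items():
        sample = scrape()  # pull the current metrics from the target
        for rule in rules:
            if rule.fires(sample):
                alerts.append(f"{rule.name} firing on {target_name}")
    return alerts


# Two fake targets that return fixed samples instead of serving real metrics.
targets = {
    "api": lambda: {"cpu_percent": 93.0, "error_rate": 0.01},
    "db": lambda: {"cpu_percent": 41.0, "error_rate": 0.00},
}
rules = [Rule("HighCPU", "cpu_percent", 90.0)]
alerts = run_cycle(targets, rules)
```

A real monitor would call `run_cycle` repeatedly at the scrape interval and keep the samples in a time series store; the sketch runs a single cycle so the flow stays visible.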
An open-source automation tool, similar to an open-source version of IFTTT, that can trigger specific operations based on events set by developers. For example, you can set it so that when GitHubDaily posts a tweet, it automatically sends you an email reminder: "This guy posted a tweet again."
A free and open source system monitoring tool for the Mac with comprehensive functions. It can monitor CPU, GPU, disk, memory, network, battery, sensors, fans, Bluetooth devices and multi-time zone clocks, and is a good alternative to iStat Menus.
InfluxData - A scalable datastore written in Go, used for metrics, events and real-time analytics
A record of one developer's 90-day journey learning DevOps. The content includes the definition of DevOps, Linux basics, computer networks, the use of k8s and containers, automated configuration management, log monitoring and management, data visualization, etc.
A project for writing reusable computer vision tools. Through this project, users can more easily create and manage the tools and processes needed for computer vision applications. Whether it's dataset preparation or model training, Supervision provides tools to help developers.
SkyWalking - An APM (application performance monitoring) tool for distributed systems, designed for microservices, cloud-native and container-based (Docker, K8s, Mesos) architectures.
Jenkins - A widely used CI/CD tool, with thousands of plugins that support almost any automated build.
A big data platform specifically designed and optimized for industries such as the Internet of Things (IoT) and application monitoring. Its database insertion and query operations are claimed to be 10 times faster than other databases, and its costs are claimed to be much lower than those of other typical solutions in this category: TDengine is said to require less than 1/5 of the computing resources. It provides development interfaces for Java, C/C++, Python, Go, RESTful API, etc. If you are still worried about the performance of data writing, reading, and computing, it is worth a look.
A practical Kubernetes IDE that supports the major desktop platforms: Windows, Linux and macOS
A practical open source alternative to DataDog and NewRelic that helps developers monitor and analyze project problems in real time
Cilium - Cilium provides and transparently secures network connectivity and load balancing between application workloads such as application containers or processes
Jaeger - Inspired by Dapper and OpenZipkin, Jaeger is an open source distributed tracing system released by Uber Technologies
A dataflow orchestration platform for Python. If the programs for acquiring, cleaning, and processing data are treated as individual tasks, this project can combine these tasks into a workflow and deploy, schedule, and monitor them from a web platform.
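The idea of chaining acquire, clean, and process tasks into one workflow can be sketched with the standard library alone. This is a rough illustration of dependency-ordered execution, not the API of the project above: `run_workflow`, the task names, and the sample data are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(tasks, depends_on):
    """Run tasks in dependency order, passing each task its upstream results."""
    # depends_on maps a task name to the names of the tasks it needs first.
    order = list(TopologicalSorter(depends_on).static_order())
    results = {}
    for name in order:
        upstream = [results[dep] for dep in depends_on.get(name, ())]
        results[name] = tasks[name](*upstream)
    return order, results


# Hypothetical three-step pipeline: acquire raw strings, clean them, process them.
tasks = {
    "acquire": lambda: [" 3", "4 ", ""],
    "clean": lambda raw: [int(item) for item in raw if item.strip()],
    "process": lambda values: sum(values),
}
depends_on = {"clean": ["acquire"], "process": ["clean"]}

order, results = run_workflow(tasks, depends_on)
```

An orchestration platform adds scheduling, retries, and monitoring on top of this ordering step; the sketch only shows how declared dependencies determine the run order.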
Kestra is an open-source, event-driven orchestration platform designed for both scheduled and real-time workflows. It leverages Infrastructure as Code principles, allowing users to define workflows in YAML directly from an intuitive UI or code editor. Key features include Git version control integration, a rich plugin ecosystem for diverse integrations, and support for scripting in multiple languages. Kestra ensures scalability, fault tolerance, and high availability, making it suitable for handling millions of workflows. It offers advanced workflow management with features like namespaces, labels, subflows, retries, and dynamic tasks. The platform is developer-friendly, supporting CI/CD pipelines, Terraform, and custom plugin development. Kestra’s UI provides real-time validation, auto-completion, and a visual DAG representation for seamless workflow creation and monitoring.
Zipkin - Zipkin is a distributed tracing system. It helps gather the timing data needed to troubleshoot latency problems in service architectures. Features include both the collection and lookup of this data
Open source learning resources on GitHub: "DevOps 2022 Technology Roadmap", which will help you quickly understand the latest DevOps technology stack. It includes various DevOps-related learning materials such as Git, common programming languages, Linux, network security, containers, IaC, CI/CD, etc.
OpenObserve is a cloud-native observability platform for logs, metrics, traces, and analytics, engineered for petabyte scale. It positions itself as an Elasticsearch/Splunk/Datadog alternative and claims to be 10 times simpler to use, with 140 times lower storage costs and high performance.
A programmable CI/CD engine designed to make the continuous integration and delivery process more flexible and efficient. Its uniqueness lies in allowing users to run the entire CI/CD process within containers, thereby achieving better isolation and environmental consistency. This project provides a novel way to manage software delivery processes, enabling teams to better control and customize their continuous delivery pipelines.
Thanos - Thanos is a set of components that can be combined into a highly available metric system with unlimited storage capacity, and that can be added seamlessly on top of existing Prometheus deployments.
An integrated observability platform that combines the strengths of Prometheus and Grafana. The platform provides a comprehensive solution for alert management, metric visualization, logs, and tracing. Its powerful web UI design makes monitoring and data analysis more intuitive and efficient.
An agentless automated operations and maintenance platform designed for small and medium-sized enterprises. It integrates host management, batch command execution on hosts, online host terminals, application release, task scheduling, a configuration center, monitoring, alerting, etc.
resilience4j - A lightweight fault tolerance library (circuit breaker and related patterns) inspired by Hystrix but designed specifically for Java 8 and functional programming.
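For readers unfamiliar with the circuit breaker pattern, a minimal sketch follows. It is written in Python and does not use or mirror resilience4j's Java API; `CircuitBreaker`, `CircuitOpenError`, and the parameter names are hypothetical. The breaker opens after a number of consecutive failures, rejects calls while open, and allows one trial call once a timeout has elapsed.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so the timeout can be tested without sleeping
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, func: Callable[[], object]) -> object:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; call rejected")
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func()
        except Exception:
            self.failures += 1
            # Open (or re-open after a failed trial call) and restart the timeout.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping a remote call as `breaker.call(lambda: fetch())`, where `fetch` is any callable of your own, makes repeated failures fail fast instead of piling up behind a struggling dependency.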
Keep is an open-source AIOps and alert management platform designed to streamline incident response and monitoring. It offers a unified interface for alert deduplication, enrichment, filtering, and correlation, along with bi-directional integrations with various observability tools. Key features include customizable workflows, AI-powered incident summarization, and automation capabilities akin to GitHub Actions for monitoring. Keep supports a wide range of integrations, including AI backends, observability tools, databases, and communication platforms. It is enterprise-ready, offering robust security, flexible deployment options, and scalability for production environments. Developers can easily get started with Docker, Kubernetes, or cloud deployments, making it a versatile solution for modern IT operations.
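Alert deduplication, one of the features listed for Keep, is commonly done by fingerprinting the fields that identify an alert and merging alerts that share a fingerprint. The sketch below shows that general idea in Python; it is not Keep's implementation, and the field names (`source`, `name`, `host`, `timestamp`) and function names are hypothetical.

```python
import hashlib
import json


def fingerprint(alert, keys=("source", "name", "host")):
    """Hash only the identifying fields, so repeats of one alert match."""
    identity = {key: alert.get(key) for key in keys}
    encoded = json.dumps(identity, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def deduplicate(alerts):
    """Merge alerts with the same fingerprint, counting occurrences."""
    merged = {}
    for alert in alerts:
        fp = fingerprint(alert)
        if fp in merged:
            merged[fp]["count"] += 1
            merged[fp]["last_seen"] = alert["timestamp"]
        else:
            merged[fp] = {**alert, "count": 1, "last_seen": alert["timestamp"]}
    return list(merged.values())


# Hypothetical incoming alerts: the first and third describe the same problem.
incoming = [
    {"source": "prometheus", "name": "HighCPU", "host": "web-1", "timestamp": 100},
    {"source": "prometheus", "name": "DiskFull", "host": "db-1", "timestamp": 105},
    {"source": "prometheus", "name": "HighCPU", "host": "web-1", "timestamp": 160},
]
unique = deduplicate(incoming)
```

Which fields count as identifying is a design choice: too few fields merge unrelated alerts, while too many let duplicates through.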
A terminal system monitoring panel implemented with Node.js
An open source system status monitor: eul, which shows the usage and status of CPU, fans, memory, battery, and network. macOS users may want to try it out
Spinnaker - Spinnaker is an open source, multi-cloud continuous delivery platform for releasing software changes quickly and with confidence
OpenEBS - Leading open source container storage built on a cloud-native architecture, which simplifies running stateful applications on k8s
A practical Linux system monitoring and optimization tool
Flyway - Make database migrations simple
An open source tool for SQL auditing, which is committed to helping developers quickly complete the audit, detection, execution and rollback of SQL statements, so that daily SQL changes can be more standardized.
A list of rich DevOps learning resources covering CI/CD, databases, development operations practices, interview preparation, operating systems, networks, terminal commands, and more, along with a guide to getting started with DevOps.
A self-hosted home infrastructure project, open-sourced on GitHub by a Vietnamese programmer. It supports automatic configuration, operation and updating of various self-hosted services, and can be used to build your own home development lab. It includes self-hosted code, certificate management, CI/CD continuous integration and delivery, automated K8s installation and management, a real-time chat system, an application monitoring dashboard, and other functions.
An open-source architecture design book covering topics such as cloud computing, networks, distributed systems, container technology, observability, service mesh, and DevOps, helping programmers to deeply understand the principles and practices of related technologies.
Resolve production issues quickly. An open source observability platform unifying session replays, logs, metrics, traces, and errors.
Chef Infra - A systems integration framework designed to bring the benefits of configuration management to the entire infrastructure.
Concourse - Concourse is an automation system written in Go. It is most commonly used for CI/CD and scales to any type of automation pipeline, from simple to complex.
Falcon - Xiaomi's enterprise-level, highly available and scalable open source monitoring solution
GoCD - A continuous delivery server. GoCD helps you automate and simplify the build-test-release cycle, so that you can deliver products continuously without worries
Fluent Bit is a high-performance, lightweight log, metrics, and traces processor and forwarder, part of the CNCF and Fluentd ecosystem. It supports Linux, Windows, macOS, and BSD systems, running on x86, ARM, and other architectures. Key features include low CPU/memory usage, data parsing (JSON, Regex, LTSV, Logfmt), reliability with backpressure handling and buffering, and built-in TLS/SSL for secure networking. It offers a pluggable architecture with over 70 built-in plugins for inputs, filters, and outputs, extensible in C, Lua, and Golang. Fluent Bit also provides SQL-based stream processing for data manipulation, analytics, and time series forecasting. Widely adopted in production, it integrates with backends like Elasticsearch, Kafka, AWS, Azure, and more, making it a versatile solution for log and metrics processing.
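To make "data parsing" concrete, the sketch below turns one logfmt line (space-separated `key=value` pairs, with quotes around values that contain spaces) into a structured record. It is a simplified Python illustration, not Fluent Bit's parser: `parse_logfmt` is a hypothetical helper, and it does not cover every edge case of the format, such as unbalanced quotes or apostrophes in unquoted values.

```python
import shlex


def parse_logfmt(line):
    """Parse a logfmt line into a dict of string keys and string values."""
    record = {}
    # shlex.split keeps quoted values together and removes the quotes.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        # A bare key with no "=" is stored as a flag with an empty value.
        record[key] = value if sep else ""
    return record


record = parse_logfmt('level=info msg="request served" status=200 cached')
```

All values stay strings here; a real processor would typically apply type conversion and timestamp handling as later steps.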
The Chinese translation of the developer-roadmap (2018 Web Developer Roadmap) repository, which has more than 46,000 stars on GitHub
An open-source A/B testing platform with a Bayesian statistics engine, test result classification, a visual editor, email alerts and other functions. It can pull data from platforms such as Snowflake, Redshift, BigQuery, Google Analytics, etc.
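As a rough illustration of what a Bayesian statistics engine computes for an A/B test, the sketch below estimates the probability that variant B has a higher conversion rate than variant A. It assumes a Beta-Binomial model with a uniform prior and Monte Carlo sampling; the platform's actual engine may work differently, and `prob_b_beats_a` and the sample counts are hypothetical.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Estimate P(rate_B > rate_A) by sampling from each posterior."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    wins = 0
    for _ in range(draws):
        # Beta(1 + successes, 1 + failures) is the posterior under a uniform prior.
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical experiment: 100/1000 conversions for A, 150/1000 for B.
p = prob_b_beats_a(100, 1000, 150, 1000)
```

The result reads directly as "the chance that B is better than A", which is the main practical appeal of the Bayesian approach over a p-value.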
A recommended, user-friendly cloud monitoring system on GitHub: HertzBeat. It requires no agent and has powerful custom monitoring capabilities. It supports website monitoring, PING connectivity, port availability, databases, operating systems, API monitoring, threshold alerts, and alert notifications.
Graphite - Graphite is a highly scalable real-time graphing system that runs well on inexpensive hardware
Rundeck - Rundeck is an open source automation service with a built-in web console, command line tools and a web API. It allows you to easily run automated tasks on a group of nodes
Cortex - A horizontally scalable, highly available, multi-tenant long-term storage for Prometheus