Before We Start, What's a Chapter?

A chapter is a craft community (sometimes also referred to as a Practice or CoE) whose primary purpose is to help practitioners of the same craft or discipline (e.g., development, testing, business analysis, scrum mastery, UX, etc.) elevate their mastery of that particular craft or discipline. Chapters are also typically tasked with setting the standards and guidelines for performing that craft in the organization.

Credit: Ashley-Christian Hardy, "Agile Team Organisation: Squads, Chapters, Tribes and Guilds," 2016

TL;DR

In Agile organizations, chapters (Chapters, Practices, CoEs, etc.) pursue the systematic development of capability and craft. This pursuit adds a lot of value to the organization (better quality, better knowledge retention, higher employee engagement, etc.). Yet chapters often struggle to pinpoint where they need to improve or how they can add more value to their members and the organization, and many organizations don't offer clear guidelines to chapters (and chapter leads) as to what exactly is expected of them or what good looks like.

This article presents a simple diagnostic (click to download the diagnostic) that a chapter could use to identify areas where they must focus their improvement efforts. It defines what 'good' looks like in the context of a chapter and provides a tool to help the chapter assess where they stand against that definition (where they're doing well and where they need to improve). In the second part of this series, I will share several experiments that could be run to optimize each dimension of chapter effectiveness, along with a case study of how this model was implemented at scale at a large organization.

Key Terms

First, let's define some of the key terms that will be used throughout this article:

Craft refers to the specific discipline, domain, or skill set around which the chapter is formed, e.g., QA is a craft, UX is a craft, business analysis is a craft, etc.
A craftsperson is a practitioner of a craft (developer, QA specialist, marketer, business analyst, etc.).
Performing the craft refers to the actual day-to-day carrying out of tasks by a craftsperson (chapter member), e.g., a QA specialist performs their craft by carrying out QA tasks.
Craft quality (quality of the craft) refers to how well the craft is being performed. I sometimes use craftsmanship, which refers to craft mastery and excellence.
Knowledge base refers to a centralized repository or system where craft-related information, best practices, documentation, standards, etc. are stored and shared among chapter members (and others across the organization).
Standards (craft standards) refer to the established guidelines and principles that define the expected level of quality within a specific craft.
Learning journey refers to the ongoing formal and informal learning efforts (training programs, hands-on application of new knowledge, mentoring, etc.) intended to extend existing skills and build new skills, and how that learning is expected to be acquired over time.

Is It Worth Reading This Article?
Well, if any of the following statements resonate with you, then I would strongly suggest that you read on. As an organization (or tribe, business unit, etc.):

“We have a high risk of losing critical knowledge if a small number of our people leave.”
“We struggle with onboarding and, despite hiring top talent, it’s difficult for them to hit the ground running.”
“Despite hiring people with a lot of experience to mentor and grow our junior staff, we feel that knowledge sharing isn’t happening.”
“We invest a lot in training our people – sending them to courses, etc. – but we don’t feel that investment has had much impact on the overall quality of our products.”
“We have knowledge siloes even within the same discipline — there are islands of expertise that aren’t connected.”
“We fear that when the contractors and external consultants leave, our people won’t be able to deliver the same level of high craft quality.”
“Team members struggle when moving from one team to another due to a lack of consistency in how things are done.”
“Our team members seem to struggle to tap into the collective expertise of the team, leading to a lot of re-inventing the wheel and time wasted.”

While these are all difficult problems that result from complex interactions of causes that affect and are affected by every aspect of how the organization works, these issues are all heavily influenced by how effective we are at developing craftsmanship — that is, how good we are at developing craft mastery and excellence. Given that in Agile organizations craft communities (chapters, practices, CoEs, etc.) are the primary custodians of developing craftsmanship, what I’m proposing here is that by systematically assessing and optimizing the effectiveness of how these craft communities work, we can make great strides in addressing these challenges.

So, Why Care About Chapters?

Effective chapters create the conditions that enable high product quality, high employee satisfaction, and low knowledge-loss risk. This is because effective chapters are good at developing master craftspeople. People who feel mastery of their craft are typically happier and more engaged, their output and the speed at which that output is delivered are superior, and, because there is more than a small number of them (and there is a robust process to develop more), the departure of a few won’t be catastrophic for the organization.

Members of an effective chapter (that is, a chapter that’s good at developing the craftsmanship of its members) would typically say things like:

Our chapter follows a systematic approach to building our capability and craft mastery (defining what capability needs to be built, providing its members with the mechanisms to plan how to build those capabilities, and providing the resources and support needed to implement that plan).
Our chapter has in place the processes and systems that enable us to leverage and build upon the accumulated formal knowledge that the chapter has amassed over time – the standards, playbooks, guidelines, best practices, lessons learned, etc.
Our chapter has nurtured a rich social network that enables us to tap into the collective informal (tacit) knowledge and expertise of the chapter – the knowledge, nuances, and highly contextual experiences that aren’t documented anywhere (most knowledge is tacit).
Our chapter follows a systematic approach to measuring the impact (outcomes) of craftsmanship-building and capability-uplift efforts and leveraging the feedback to guide further craftsmanship-building efforts.

If we improve the effectiveness of a chapter (that is, optimize the four factors mentioned above, which are key leading indicators of chapter effectiveness), we increase the chapter’s ability to develop its members into craftspeople, which in turn improves problems such as high knowledge-loss risk, siloes, ineffective knowledge sharing, and low product quality.

How Do We Improve Chapter Effectiveness?

The first step to improving chapter effectiveness is to systematically assess how the chapter is performing against the four key dimensions of chapter effectiveness identified above: access to documented (formal) knowledge; systematic capability building; access to tacit (informal) knowledge; and systematic craft quality measurement and continuous improvement. In this two-part series, I will introduce a simple diagnostic tool to assess chapter effectiveness (Part 1 – this article), and then delve into how to use the insights from the assessment to identify areas of improvement and how to go about setting chapter effectiveness goals and planning, implementing, and learning from chapter effectiveness improvement actions (Part 2).

How Do We Measure Chapter Effectiveness?

In this section, we will first go over the dimensions comprising chapter effectiveness in some detail, and then present the diagnostic tool itself.

Chapter Effectiveness Dimensions

Dimension #1: The comprehensiveness, fitness for purpose, and ease of access (and use) of our craft standards and knowledge base. This is a broad area that covers how good we are at leveraging (and creating/documenting) existing knowledge, the ease of access to relevant knowledge, the accuracy and relevance of the knowledge chapter members can draw from (and its alignment with industry best practices), and the effective distillation of ‘lessons learned,’ which represents how outside knowledge is contextualized to fit the unique context of the organization, among other factors.

Dimension #2: The effectiveness of the chapter’s efforts to uplift the capability and craftsmanship of its members. Effective chapters are good at describing what mastery of their craft means (what skills to acquire, what the levels of mastery of each skill look like, etc.), helping chapter members pinpoint where they are on that journey, and then collaborating as a team to envision what the path to mastery looks like for each member. They’re also good at turning those plans into reality: not only providing the resources and mentorship, but also the encouragement and peer support, keeping each other accountable, and measuring the outcomes of their elevated levels of craft mastery.

Dimension #3: The effectiveness of tacit (informal) knowledge sharing between chapter members. Effective chapters realize that most knowledge is tacit – that is, not documented anywhere. Tacit knowledge is difficult to extract or express, and therefore difficult to formally document or write down. How do we effectively leverage knowledge that isn’t documented?
By nurturing a thriving social network that allows chapter members to feel comfortable reaching out to each other to ask for help, seek advice, ask questions, share interesting insights, etc. Such a network doesn’t just happen – it requires conscious, persistent effort to build. The statements comprising this dimension explore some of the leading indicators of building effective knowledge-sharing and advice-seeking social networks.

Dimension #4: The effectiveness of the chapter’s efforts to systematically and continuously improve craft quality. How do we know if the actions we’re undertaking to uplift the quality of our craft (committing to learning and capability uplift, fostering stronger knowledge-sharing networks, etc.) are delivering value? How do we know if the investment we’re putting into uplifting our capability in specific tools or frameworks is generating the returns expected? Effective chapters are really good at measuring and evaluating the quality of their craft across the organization (quantitatively and/or qualitatively). They’re good at setting SMART craft improvement goals based on their understanding of how well the craft is being performed and where they need to focus and invest in improvement, planning how to implement those goals, and implementing those plans (and learning from that implementation). This is a significant challenge area for many chapters, as it is often difficult to ‘measure’ the quality of how the craft is being performed.

The Chapter Effectiveness Diagnostic

Introduction

The diagnostic (click here to download the pdf version) comprises a number of statements that are intended to capture what ‘good’ looks like for each dimension. Chapter members rate how well they believe each statement describes the reality of their chapter on a scale ranging from 'completely disagree' to 'completely agree.' All chapter members (including the chapter lead) should take part in completing this diagnostic. One option (and what many chapters do) is to send it out as a survey first, then discuss the results and insights in one or more follow-up workshops.

The purpose of this diagnostic is to serve as a conversation starter. As with all diagnostic and maturity models, the questions are merely intended to prompt us to have a discussion. The comments, anecdotes, and insights chapter members share as the team moves from one item to another provide a wealth of information. That information is what’s going to guide us (as a chapter) as we attempt to optimize the outcomes our chapter is creating. There’s no particular magic to this (or any) assessment model – it simply provides a structure within which good conversations can take place.

What’s in the Pack?

This pack contains the statements comprising the diagnostic model. Next to each statement is a brief explanation of why having a conversation about that statement is important and what to look for (and how to dig deeper and uncover insights) if the score against that particular statement is low. In the appendix, you'll find a printable version of the diagnostic (a template with only the statements), which can be distributed as handouts.

Next Steps

If you want to run the diagnostic as a survey, copy the statements into your survey tool. You may set the response options for each statement as completely disagree — disagree — neutral — agree — completely agree.
Alternatively, you might opt for a sliding scale of 1-5, for example, or use whatever rating method you prefer to enable your team to assess how well each statement describes its reality.

OK, We Ran the Diagnostic – What’s Next?

As mentioned earlier, the conversation that follows this self-assessment is where we really get the value. As a team, the chapter gets together to reflect, explore, and try to make sense of the results of its chapter effectiveness self-assessment. Members reflect on where they seem to be doing well and where they’re struggling, where they all seem to have the same experience, and where the scores reflect a difference in their perceptions. They look for common themes, outliers, and relationships and connections between statements, explore why some statements are not correlated even though they were expected to be (and vice versa), and discuss any other interesting insights that came out of the assessment.

In the second part of this series, we will do a deep dive into how to translate these insights and learnings into experiments and actions and measure the impact they create in optimizing chapter effectiveness. We will explore how to prioritize chapter effectiveness interventions, what experiments to run to uplift each chapter effectiveness dimension, and how to establish a robust continuous improvement cycle to consistently and systematically seek higher chapter effectiveness. We will go through a case study from a large financial organization where this model was applied at scale across a large number of chapters, and share some of the learnings from that experience.
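If you collect the survey responses in a spreadsheet or survey tool, a small script can help summarize them per dimension before the follow-up workshop. Below is a minimal sketch in Python; the CSV layout, the column naming, and the 1-5 coding are illustrative assumptions on my part, not part of the diagnostic pack.

```python
# Minimal sketch: summarize diagnostic survey responses per dimension.
# Assumptions (not from the article): each CSV row is one chapter member,
# each statement column is named like "D1_S3" (dimension 1, statement 3),
# and answers are coded 1-5 (1 = completely disagree ... 5 = completely agree).
import csv
from collections import defaultdict
from statistics import mean

def summarize(path) -> dict:
    scores = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            for statement, answer in row.items():
                if not statement.startswith("D"):
                    continue  # skip non-statement columns (e.g., name, role)
                dimension = statement.split("_")[0]
                scores[dimension].append(int(answer))
    return {dim: round(mean(vals), 2) for dim, vals in sorted(scores.items())}

if __name__ == "__main__":
    for dimension, avg in summarize("chapter_diagnostic.csv").items():
        flag = "  <- worth digging into at the workshop" if avg < 3 else ""
        print(f"{dimension}: average {avg}{flag}")
```

The averages are only conversation starters: low-scoring dimensions simply point to where the workshop discussion should dig deeper.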
Ensuring application reliability is a never-ending quest. Finite state machines (FSMs) offer a solution by modeling system behavior as states and transitions, a useful tool that can help software engineers understand software behavior and design effective test cases. This article explores the pros and cons of FSMs via simple examples. We will also make a short comparison between the usefulness and applicability of FSMs and program graphs in software testing.

What Are FSMs?

FSMs are a powerful tool used to model systems that exhibit distinct states and transitions between those states. They are our visual roadmaps for a system's behavior. Here's a breakdown of their core principles:

An FSM is a directed graph where nodes represent states and edges represent transitions between states.
Transitions are triggered by events, and actions might occur upon entering or leaving a state.
Labels on transitions specify the events that trigger them and the actions that occur during the transition.

FSMs are a simple and visual way to represent systems that react differently to various events. Let's explore Python code for a simple vending machine and demonstrate how an FSM aids in designing effective test cases.

```python
class VendingMachine:
    def __init__(self):
        self.state = "idle"
        self.inserted_amount = 0
        self.product_selected = None

    def insert_coin(self, amount):
        if self.state == "idle":
            self.inserted_amount += amount
            print(f"Inserted ${amount}. Current amount: ${self.inserted_amount}")
        else:
            print("Machine busy, please wait.")

    def select_product(self, product):
        if self.state == "idle" and self.inserted_amount >= product.price:
            self.state = "product_selected"
            self.product_selected = product
            print(f"Selected {product.name}.")
        else:
            if self.state != "idle":
                print("Please dispense product or return coins first.")
            else:
                print(f"Insufficient funds for {product.name}.")

    def dispense_product(self):
        if self.state == "product_selected":
            print(f"Dispensing {self.product_selected.name}.")
            self.state = "idle"
            self.inserted_amount = 0
            self.product_selected = None
        else:
            print("No product selected.")

    def return_coins(self):
        if self.state == "idle" and self.inserted_amount > 0:
            print(f"Returning ${self.inserted_amount}.")
            self.inserted_amount = 0
        else:
            print("No coins to return.")


# Example products
class Product:
    def __init__(self, name, price):
        self.name = name
        self.price = price


product1 = Product("Soda", 1.00)
product2 = Product("Chips", 0.75)

# Example usage
vending_machine = VendingMachine()
vending_machine.insert_coin(1.00)
vending_machine.select_product(product1)
vending_machine.dispense_product()
vending_machine.insert_coin(0.50)
vending_machine.select_product(product2)
vending_machine.dispense_product()
vending_machine.return_coins()
```

The code simulates a basic vending machine with functionalities like coin insertion, product selection, dispensing, and coin return. Let's see how an FSM empowers us to create robust test cases.
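The next section walks through the FSM design for this vending machine in detail. As a preview of how such a design can be made executable, here is a minimal sketch (the state and event names are my own simplification of the code above, not an exhaustive model) that encodes the machine's legal transitions as a table and replays an event sequence against it:

```python
# Minimal sketch: the vending machine's FSM as a transition table.
TRANSITIONS = {
    ("idle", "insert_coin"): "idle",                 # coins accepted, still idle
    ("idle", "select_product"): "product_selected",  # only with sufficient funds
    ("product_selected", "dispense_product"): "idle",
    ("idle", "return_coins"): "idle",
}

def run_events(events, start="idle"):
    """Replay an event sequence against the table; raise on an illegal move."""
    state = start
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"Illegal event '{event}' in state '{state}'")
        state = TRANSITIONS[key]
    return state

# A test can assert that a valid user journey ends in the expected state:
assert run_events(["insert_coin", "select_product", "dispense_product"]) == "idle"

# ...and that invalid journeys are rejected:
try:
    run_events(["dispense_product"])
except ValueError as err:
    print(err)  # Illegal event 'dispense_product' in state 'idle'
```

Driving tests from such a table makes it straightforward to enumerate every legal transition (and every illegal one), which mirrors how the test cases in the next section are derived.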
FSM Design for the Vending Machine

The vending machine's FSM may have four states:

Idle: The initial state where the machine awaits user input
Coin insertion: State active when the user inserts coins
Product selection: State active after a product is selected with sufficient funds
Dispensing: State active when the product is dispensed and change (if any) is returned

Transitions and Events

Idle -> Coin Insertion: Triggered by the insert_coin method
Coin Insertion -> Idle: Triggered if the user tries to insert coins while not in the "idle" state (error scenario)
Idle -> Product Selection: Triggered by the select_product method if sufficient funds are available
Product Selection -> Idle: Triggered if the user selects a product without enough funds or attempts another action while a product is selected
Product Selection -> Dispensing: Triggered by the dispense_product method
Dispensing -> Idle: Final state reached after dispensing the product and returning change

Test Case Generation With FSM

By analyzing the FSM, we can design comprehensive test cases to thoroughly test the program:

1. Valid Coin Insertion and Product Selection

Insert various coin denominations (valid and invalid amounts).
Select products with exact, sufficient, and insufficient funds.
Verify the machine transitions to the correct states based on inserted amounts and selections.

Example test case:

Start in the "Idle" state.
Insert $0.50 (transition to "Coin Insertion").
Select "Soda" (insufficient funds, so the machine remains in "Idle"). Verify the message: "Insufficient funds for Soda."
Insert another $0.50 (total $1.00).
Select "Soda" (transition to "Product Selection"). Verify the message: "Selected Soda."
Dispense the product (transition through "Dispensing" back to "Idle"). Verify the message: "Dispensing Soda."

Expected behavior: The machine only moves to "Product Selection" once the inserted amount covers the product's price, and it returns to "Idle" after dispensing.

2. Edge Case Testing

Insert coins while in the "Product Selection" or "Dispensing" state (unexpected behavior).
Try to select a product before inserting any coins.
Attempt to dispense a product without selecting one.
Return coins when no coins are inserted.
Verify the machine handles these scenarios gracefully and provides appropriate messages or prevents invalid actions.

Example test case:

Start in the "Idle" state.
Insert $1.00 (transition to "Coin Insertion").
Select "Soda" (transition to "Product Selection").
Try to insert another coin (not allowed in "Product Selection"). Verify the message: "Machine busy, please wait."

Expected behavior: The machine should not accept additional coins while a product is selected.

3. State Transition Testing

Verify the program transitions between states correctly based on user actions (inserting coins, selecting products, dispensing, returning coins).
Use the FSM as a reference to track the expected state transitions throughout different test cases.

Benefits of FSMs

FSMs provide a clear understanding of the expected system behavior for different events. They aid in defining and documenting requirements. By mapping the FSM, testers can efficiently design test cases that cover all possible transitions and ensure the system reacts appropriately to various scenarios. FSMs can help identify inconsistencies or missing logic in the early design stages.
This prevents costly bugs later in the development process. They act as a bridge between technical and non-technical stakeholders, facilitating better communication and collaboration during testing. But let's look at some examples.

Clear Requirements Specification

A tech startup was developing a revolutionary smart building management system. Their latest challenge was to build an app that controls a sophisticated elevator. The team, led by an enthusiastic project manager, Sofia, was facing a communication breakdown.

"The engineers keep changing the app's behavior!" Sofia exclaimed during a team meeting. "One minute it prioritizes express calls, the next it services all floors. Clients are confused, and we're behind schedule."

David, the lead software engineer, scratched his head. "We all understand the core functionality, but translating those requirements into code is proving tricky."

Aisha, the new UI/UX designer, piped up, "Maybe we need a more visual way to represent the elevator's behavior. Something everyone can understand at a glance."

Sofia pulled out a whiteboard. "What if we create an FSM for our app?" The team huddled around as Sofia sketched a diagram. The FSM depicted the elevator's different states (Idle, Moving Up, Moving Down, Door Open) and the events (button press, floor sensor activation) that triggered transitions between them. It also defined clear outputs (door opening, floor announcement) for each state.

"This is amazing!" David exclaimed. "It clarifies the decision-making process for the elevator's control system."

Aisha smiled. "This FSM can guide the user interface design as well. We can show users the elevator's current state and expected behavior based on their input."

Over the next few days, the team refined the FSM, ensuring all user scenarios and edge cases were accounted for. They used the FSM as a reference point for coding, UI design, and even client presentations. The results were impressive. Their app functioned flawlessly, prioritizing express calls during peak hours and servicing all floors efficiently. The clear user interface, based on the FSM, kept everyone informed of the elevator's current state.

"The FSM was a game-changer," Sofia declared during a successful client demo. "It provided a shared understanding of the system's behavior, leading to smooth development and a happy client."

The success of the app served as a testament to the power of FSMs. By providing a clear visual representation of a system's behavior, FSMs bridge communication gaps, ensure well-defined requirements, and can lead to the development of robust and user-friendly software.

Test Case Generation

Another startup was working on an AI-powered security gate for restricted areas. The gate controls access based on employee ID badges and clearance levels. However, the testing phase became a frustrating maze of random scenarios.

"These bugs are popping up everywhere!" groaned Mike, the lead QA tester. "One minute the gate opens for a valid ID, the next it denies access for no reason."

Lisa, the lead developer, frowned. "We've written tons of test cases, but these glitches keep slipping through."

New to the team, Alex, a recent computer science graduate, listened intently. "Have you guys considered using an FSM?"

Mike asked, "A finite state machine? What's that?"

Alex explained how an FSM could visually represent the app's behavior. It would show the various states (Idle, Verifying ID, Access Granted, Access Denied) and the events (badge swipe, clearance check) triggering transitions.
"By mapping the FSM," Alex continued, "we can systematically design test cases that cover all possible transitions and ensure that our app reacts appropriately in each scenario." The team decided to give it a try. Together, they sketched an FSM on a whiteboard. It detailed all possible badge swipes (valid ID, invalid ID, revoked ID) and corresponding state transitions and outputs (gate opening, access denied messages, security alerts). Based on the FSM, Mike and Alex designed comprehensive test cases. They tested valid access for different clearance levels, attempted access with invalid badges, and even simulated revoked IDs. They also included edge cases, like simultaneous badge swipes or network disruptions during the verification process. The results were remarkable. The FSM helped them identify and fix bugs they hadn't anticipated before. For instance, they discovered a logic error that caused the app to grant access even when the ID was revoked. "This FSM is a lifesaver!" Mike exclaimed. "It's like a roadmap that ensures we test every possible pathway through the system." Lisa nodded, relieved. "With these comprehensive tests, we can finally be confident about our app's reliability." The team learned a valuable lesson: FSMs aren't just theoretical tools, but powerful allies in the software testing battleground. Early Error Detection Another development team was building a VoIP app. It promised crystal-clear voice calls over the internet, but the development process had become a cacophony of frustration. "The call quality keeps dropping!" Mary, the lead developer, grimaced. "One minute the audio is clear, the next it's a mess." Jason, the stressed project manager, pinched the bridge of his nose. "We've been fixing bugs after each test run, but it feels like a game of whack-a-mole with these audio issues." Anna, the new UI/UX designer, suggested, "Maybe we need a more structured approach to visualizing how our VoIP app should behave. Something that exposes potential glitches before coding begins." Mark remembered a concept from his first-year computer science degree. "What about a Finite State Machine (FSM)?" The team gathered around the whiteboard as Mark sketched a diagram. The FSM depicted the app's various states (Idle, Initiating Call, Connected, In-Call) and the user actions (dialing, answering, hanging up) triggering transitions. It also defined expected outputs (ringing tones, voice connection, call-ended messages) for each state. "This is amazing!" Anna exclaimed. "By mapping out the flow, we can identify potential weaknesses in the logic before they cause audio problems down the line." Over the next few days, the team painstakingly detailed the FSM. They identified a crucial gap in the logic early on. The initial design didn't account for varying internet connection strengths. This could explain the erratic call quality that Mary described. With the FSM as a guide, Alex, the network engineer, refined the app's ability to adapt to different bandwidths. The app dynamically adjusted audio compression levels based on the user's internet speed. This ensured a smoother call experience even with fluctuating connections. The FSM unveiled another potential issue: the lack of a clear "call dropped" state. This could lead to confusion for users if the connection abruptly ended without any notification. Based on this insight, the team designed an informative "call ended" message triggered by unexpected connection loss. By launch day, the VoIP app performed flawlessly. 
The FSM helped them catch critical glitches in the early stages, preventing user frustration and potential churn.

Improved Communication

Another development team was building a mobile banking app. It promised cutting-edge security and user-friendly features. However, communication between the development team and stakeholders had become a financial nightmare of misunderstandings.

"Marketing wants a flashy login animation," Nick, the lead developer, sighed. "But it might conflict with the two-factor authentication process."

Joe, the project manager, rubbed his temples. "And the CEO keeps asking about facial recognition, but it's not in the current design."

John, the intern brimming with enthusiasm, chimed in, "Have you considered using a finite state machine (FSM) to model our app?" John explained how an FSM could visually represent the app's flow. It would show different states (Idle, Login, Account Selection, Transaction Confirmation) with user actions (entering credentials, selecting accounts, confirming transfers) triggering transitions.

"The beauty of an FSM," John continued, "is that it provides a clear and concise picture for everyone involved. Technical and non-technical stakeholders can readily understand the app's intended behavior."

The team decided to give it a shot. Together, they sketched an FSM for the app, detailing every step of the user journey. This included the two-factor authentication process and its interaction with the login animation. It was now clear to marketing that a flashy animation might disrupt security protocols. The FSM became a communication bridge. Joe presented it to the CEO, who easily grasped the limitations of facial recognition integration in the current design phase. The FSM helped prioritize features and ensure everyone was on the same page.

Testers also benefited immensely. The FSM served as a roadmap, guiding them through various user scenarios and potential edge cases. They could systematically test each state transition and identify inconsistencies in the app's behavior.

By launch time, the app functioned flawlessly. The FSM facilitated clear communication, leading to a well-designed and secure banking app. Stakeholders were happy, the development team was relieved, and John, the hero with his FSM knowledge, became a valuable asset to the team. The team's key takeaway: FSMs are not just for internal development. They can bridge communication gaps and ensure smooth collaboration between technical and non-technical stakeholders throughout the software development lifecycle.

FSMs vs. Program Graphs: A Comparison

While both FSMs and program graphs are valuable tools for software testing, they differ in their scope and level of detail. To understand how the two are related, the following analogy may help. Assume we are exploring a city. An FSM would be like a map with labeled districts (states) and connecting roads (transitions). A program graph would be like a detailed subway map, depicting every station (code blocks), every tunnel (control flow), and every potential transfer (decision points).
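To make the analogy a bit more concrete, here is a small sketch (my own illustrative example, not taken from the article) showing the two views side by side: the FSM captures the externally visible states of the vending machine, while the program graph captures the control-flow structure of the code that implements a single transition.

```python
# 1) FSM view: externally visible states and the events that connect them.
fsm = {
    "Idle": {"select_product (sufficient funds)": "Product Selection",
             "select_product (insufficient funds)": "Idle"},
    "Product Selection": {"dispense_product": "Idle"},
}

# 2) Program-graph view: basic blocks of select_product and the control-flow
#    edges between them (the if/else decision becomes a branch in the graph).
program_graph = {
    "entry": ["check_state_and_funds"],
    "check_state_and_funds": ["set_product_selected", "report_error"],
    "set_product_selected": ["exit"],
    "report_error": ["exit"],
    "exit": [],
}

# The FSM drives state-level scenarios; the program graph drives path coverage:
def all_paths(graph, node="entry", path=None):
    path = (path or []) + [node]
    if not graph[node]:
        return [path]
    return [p for nxt in graph[node] for p in all_paths(graph, nxt, path)]

for p in all_paths(program_graph):
    print(" -> ".join(p))  # each path is a candidate unit-test scenario
```

Enumerating paths through the program graph supports branch and path coverage at the unit level, whereas the FSM supports scenario coverage at the system level; the suitability lists below follow directly from that difference.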
FSMs: What They Are Suitable for Testing

State-driven systems: User interfaces, network protocols, and apps with a clear mapping between states and events
Functional testing: Verifying system behavior based on user inputs and expected outputs in different states
Regression testing: Ensuring changes haven't affected existing state transitions and system functionality

FSM Weaknesses

Limited scope: FSMs may struggle with complex systems that exhibit continuous behavior or have intricate interactions between states.
State explosion: As system complexity increases, the number of states and transitions can grow exponentially, making the FSM cumbersome and difficult to manage.
Limited error handling: FSMs don't explicitly represent error states or handling mechanisms, which might require separate testing approaches.

Program Graphs: What They Are Suitable for Testing

Software with complex logic: Code with loops, branches, functions, and intricate interactions between different parts of the program
Integration testing: Verifying how different modules or components interact with each other
Unit testing: Focusing on specific code functions and ensuring they execute as expected under various conditions

Program Graphs' Weaknesses

Complexity: Creating and interpreting program graphs can be challenging for testers unfamiliar with code structure and control flow.
Abstract view: Program graphs offer a less intuitive representation for non-technical stakeholders compared to FSMs.
State abstraction: Complex state changes might not be explicitly represented in program graphs, requiring additional effort to map them back to the system's states.

Choosing the Right Tool

For state-based systems with clear events and transitions, FSMs are a great starting point, offering simplicity and ease of use. For more complex systems or those with intricate control-flow logic, program graphs provide a more detailed and comprehensive view, enabling thorough testing. In many cases, a combination of the two is the most effective approach: FSMs provide a high-level overview of system behavior, while program graphs delve deeper into specific areas of code complexity. By understanding the strengths and limitations of each approach, you can choose the best tool for your specific software testing needs.

Wrapping Up

FSMs are a useful way to represent a system's behavior in software development. They excel at clearly defining requirements, ensuring all parties involved understand the expected functionality. FSMs also guide test case generation, making sure all possible scenarios are explored. Most importantly, FSMs help catch inconsistencies and missing logic early in the development phase, preventing costly bugs from appearing later. Understanding their pros and cons can help us improve our testing efforts; we can use FSMs alone or alongside other tools like program graphs.
Welcome back to our ongoing series, “Demystifying Event Storming,” where we navigate the intricacies of Event Storming and its applications in understanding complex business domains and software systems. I’m thrilled to embark on the next phase of our journey, especially considering the anticipation surrounding this installment. In the previous parts of our series, we embarked on an illuminating exploration of Event Storming, unraveling its collaborative nature and its role in mapping out complex business processes and interactions. We delved into the fundamentals of Event Storming, understanding its unique approach compared to traditional methods, and explored its practical application in process modeling. Furthermore, in Part Three, we ventured into the design-level aspect of Event Storming, focusing on identifying aggregates. The feedback and engagement from our readers have been phenomenal, with many expressing eagerness for the next installment. Numerous inquiries have poured in, each echoing a common sentiment: the anticipation for Part Four, where we delve into the critical concept of “bounded contexts” in Domain-Driven Design (DDD). Bounded contexts, despite their pivotal role in software design, often remain one of the least understood concepts in Domain-Driven Design (DDD). This lack of clarity can lead to significant challenges in system architecture and implementation. Therefore, in this highly anticipated installment, we’ll unravel the intricacies of bounded contexts, shedding light on their significance in software design and their role in delineating clear boundaries within a system. Building upon the foundation laid in our previous discussions, we’ll explore how Event Storming serves as a powerful tool for identifying and defining bounded contexts, facilitating better organization and communication within complex systems. Context Is King In the realm of Domain-Driven Design (DDD), the paramount principle that “context is king” underscores the significance of understanding the specific environment in which a system operates. This principle serves as a cornerstone for designing intricate software systems that effectively address user needs and align with business objectives. Through a nuanced exploration of context, teams can navigate the complexities of system design and ensure that their solutions resonate with users and stakeholders alike. Consider the scenario of encountering a pizza slice in a park versus purchasing pizza from a renowned pizza shop. These two experiences, although involving the same food item, unfold within vastly different contexts, profoundly influencing perceptions and experiences. When faced with a pizza slice found in a park, one may harbor doubts about its freshness and safety, given its exposure to the outdoor environment. Conversely, the act of purchasing pizza from a reputable pizza shop evokes feelings of anticipation and satisfaction, buoyed by trust in the establishment’s quality and hygiene standards. Understanding the nuances of context is pivotal for establishing trust and reliability within a system. Trust in the quality and safety of pizza from a reputable shop contrasts starkly with skepticism surrounding a pizza slice discovered in a park. By discerning the context in which entities operate, teams can instill confidence in their systems and cultivate positive user experiences. Conversely, overlooking context or misinterpreting it can erode trust and lead to dissatisfaction among users. 
Moreover, encapsulation emerges as a vital principle. By encapsulating the functionalities and characteristics of entities such as parks and pizza shops within distinct contexts, teams can achieve greater clarity, modularity, and maintainability in their systems. Encapsulation ensures that each context operates independently, shielding its internal workings from external influences and interactions. This separation is essential for preventing unintended dependencies and conflicts between contexts, facilitating easier maintenance and scalability of the system. However, failing to maintain clear boundaries between contexts can have significant repercussions. Confusing the context of a park with that of a pizza shop, for example, can lead to a tangled web of interactions and dependencies, ultimately resulting in a chaotic and disorganized system. This confusion may manifest as a “big ball of mud,” where the system becomes difficult to understand, maintain, and evolve. Without clear delineation between contexts, the system risks becoming convoluted and unmanageable, impeding progress and hindering the achievement of business objectives. In essence, embracing the principle of “context is king” empowers teams to design software solutions that resonate with users, foster trust, and deliver value. By meticulously considering the specific context in which a system operates, teams can craft solutions that not only meet user needs but also exceed expectations. In the following sections, we’ll delve into practical strategies for identifying and defining bounded contexts, leveraging the collaborative power of event-storming to achieve clarity and alignment within complex systems. Understanding Bounded Contexts In our previous discussion, we emphasized the notion that “context is king,” highlighting the paramount importance of understanding context in software design. Now, armed with a clearer understanding of context, we can delve into bounded contexts (BC) and how they enhance our comprehension of system architecture. In the world of software design, bounded contexts act as delineations that separate various realms of understanding within a system. Visualize each bounded context as a distinct microcosm with its own unique language and regulations. To illustrate with another example, consider the term “bank” – it holds different meanings based on its context. It could signify a financial institution where you deposit and withdraw money, or it might refer to the sloping land alongside a body of water. In each scenario, the word “bank” carries a different significance because it operates within a different context. In software design, similar linguistic ambiguities can arise, where identical terms hold divergent meanings across different parts of a program. Bounded contexts provide clarity in such situations by establishing boundaries that dictate specific interpretations. They essentially say, “Within this segment of the program, ‘bank’ denotes a financial institution, while in this other segment, it signifies the land beside a river.” Within each bounded context, a common language prevails. It’s akin to everyone within that context speaking the same dialect, facilitating seamless communication and understanding. This shared vocabulary, known as “ubiquitous language,” permeates every aspect of the context, ensuring consistency and clarity in communication. Also, consistency reigns supreme within each bounded context. 
All components and interactions adhere to a set of predefined rules, fostering predictability and mitigating confusion. If, for instance, “bank” signifies a financial institution within a specific context, it consistently retains that meaning across all instances within that context, eliminating surprises or misunderstandings. Bounded contexts are demarcated by clear borders, ensuring that they remain distinct and separate from one another. Similar to delineated rooms within a house, each bounded context operates autonomously, with actions and events confined within its designated space. This segregation prevents overlap or interference between different contexts, maintaining order and coherence within the system. Within each bounded context, integrity is preserved by upholding its own set of rules and regulations. These rules remain immutable and unaffected by external influences, safeguarding the context’s autonomy and predictability. By maintaining the sanctity of its rules, each bounded context ensures organizational consistency and reliable system behavior. In essence, bounded contexts serve as organizational constructs within software design, delineating distinct spheres of meaning and operation. Through their delineation of language, consistency in rules, clear boundaries, and integrity, bounded contexts uphold clarity, coherence, and predictability within complex software systems. For a deeper exploration of bounded contexts and their significance in software design, we recommend checking out Nick Tune’s insightful presentation on the topic. Bounded contexts are the most important part of Domain Driven Design. Maintaining a strong decoupling between different bounded contexts makes large systems more simple. Bounded contexts serve as linguistic boundaries, delineating distinct areas of meaning and operation within a software system. Just as geographical boundaries demarcate territories, bounded contexts demarcate semantic territories where terms and concepts hold specific, well-defined meanings. Within each bounded context, the ubiquitous language reigns supreme, providing stakeholders with a common understanding of domain concepts and fostering effective communication. Consider the campervan rental service domain once more. Within the context of customer management, terms like “booking,” “customer profile,” and “loyalty program” hold nuanced meanings tailored to the needs of rental managers and customer service representatives. In contrast, within the context of fleet operations, these same terms may take on entirely different connotations, aligning with the concerns and priorities of fleet managers and maintenance crews. For example, in the context of “Customer Management,” the term “booking” refers to the process of reserving a campervan for a specific duration, including customer details and payment information. Conversely, within the context of “Fleet Operations,” “booking” may signify the allocation of a vehicle for rental, including maintenance schedules and availability status. By establishing bounded contexts that align with linguistic boundaries, teams can navigate the complex terrain of software development with confidence and clarity. Each bounded context encapsulates a cohesive set of domain concepts, providing stakeholders with a focused lens through which to analyze, design, and implement software solutions. Strategies for Identifying Bounded Contexts In our quest to understand these boundaries, we employ various strategies for detecting bounded contexts. 
Let’s delve into these tactics, each offering unique insights into the structure and organization of software systems: "Same term different meaning." In the process of identifying bounded contexts within our domain, it’s essential to recognize that the same term may have different meanings depending on the context in which it’s used. This phenomenon is common in complex systems where various stakeholders, departments, or external systems interact, each with their own interpretations and definitions. For instance, in our campervan rental service domain, the term “reservation” might have different meanings in different contexts. Within the context of the booking system, a reservation could refer to a customer’s request to reserve a campervan for specific dates. However, in the context of inventory management, a reservation could signify a campervan that has been set aside for a particular customer but has not yet been confirmed. Similarly, the term “availability” might have distinct interpretations depending on the context. In the context of customer inquiries, availability could indicate the current availability status of a campervan for a given period. Conversely, in the context of maintenance scheduling, availability might refer to the readiness of a campervan for rental after undergoing maintenance procedures. By acknowledging these differences in meaning and context, we can identify potential bounded contexts within our system. Each unique interpretation of a term or concept may indicate a separate area of the domain that requires its own bounded context to ensure clarity, consistency, and effective communication. Therefore, during our Event Storming session, it’s crucial to pay attention to these variations in terminology and meaning. Discussing and clarifying the semantics of domain concepts with stakeholders can help uncover different perspectives and potential bounded contexts within our system. This process of understanding the nuanced meanings of terms across different contexts is essential for accurately delineating boundaries and designing cohesive, well-defined contexts within our campervan rental service domain. “Same concept, different use” holds significant implications for identifying bounded contexts within a system. Consider our campervan rental service as an example: In the domain of customer management, the concept of “customer profile” serves as a repository of information about individual customers, facilitating personalized assistance and fostering long-term relationships. Meanwhile, within inventory management, the same “customer profile” concept is utilized to associate specific vehicles with individual bookings, optimizing fleet utilization and managing logistics efficiently. From a marketing and sales perspective, “customer profiles” are leveraged to analyze behavior, segment the customer base, and tailor marketing campaigns, focusing on driving sales and increasing customer engagement. In data analytics and reporting, “customer profiles” are instrumental in generating insights, tracking key performance indicators, and making data-driven recommendations to optimize business operations. Within the technical infrastructure, the concept of “customer profile” guides the design of data models, databases, and APIs, ensuring data integrity, security, and scalability while optimizing system performance. By understanding how the same concept is used differently across these domains, we can identify potential bounded contexts within the system. 
Each domain interprets and applies the concept according to its unique objectives and workflows, leading to the delineation of clear boundaries and the identification of distinct bounded contexts. This understanding is crucial for designing modular, cohesive systems that align with the needs of each domain while promoting interoperability and scalability. Conway’s Law Organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations. In simpler terms, this means that the architecture of a software system tends to mirror the communication structures of the organization that builds it. This concept has significant implications for the design of bounded contexts in Domain-Driven Design (DDD). When applying Conway’s Law to bounded contexts, it suggests that the boundaries and interactions between different parts of a software system should align with the communication patterns and organizational boundaries within the development team or teams. Here’s how this principle applies to bounded contexts: Organizational boundaries: If an organization is divided into separate teams or departments, each responsible for different aspects of the software, these boundaries may naturally align with the boundaries of bounded contexts. Each team focuses on a specific area of functionality, defining its own bounded context with well-defined interfaces for interaction with other contexts. Communication structures: The communication patterns between teams or departments influence the interactions and dependencies between bounded contexts. Effective communication channels and collaboration mechanisms are essential for defining clear interfaces and ensuring seamless integration between contexts. Modularity and decoupling: Following Conway’s Law, designing bounded contexts that reflect the communication structures of the organization can promote modularity and decoupling within the system. Each bounded context encapsulates a cohesive set of functionality, reducing dependencies and allowing teams to work autonomously within their respective domains. Scalability and flexibility: By aligning bounded contexts with organizational communication structures, teams can scale their development efforts more effectively. Each team can focus on developing and maintaining a specific bounded context, enabling parallel development and deployment of independent components. This approach enhances flexibility and agility, allowing the system to evolve incrementally to meet changing business requirements. Consistency and cohesion: Conway’s Law underscores the importance of fostering effective communication and collaboration between teams to ensure consistency and cohesion across bounded contexts. Shared understanding, common goals, and aligned priorities are essential for maintaining coherence in the overall system architecture. In summary, Conway’s Law emphasizes the interconnectedness between organizational structure and software architecture. When designing bounded contexts in Domain-Driven Design, teams should consider the communication patterns, collaboration dynamics, and organizational boundaries to create modular, cohesive, and scalable systems that reflect the underlying structure of the organization. "External systems" can indeed serve as indicators of the boundaries of a context within our domain of campervan rental services. 
In our scenario, where our application interacts with various external systems, such as a weather service for updates, a payment gateway for transactions, and a mapping service for location information, each external system represents a distinct aspect of our application’s functionality. To manage these interactions effectively, we would likely define bounded contexts tailored to each external system. For example, we might have a context responsible for weather updates, another for payment processing, and yet another for mapping services. Each context would encapsulate the logic and functionality related to its respective external system, handling interactions, processing data, and managing relevant domain concepts. By identifying these external systems and their associated interactions, we can establish clear boundaries for each context within our system. These boundaries help organize our codebase, define clear responsibilities, and facilitate effective communication between different parts of the system. Furthermore, leveraging external systems as indicators of context boundaries enables us to design a more modular, maintainable, and scalable system architecture for our campervan rental service. With well-defined contexts tailored to specific external systems, we can ensure that each part of our application remains focused, cohesive, and easily manageable, leading to better overall system design and development. Identifying Bounded Contexts in Event Storming In the realm of Event Storming, identifying bounded contexts involves a meticulous process of analyzing domain events and uncovering patterns that delineate distinct areas of functionality within the system. One effective approach to initiating this process is by examining how domain events cluster together, forming groups that hint at cohesive bounded contexts. Grouping Domain Events Consider our campervan rental service domain. During an Event Storming session focused on this domain, we capture various domain events such as “Rent Campervan,” “Return Campervan,” “Schedule Maintenance,” and “Check Availability.” As we lay out these events on the board, we start to notice clusters forming. For example, events related to customer interactions may group together, while those pertaining to fleet management form another cluster. Proto-Bounded Contexts These clusters of domain events, often referred to as “proto-bounded contexts,” serve as early indicators of potential bounded contexts within the system. For instance, the group of events related to customer interactions may suggest a bounded context focused on “Customer Management,” while the cluster related to fleet management may indicate a separate bounded context for “Fleet Operations.” Analyzing Relationships As proto-bounded contexts begin to take shape, it becomes essential to analyze the relationships between different groups of domain events. For instance, how do events within the “Customer Management” bounded context interact with events in the “Fleet Operations” bounded context? Are there dependencies or interactions that need to be considered? To materialize these further, try the following: Ask the audience for a bounded context name. Tip: Look into names in “ing” for good ones (e.g., renting). It might also be the occasion to capture a few domain definitions. Be sure to keep your definition post-its at hand. Gather the necessary materials, including colored, thick wool string, scissors, and adhesive tape. 
With your volunteer, walk through the board from left to right, identifying bounded contexts. As you identify each bounded context, use the wool string to visually delineate its boundaries on the board. Engage in discussion as you go along. You will usually agree about bounded context boundaries, but don't hesitate to explore any areas of uncertainty or disagreement.

This hands-on approach helps reinforce the understanding of bounded contexts by providing a tangible representation of their boundaries. By physically tracing the boundaries with wool string, participants gain a clearer visual understanding of how different areas of functionality within the system are encapsulated within distinct bounded contexts.

Conclusion

Our journey through the intricacies of identifying bounded contexts in Event Storming has provided valuable insights into the art and science of system design. From understanding the nuanced meanings of domain events to recognizing patterns and clusters indicative of bounded contexts, we've explored various strategies for delineating clear boundaries within complex software systems. By leveraging Event Storming as a collaborative tool, teams can uncover hidden insights, facilitate meaningful discussions, and align on critical aspects of system architecture. Through hands-on exercises and interactive sessions, participants gain a deeper understanding of the domain, identify key areas of functionality, and define bounded contexts that encapsulate distinct domains within the system.

Moreover, our exploration of bounded contexts within the context of Domain-Driven Design (DDD) has highlighted the pivotal role they play in fostering clarity, consistency, and modularity within software systems. By embracing the principle that "context is king," teams can design solutions that resonate with users, align with business objectives, and adapt to evolving requirements.

Additionally, for Persian-speaking audiences seeking further insights and practical guidance, I've had the privilege of delivering a presentation on this topic at the Iran Domain Driven Design Community Conference, where I delve deeper into the art of identifying bounded contexts.

Thank you for joining me on this insightful journey, and until next time, happy Event Storming!
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, The Modern DevOps Lifecycle: Shifting CI/CD and Application Architectures. Forbes estimates that cloud budgets will break all previous records as businesses spend over $1 trillion on cloud computing infrastructure in 2024. Since most application releases depend on cloud infrastructure, having good continuous integration and continuous delivery (CI/CD) pipelines and end-to-end observability becomes essential for ensuring highly available systems. By integrating observability tools in CI/CD pipelines, organizations can increase deployment frequency, minimize risks, and build highly available systems. Complementing these practices is site reliability engineering (SRE), a discipline ensuring system reliability, performance, and scalability. This article will help you understand the key concepts of observability and how to integrate observability in CI/CD for creating highly available systems. Observability and High Availability in SRE Observability refers to gaining real-time insights into application performance, whereas high availability means ensuring systems remain operational by minimizing downtime. Understanding how the system behaves, performs, and responds to various conditions is central to achieving high availability. Observability equips SRE teams with the necessary tools to gain insights into a system's performance. Figure 1. Observability in the DevOps workflow Components of Observability Observability involves three essential components: Metrics – measurable data on various aspects of system performance and user experience Logs – detailed event information for post-incident reviews Traces – end-to-end visibility in complex architectures to help you understand requests across services Together, they provide a comprehensive picture of the system's behavior, performance, and interactions. This observability data can then be analyzed by SRE teams to make data-driven decisions and swiftly resolve issues to make their system highly available. The Role of Observability in High Availability Businesses have to ensure that their development and SRE teams are skilled at predicting and resolving system failures, unexpected traffic spikes, network issues, and software bugs to provide a smooth experience to their users. Observability is vital in assessing high availability by continuously monitoring specific metrics that are crucial for system health, such as latency, error rates, throughput, saturation, and more, thereby providing a real-time health check. Deviations from normal behavior trigger alerts, allowing SRE teams to proactively address potential issues before they impact availability. How Observability Helps SRE Teams Each observability component contributes unique insights into different facets of system performance. These components empower SRE teams to proactively monitor, diagnose, and optimize system behavior. Some use cases of metrics, logs, and traces for SRE teams are post-incident reviews, identification of system weaknesses, capacity planning, and performance optimization. Post-Incident Reviews Observability tools allow SRE teams to look at past data to analyze and understand system behavior during incidents, anomalies, or outages. Detailed logs, metrics, and traces provide a timeline of events that help identify the root causes of issues.
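As a concrete illustration of the post-incident workflow described above, here is a minimal Python sketch (the record shape, field names, and the 5xx threshold are hypothetical and not tied to any specific observability tool) that reduces raw request logs from an incident window to two of the metrics a review typically starts from, error rate and p95 latency:

Python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import quantiles

@dataclass
class RequestRecord:
    """Hypothetical shape of one logged request."""
    timestamp: datetime
    latency_ms: float
    status_code: int

def incident_summary(records, start, end):
    """Summarize error rate and p95 latency for requests inside an incident window."""
    window = [r for r in records if start <= r.timestamp <= end]
    if not window:
        return {"requests": 0, "error_rate": 0.0, "p95_latency_ms": 0.0}
    latencies = [r.latency_ms for r in window]
    errors = sum(1 for r in window if r.status_code >= 500)
    # quantiles() needs at least two data points; fall back to the single value otherwise
    p95 = quantiles(latencies, n=20)[18] if len(latencies) > 1 else latencies[0]
    return {"requests": len(window),
            "error_rate": errors / len(window),
            "p95_latency_ms": p95}

# Illustrative usage with synthetic data: a burst where roughly 1 in 10 requests fails
start = datetime(2024, 1, 1, 12, 0)
records = [RequestRecord(start + timedelta(seconds=i), 120 + i, 500 if i % 10 == 0 else 200)
           for i in range(300)]
print(incident_summary(records, start, start + timedelta(minutes=15)))

The same window-and-aggregate pattern can be applied to traces and other signals; the point is that the timeline captured during the incident, rather than live dashboards alone, is what drives the review.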
Identification of System Weaknesses Observability data aids in pinpointing system weaknesses by providing insights into how the system behaves under various conditions. By analyzing metrics, logs, and traces, SRE teams can identify patterns or anomalies that may indicate vulnerabilities, performance bottlenecks, or areas prone to failures. Capacity Planning and Performance Optimization By collecting and analyzing metrics related to resource utilization, response times, and system throughput, SRE teams can make informed decisions about capacity requirements. This proactive approach ensures that systems are adequately scaled to handle expected workloads and their performance is optimized to meet user demands. In short, resources can be easily scaled down during non-peak hours or scaled up when demands surge. SRE Best Practices for Reliability At its core, SRE practices aim to create scalable and highly reliable software systems using two key principles that guide SRE teams: SRE golden signals and service-level objectives (SLOs). Understanding SRE Golden Signals The SRE golden signals are a set of critical metrics that provide a holistic view of a system's health and performance. The four primary golden signals are: Latency – Time taken for a system to respond to a request. High latency negatively impacts user experience. Traffic – Volume of requests a system is handling. Monitoring helps anticipate and respond to changing demands. Errors – Elevated error rates can indicate software bugs, infrastructure problems, or other issues that may impact reliability. Saturation – Utilization of system resources such as CPU, memory, or disk. It helps identify potential bottlenecks and ensures the system has sufficient resources to handle the load. Setting Effective SLOs SLOs define the target levels of reliability or performance that a service aims to achieve. They are typically expressed as a percentage over a specific time period. SRE teams use SLOs to set clear expectations for a system’s behavior, availability, and reliability. They continuously monitor the SRE golden signals to assess whether the system meets its SLOs. If the system falls below the defined SLOs, it triggers a reassessment of the service's architecture, capacity, or other aspects to improve availability. Businesses can use observability tools to set up alerts based on predetermined thresholds for key metrics. Defining Mitigation Strategies Automating repetitive tasks, such as configuration management, deployments, and scaling, reduces the risk of human error and improves system reliability. Introducing redundancy in critical components ensures that a failure in one area doesn't lead to a system-wide outage. This could involve redundant servers, data centers, or even cloud providers. Additionally, implementing rollback mechanisms for deployments allows SRE teams to quickly revert to a stable state in the event of issues introduced by new releases. CI/CD Pipelines for Zero Downtime Achieving zero downtime through effective CI/CD pipelines enables services to provide users with continuous access to the latest release. Let’s look at some of the key strategies employed to ensure zero downtime. Strategies for Designing Pipelines to Ensure Zero Downtime Some strategies for minimizing disruptions and maximizing user experience include blue-green deployments, canary releases, and feature toggles. Let’s look at them in more detail. Figure 2. 
Strategies for designing pipelines to ensure zero downtime Blue-Green Deployments Blue-green deployments involve maintaining two identical environments (blue and green), where only one actively serves production traffic at a time. When deploying updates, traffic is seamlessly switched from the current (blue) environment to the new (green) one. This approach ensures minimal downtime as the transition is instantaneous, allowing quick rollback in case issues arise. Canary Releases Canary releases involve deploying updates to a small subset of users before rolling them out to everyone. This gradual and controlled approach allows teams to monitor for potential issues in a real-world environment with reduced impact. The deployment is released to a wider audience if the canary group experiences no significant issues. Feature Toggles Feature toggles, or feature flags, enable developers to control the visibility of new features in production independently of other features. By toggling features on or off, teams can release code to production but activate or deactivate specific functionalities dynamically without deploying new code. This approach provides flexibility, allowing features to be gradually rolled out or rolled back without redeploying the entire application. Best Practices in CI/CD for Ensuring High Availability Successfully implementing CI/CD pipelines for high availability often requires a good deal of consideration and lots of trial and error. While there are many implementations, adhering to best practices can help you avoid common problems and improve your pipeline faster. Some industry best practices you can implement in your CI/CD pipeline to ensure zero downtime are automated testing, artifact versioning, and Infrastructure as Code (IaC). Automated Testing You can use comprehensive test suites — including unit tests, integration tests, and end-to-end tests — to identify potential issues early in the development process. Automated testing during integration provides confidence in the reliability of code changes, reducing the likelihood of introducing critical bugs during deployments. Artifact Versioning By assigning unique versions to artifacts, such as compiled binaries or deployable packages, teams can systematically track changes over time. This practice enables precise identification of specific code iterations, thus simplifying debugging, troubleshooting, and rollback processes. Versioning artifacts ensures traceability and facilitates rollback to previous versions in the case of issues during deployment. Infrastructure as Code Utilize Infrastructure as Code to define and manage infrastructure configurations, using tools such as OpenTofu, Ansible, Pulumi, Terraform, etc. IaC ensures consistency between development, testing, and production environments, reducing the risk of deployment-related issues. Integrating Observability Into CI/CD Pipelines Observing key metrics such as build success rates, deployment durations, and resource utilization during CI/CD provides visibility into the health and efficiency of the CI/CD pipeline. Observability can be implemented during continuous integration (CI) and continuous deployment (CD) as well as post-deployment. Observability in Continuous Integration Observability tools capture key metrics during the CI process, such as build success rates, test coverage, and code quality. These metrics provide immediate feedback on the health of the codebase. Logging enables the recording of events and activities during the CI process. 
Logs help developers and CI/CD administrators troubleshoot issues and understand the execution flow. Tracing tools provide insights into the execution path of CI tasks, allowing teams to identify bottlenecks or areas for optimization. Observability in Continuous Deployment Observability platforms monitor the CD pipeline in real time, tracking deployment success rates, deployment durations, and resource utilization. Observability tools integrate with deployment tools to capture data before, during, and after deployment. Alerts based on predefined thresholds or anomalies in CD metrics notify teams of potential issues, enabling quick intervention and minimizing the risk of deploying faulty code. Post-Deployment Observability Application performance monitoring tools provide insights into the performance of deployed applications, including response times, error rates, and transaction traces. This information is crucial for identifying and resolving issues introduced during and after deployment. Observability platforms with error-tracking capabilities help pinpoint and prioritize software bugs or issues arising from the deployed code. Aggregating logs from post-deployment environments allows for a comprehensive view of system behavior and facilitates troubleshooting and debugging. Conclusion The symbiotic relationship between observability and high availability is integral to meeting the demands of agile, user-centric development environments. With real-time monitoring, alerting, and post-deployment insights, observability plays a major role in achieving and maintaining high availability. Cloud providers are now leveraging drag-and-drop interfaces and natural language tools to eliminate the need for advanced technical skills for deployment and management of cloud infrastructure. Hence, it is easier than ever to create highly available systems by combining the powers of CI/CD and observability. Resources: Continuous Integration Patterns and Anti-Patterns by Nicolas Giron and Hicham Bouissoumer, DZone Refcard Continuous Delivery Patterns and Anti-Patterns by Nicolas Giron and Hicham Bouissoumer, DZone Refcard "The 10 Biggest Cloud Computing Trends In 2024 Everyone Must Be Ready For Now" by Bernard Marr, Forbes This is an excerpt from DZone's 2024 Trend Report,The Modern DevOps Lifecycle: Shifting CI/CD and Application Architectures.For more: Read the Report
Your team is supposed to use an Agile approach, such as Scrum. But you have a years-long backlog, your standups are individual status reports, and you’re still multitasking. You and your team members wish you had the chance to do great work, but this feels a lot like an “agile” death march. There’s a reason you feel that way. You’re using fake agility—a waterfall lifecycle masquerading as an agile approach. Worse, fake agility is the norm in our industry. Now, there is light at the end of the tunnel; let’s delve into Tackling Fake Agility with Johanna Rothman! Watch the video now: “Agile” Does Not Work for You? Tackling Fake Agility with Johanna Rothman at the 59th Hands-on Agile Meetup. Abstract: Tackling Fake Agility Your team is supposed to use an Agile approach, such as Scrum. But you have a years-long backlog, your standups are individual status reports, and you’re still multitasking. You and your team members wish you had the chance to do great work, but this feels a lot like an “agile” death march. There’s a reason you feel that way. You’re using fake agility—a waterfall lifecycle masquerading as an Agile approach. Worse, fake agility is the norm in our industry. No one has to work that way. Instead, you can assess your culture, project, and product risks to select a different approach. That will allow you to choose how to collaborate so you can iterate over features and when to deliver value. When you do, you are more likely to discover actual agility and an easier way to work. The learning objectives of Johanna’s session on Tackling Fake Agility were: Have a clear understanding of the different lifecycles and when to use each. Be able to assess your project, product, and portfolio risks. Know how to customize a lifecycle based on the unique culture and requirements of the team. How to create shorter feedback loops in any lifecycle for product success. Questions and Answers During the Q&A part on Tackling Fake Agility, Johanna answered the following questions, among others: How do we model risk? Possible approaches? How do we measure risk? Possible approaches? How do we model value? Possible approaches? How do we measure value? Possible approaches? No matter how we try to have teams work vertically, we get teams saying that they need a cohesive team or microservice team as they need to build things, and the others will build on top of them. What do you think? How can the organization measure the benefit of agility? In some software development teams, it seems natural to have the design and mock-up ready before the development, before the sprint planning, and QA done after, sometimes in the next sprint, and it seems to work for them better than doing all in the same Sprint. Why do Architecture and Requirements work in dedicated time ranges ahead of increments? Does that hold for other business analysis activities like a risk analysis? Based on your experience, what must we do to be valuable Agile coaches or consultants? Are there any cases in which using the cost of delay does not work, or would you not use it? Watch the recording of Johanna Rothman’s Tackling Fake Agility session now: Meet Johanna Rothman “People know me as the “Pragmatic Manager.” I offer frank advice—often with a little humor—for your tough problems. I help leaders and managers see and do reasonable things that work. Equipped with that knowledge, you can decide how to adapt your product development, always focusing on the business outcomes you need. My philosophy is that people want to do a good job. 
They don’t always know what they are supposed to do, nor how to do it.” Connect with Johanna Rothman Johanna’s Blog Johanna Rothman on LinkedIn
Product Ownership Is a Crucial Element in Improving Outcomes SAFe and Scrum both consider product ownership crucial to maximizing outcomes in environments of complexity and uncertainty. Teams are ideally organized around products/value streams so they can apply customer-centricity. Product people and their teams are ideally accountable for outcomes and are empowered to figure out, inspect, and adapt requirements/scope as needed. SAFe Has Multiple Product Ownership/Management Layers As organizations tackle bigger products, they have some alternatives for how to tackle product ownership/management. Scrum advises having one product owner for each product, even if multiple teams develop the product. This is at the core of scaling frameworks such as Nexus and LeSS. SAFe takes a path that is more aligned with the classic structure of product management organizations, which is to have multiple layers of product ownership/management. Product owners own the product at the Agile Team level. Product managers own product at the teams of Agile teams level (Agile Release Trains). Solution managers own products for huge teams of teams working on even larger products/solutions. Why Did SAFe Make This Choice? SAFe takes the perspective of learning from experience in the trenches and what patterns organizations are using and applying lean/Agile principles as needed to help organizations evolve. And many organizations have been struggling to scale product ownership when we're talking about multiple teams. Product management experts such as Melissa Perri also talk about multiple product management roles (see some thoughts about how this relates to SAFe below). Interestingly enough, Scrum@Scale also has product owners at every level. And LeSS/Nexus also introduce multiple product owners when you scale beyond a handful of teams. The advantage of this approach is that it aligns with the product manager/owner journey. Working closely with one or two teams, owning product choices for a couple of product features or a certain slice of the product, can be a great jumping point for junior product managers/owners (What Melissa Perri refers to as associate product managers in Escaping the Build Trap). As the product manager/owner gains experience, they can take on a whole product themselves. It takes time for a product owner/manager to gain the experience to act as the visionary entrepreneur for their product. They might start feeling more comfortable writing stories and executing experiments and, over time, learn to influence, design product experiments, and make tougher prioritization decisions with multiple demanding stakeholders. In other words, product managers/owners naturally evolve from focusing on tactics to strategy over time. What Are Some Downsides To Splitting Product Responsibilities Between the Product Owner and Product Manager? An anti-pattern we often see is that the PM/PO split allows an organization to staff the PO role with “story writers” and “project managers” — people who aren’t empowered as product owners, and that reinforce the project mindset of requirement order-taking and managing scope-budget-timeline. This lack of empowerment leads to delays and an environment where the team is focused on outputs rather than outcomes. Empowering product owners and their teams is a common challenge in SAFe AND Scrum. 
What I’ve seen work well is carving out an appropriate product scope within which the product owner and team are empowered to figure out what to build to achieve the desired outcomes and optimize the value of that product or that aspect of a bigger product. Doing this requires figuring out the product architecture and moving towards an empowering leadership style. As in many other areas, SAFe takes the evolutionary approach. If you’re a purist or a revolutionary, you’ll probably struggle with it. Real-world practitioners are more likely to relate to the evolutionary approach. It’s important to ensure that the PO/PM separation is not seen as an excuse to continue doing everything the same. Product Managers and Product Owners: A Collaborative Relationship Leaders implementing the PO/PM split should ensure healthy collaboration, involvement, and partnership across the product ownership/management team. Product managers should internalize the SAFe principles of unleashing the intrinsic motivation of knowledge workers, in this case, product owners. Product managers have a role as lean/Agile leaders to nurture the competence, awareness, and alignment in the product team that would enable them to decentralize control and let product owners OWN a certain slice of the product. Product managers and product owners should discuss what decisions make sense to centralize and which should be decentralized. The goal of product managers should be to grow product owners over time so they can make more and more decisions — and minimize the decisions that need to be made centrally. This is key to scaling without slowing down decision-making while maintaining and ideally improving outcomes aligned with strategic goals. Increased Risk of Misunderstandings Around Product Ownership With Product Roles Filled by Non-Product People One very specific risk of the SAFe choice to split the PM and PO roles is that it creates the need for many more people in a product role than the number of people in the product organization. This vacuum pulls people like business analysts, project managers, and development managers into the product owner role. Some people can become great product owners but come with very little product (management) experience. Business analysts, for example, are used to consider what the customers say as requirements. They are used to the “waiter” mindset. They struggle to say no or to think strategically about what should be in the future or what should be in the product. Development managers are used to being subject matter experts, guiding their people at the solution level, and managing the work. Project managers are used to focusing on managing scope/budget/timeline rather than value and outcomes. Use the Professional Scrum Product Ownership Stances to Improve your SAFe Product Ownership One technique I found very useful is to review the Professional Scrum Product Ownership Stances with SAFe product owners and product managers. We try to identify which misunderstood stances we’re exhibiting and what structures are reinforcing these misunderstood stances/behaviors. For example — what’s causing us to be “story writers”? We explore the preferred product owner stances and discuss what’s holding us back from being in these stances. Why is it so hard to be an “experimenter,” for example? An emerging realization from these conversations is that SAFe structurally creates a setup where team-level product owners play “story writers” and “subject matter experts” more often. 
It’s non-trivial to switch to an environment where they are a “customer representative” and a “collaborator” with the space to “experiment” with their team towards the right outcome rather than take requirements as a given. It’s hard to get SAFe product managers to be the “visionary,” “experimenter”, and “influencer”. The issue here isn’t unique to SAFe. Product owners struggle to exhibit these behaviors in most Scrum environments as well. What I find useful is to align on a “North Star” vision of what we WANT product ownership to look like at scale and take small steps in that direction continuously, rather than settle for “project management” or “business analysis” called a new name. SAFe Product Management: Providing Vision and Cohesion in a Pharma IT Environment Let’s close with an example of how this can play out in practice. I'm working with the IT organization of a pharmaceutical company. As they were thinking about how to help their Enterprise Applications group become more agile, one of the key questions was how do we create product teams that are empowered to directly support the business — by minimizing dependencies and creating real ownership of each of the enterprise applications as a platform that other IT functions can more easily build off of and leverage. Several Enterprise Applications have multiple teams working on different aspects of them. We created stream-aligned teams, each owning and managing that aspect as a product. The product owners and their teams are empowered to consider needs and wants coming in from other IT functions and the business and shape the future of their product. Most of these product ownership decisions happen at the team level. Product managers focus on alignment and cohesion across the platform. We are still working on establishing the right mechanisms to balance vision/alignment with local initiatives at the team level. So, Now What? SAFe’s approach to product ownership is a frequent target of criticism in the hard-core Agile community. Some of it is pure business/commercial posturing (aka FUD), and some of it is fair and constructive. My aim in this article was to help practitioners explore the rationale, the potential, and the risks behind SAFe’s approach to product ownership, as well as some patterns and models, such as the Professional Scrum Product Ownership stances, that can be used to inspect and adapt/grow the effectiveness of your product ownership approach. As an individual product owner or product manager, you can use these models/patterns to drive your learning journey and help you structure your organization's conversation around creating the environment that empowers you to be a real product owner or product manager. As leaders of product organizations in a SAFe environment, I hope this can help you establish a vision of how you want your product organization to look like and guide you on the way there.
Imagine entering a bustling workshop - not of whirring machines, but of minds collaborating. This is the true essence of software programming at its core: a collective effort where code serves not just as instructions for machines, but as a shared language among developers. However, unlike spoken languages, code can often become an obscure dialect, shrouded in complexity and inaccessible to newcomers. This is where the art of writing code for humans comes into play, transforming cryptic scripts into narratives that others can easily understand. After all, a primary group of users for our code are software engineers; those who are currently working with us or will work on our code in the future. This creates a shift in our software development mindset. Writing code just for the machines to understand and execute is not enough. It's necessary but not sufficient. If our code is easily human-readable and understandable then we've made a sufficient step towards manageable code complexity. This article focuses on how human-centric code can help towards manageable code complexity. There exist a number of best practices but they should be handled with careful thinking and consideration of our context. Finally, the jungle metaphor is used to explain some basic dynamics of code complexity. The Labyrinth of Complexity What is the nemesis of all human-readable code? Complexity. As projects evolve, features multiply, and lines of code snake across the screen, understanding becomes a daunting task. To combat this, developers wield a set of time-tested principles, their weapons against chaos. It is important to keep in mind that complexity is inevitable. It may be minimal complexity or high complexity, but one key takeaway here is that complexity creeps in, but it doesn't have to conquer our code. We must be vigilant and act early so that we can write code that keeps growing and not groaning. Slowing Down By applying good practices like modular design, clear naming conventions, proper documentation, and principles like those mentioned in the next paragraph, we can significantly mitigate the rate at which complexity increases. This makes code easier to understand, maintain, and modify, even as it grows. Breaking Down Complexity We can use techniques like refactoring and code reviews to identify and eliminate unnecessary complexity within existing codebases. This doesn't eliminate all complexity, but it can be significantly reduced. Choosing Better Tools and Approaches Newer programming languages and paradigms often focus on reducing complexity by design. For example, functional programming promotes immutability and modularity, which can lead to less intricate code structures. Complete Elimination of Complexity Slowing down code complexity is one thing, reducing it is another thing and completely eliminating it is something different that is rarely achievable in practice. Time-Tested Principles Below, we can find a sample of principles that may help our battle against complexity. It is by no means an exhaustive list, but it helps to make our point that context is king. While these principles offer valuable guidance, rigid adherence can sometimes backfire. Always consider the specific context of your project. Over-applying principles like Single Responsibility or Interface Segregation can lead to a bloated codebase that obscures core functionality. Don't Make Me Think Strive for code that reads naturally and requires minimal mental effort to grasp. 
Use clear logic and self-explanatory structures over overly convoluted designs. Make understanding the code as easy as possible for both yourself and others. Encapsulation Group related data and functionalities within classes or modules to promote data hiding and better organization. Loose Coupling Minimize dependencies between different parts of your codebase, making it easier to modify and test individual components. Separation of Concerns Divide your code into distinct layers (e.g., presentation, business logic, data access) for better maintainability and reusability. Readability Use meaningful names, consistent formatting, and comments to explain the "why" behind the code. Design Patterns (Wisely) Understand and apply these common solutions, but avoid forcing their use. For example, the SOLID principles can be summarised as follows: Single Responsibility Principle (SRP) Imagine a Swiss Army knife with a million tools. While cool, it's impractical. Similarly, code should focus on one well-defined task per class. This makes it easier to understand, maintain, and avoid unintended consequences when modifying the code. Open/Closed Principle (OCP) Think of LEGO bricks. You can build countless things without changing the individual bricks themselves. In software, OCP encourages adding new functionality through extensions, leaving the core code untouched. This keeps the code stable and adaptable. Liskov Substitution Principle (LSP) Imagine sending your friend to replace you at work. They might do things slightly differently, but they should fulfill the same role seamlessly. The LSP ensures that subtypes (inheritances) can seamlessly replace their base types without causing errors or unexpected behavior. Interface Segregation Principle (ISP) Imagine a remote with all buttons crammed together. Confusing, right? The ISP advocates for creating smaller, specialized interfaces instead of one giant one. This makes code clearer and easier to use, as different parts only interact with the functionalities they need. Dependency Inversion Principle (DIP) Picture relying on specific tools for every task. Impractical! DIP suggests depending on abstractions (interfaces) instead of concrete implementations. This allows you to easily swap out implementations without affecting the rest of the code, promoting flexibility and testability. Refactoring Regularly revisit and improve the codebase to enhance clarity and efficiency. Simplicity (KISS) Prioritize clear design, avoiding unnecessary features and over-engineering. DRY (Don't Repeat Yourself) Eliminate code duplication by using functions, classes, and modules. Documentation Write clear explanations for both code and software usage, aiding users and future developers. How Misuse Can Backfire While the listed principles aim for clarity and simplicity, their misapplication can lead to the opposite effect. Here are some examples. 1. Overdoing SOLID Strict SRP Imagine splitting a class with several well-defined responsibilities into multiple smaller classes, each handling a single, minuscule task. This can create unnecessary complexity with numerous classes and dependencies, hindering understanding. Obsessive OCP Implementing interfaces for every potential future extension, even for unlikely scenarios, may bloat the codebase with unused abstractions and complicate understanding the actual functionality. 2.
Misusing Design Patterns Forced Factory Pattern Applying a factory pattern when simply creating objects directly makes sense, but can introduce unnecessary complexity and abstraction, especially in simpler projects. Overkill Singleton Using a singleton pattern for every service or utility class, even when unnecessary can create global state management issues and tightly coupled code. 3. Excessive Refactoring Refactoring Mania Constantly refactoring without a clear goal or justification can introduce churn, making the codebase unstable and harder to follow for other developers. Premature Optimization Optimizing code for potential future performance bottlenecks prematurely can create complex solutions that may never be needed, adding unnecessary overhead and reducing readability. 4. Misunderstood Encapsulation Data Fortress Overly restrictive encapsulation, hiding all internal data and methods behind complex accessors, can hinder understanding and make code harder to test and modify. 5. Ignoring Context Blindly Applying Principles Rigidly adhering to principles without considering the project's specific needs can lead to solutions that are overly complex and cumbersome for the given context. Remember The goal is to use these principles as guidelines, not strict rules. Simplicity and clarity are paramount, even if it means deviating from a principle in specific situations. Context is king: Adapt your approach based on the project's unique needs and complexity. By understanding these potential pitfalls and applying the principles judiciously, you can use them to write code that is both clear and efficient, avoiding the trap of over-engineering. The Importance of Human-Centric Code Regardless of the primary user, writing clear, understandable code benefits everyone involved. From faster collaboration and knowledge sharing to reduced maintenance and improved software quality. 1. Faster Collaboration and Knowledge Sharing Onboarding becomes a breeze: New developers can quickly grasp the code's structure and intent, reducing the time they spend deciphering cryptic logic. Knowledge flows freely: Clear code fosters open communication and collaboration within teams. Developers can easily share ideas, understand each other's contributions, and build upon previous work. Collective intelligence flourishes: When everyone understands the codebase, diverse perspectives and solutions can emerge, leading to more innovative and robust software. 2. Reduced Future Maintenance Costs Bug fixes become adventures, not nightmares: Debugging is significantly faster when the code is well-structured and easy to navigate. Developers can pinpoint issues quicker, reducing the time and resources spent on troubleshooting. Updates are a breeze, not a burden: Adding new features or modifying existing functionality becomes less daunting when the codebase is clear and understandable. This translates to lower maintenance costs and faster development cycles. Technical debt stays in check: Clear code makes it easier to refactor and improve the codebase over time, preventing technical debt from accumulating and hindering future progress. 3. Improved Overall Software Quality Fewer bugs, more smiles: Clear and well-structured code is less prone to errors, leading to more stable and reliable software. Sustainable projects, not ticking time bombs: Readable code is easier to maintain and evolve, ensuring the software's long-term viability and resilience. 
Happy developers, happy users: When developers can work on code they understand and enjoy, they're more productive and engaged, leading to better software and ultimately, happier users. Welcome to the Jungle Imagine a small garden, teeming with life and beauty. This is your software codebase, initially small and manageable. As features accumulate and functionality grows, the garden turns into an ever-expanding jungle. Vines of connections intertwine, and dense layers of logic sprout. Complexity, like the jungle, becomes inevitable. But just as skilled explorers can navigate the jungle, understanding its hidden pathways and navigating its obstacles, so too can developers manage code complexity. Again, if careless decisions are made in the jungle, we may endanger ourselves or make our lives miserable. Here are a few things that we can do in the jungle, being aware of what can go wrong: Clearing Paths Refactoring acts like pruning overgrown sections, removing unnecessary code, and streamlining logical flows. This creates well-defined paths, making it easier to traverse the code jungle. However, careless actions can make the situation worse. Overzealous pruning with refactoring might sever crucial connections, creating dead ends and further confusion. Clearing paths needs precision and careful consideration about what paths we need and why. Building Bridges Design patterns can serve as metaphorical bridges, spanning across complex sections and providing clear, standardized ways to access different functionalities. They offer familiar structures within the intricate wilderness. Beware though, that building bridges with ill-suited design patterns or ill-implemented patterns can lead to convoluted detours and hinder efficient navigation. Building bridges requires understanding what needs to be bridged, why, and how. Mapping the Terrain Documentation acts as a detailed map, charting the relationships between different parts of the code. By documenting code clearly, developers have a reference point to navigate the ever-growing jungle. Keep in mind that vague and incomplete documentation becomes a useless map, leaving developers lost in the wilderness. Mapping the terrain demands accuracy and attention to detail. Controlling Growth While the jungle may expand, strategic planning helps manage its complexity. Using modularization, like dividing the jungle into distinct biomes, keeps different functionalities organized and prevents tangled messes. Uncontrolled growth due to poor modularisation may result in code that is impossible to maintain. Controlling growth necessitates strategic foresight. By approaching these tasks with diligence, developers can ensure the code jungle remains explorable, understandable, and maintainable. With tools, mechanisms, and strategies tailored to our specific context and needs, developers can navigate the inevitable complexity. Now, think about the satisfaction of emerging from the dense jungle, having not just tamed it, but having used its complexities to your advantage. That's the true power of managing code complexity in software development. Wrapping Up While completely eliminating complexity might be unrealistic, we can significantly reduce the rate of growth and actively manage complexity through deliberate practices and thoughtful architecture. Ultimately, the goal is to strike a balance between functionality and maintainability. While complexity is unavoidable, it's crucial to implement strategies that prevent it from becoming an obstacle in software development.
Estimating workloads is crucial in mastering software development. This can be achieved either as part of ongoing development in agile teams or in response to tenders as a cost estimate before migration, among other ways. The team responsible for producing the estimate regularly encounters a considerable workload, which can lead to significant time consumption if the costing is not conducted using the correct methodology. The measurement figures generated may significantly differ based on the efficiency of the technique employed. Additionally, misconceptions regarding validity requirements and their extent exist. This paper presents a novel hybrid method for software cost estimation that discretizes software into smaller tasks and uses both expert judgment and algorithmic techniques. By using a two-factor qualification system based on volumetry and complexity, we present a more adaptive and scalable model for estimating software project duration, with particular emphasis on large legacy migration projects.
Table of Contents
Introduction
Survey of Existing SCE
2.1. Algorithmic Methods
2.2. Non-algorithmic Methods
2.3. AI-based Methods
2.4. Agile Estimation Techniques
Hybrid Model Approach
3.1. Discretization
3.2. Dual-factor Qualification System and Effort Calculation Task
3.3. Abacus System
Specific Use Case in Large Legacy Migration Projects
4.1. Importance of SCE in Legacy Migration
4.2. Application of the Hybrid Model
4.3. Results and Findings
Conclusion
Introduction Software Cost Estimation (SCE) is a systematic and quantitative process within the field of software engineering that involves analyzing, predicting, and allocating the financial, temporal, and resource investments required for the development, maintenance, and management of software systems. This vital effort uses different methods, models, and techniques to offer stakeholders knowledgeable evaluations of the expected financial, time, and resource requirements for successful software project execution. It is an essential part of project planning, allowing for a logical distribution of resources and supporting risk assessment and management during the software development life cycle. Survey of Existing SCE Algorithmic Methods COCOMO Within the field of software engineering and cost estimation, the Constructive Cost Model, commonly referred to as COCOMO, is a well-established and highly regarded concept. Developed by Dr. Barry Boehm, COCOMO examines the interplay between software attributes and development costs. The model operates on a hierarchy of levels, ranging from basic to detailed, with each level providing varying degrees of granularity [1]. The model carefully uses factors such as lines of code and other project details, aligning them with empirical cost estimation data. Nonetheless, COCOMO is not a stagnant vestige of the past. It has progressed over the years, with COCOMO II encompassing the intricacies of contemporary software development practices, notably amid constantly evolving paradigms like object-oriented programming and agile methodologies [2]. However, though COCOMO's empirical and methodical approach provides credibility, its use of lines of code as a primary metric attracts criticism. This is particularly true for projects where functional attributes are of greater importance. Function Point Analysis (FPA) Navigating away from the strict confines of code metrics, Function Point Analysis (FPA) emerges as a holistic method for evaluating software from a functional perspective.
Introduced by Allan Albrecht at IBM in the late 1970s, FPA aims to measure software by its functionality and the value it provides to users, rather than the number of lines of code. By categorizing and evaluating different user features — such as inputs, outputs, inquiries, and interfaces — FPA simplifies software complexity into measurable function points [3]. This methodology is particularly effective in projects where the functional output is of greater importance than the underlying code. FPA, which takes a user-focused approach, aligns well with customer demands and offers a concrete metric that appeals to developers and stakeholders alike. However, it is important to note that the effectiveness of FPA depends on a thorough comprehension of user needs, and uncertainties could lead to discrepancies in estimation. SLIM (Software Life Cycle Management) Rooted in the philosophy of probabilistic modeling, SLIM — an acronym for Software Life Cycle Management — is a multifaceted tool designed by Lawrence Putnam [4]. SLIM’s essence revolves around a set of non-linear equations that, when woven together, trace the trajectory of software development projects from inception to completion. Leveraging a combination of historical data and project specifics, SLIM presents a probabilistic landscape that provides insights regarding project timelines, costs, and potential risks. What distinguishes SLIM is its capability to adapt and reconfigure as projects progress. By persistently absorbing project feedback, SLIM dynamically refines its estimates to ensure they remain grounded in project actualities. This continuous recalibration is both SLIM’s greatest asset and its primary obstacle. While it provides flexible adaptability, it also requires detailed data recording and tracking, which requires a disciplined approach from project teams. Non-Algorithmic Methods Expert Judgement Treading the venerable corridors of software estimation methodologies, one cannot overlook the enduring wisdom of Expert Judgment [5]. Avoiding the rigorous algorithms and formalities of other techniques, Expert Judgment instead draws upon the accumulated experience and intuitive prowess of industry veterans. These experienced practitioners, with their wealth of insights gathered from a multitude of projects, have an innate ability to assess the scope, intricacy, and possible difficulties of new ventures. Their nuanced comprehension can bridge gaps left by more strictly data-driven models. Expert Judgment captures the intangible subtleties of a project artfully, encapsulating the software development craft in ways quantitative metrics may overlook. However, like any art form, Expert Judgment is subject to the quirks of its practitioners. It is vulnerable to personal biases and the innate variability of human judgment. Analogous Estimation (or Historical Data) Historical Data estimation, also known as Analogous Estimation, is a technique used to inform estimates for future projects by reviewing past ones. It is akin to gazing in the rearview mirror to navigate the path ahead. This method involves extrapolating experiences and outcomes of similar previous projects and comparing them to the current one. By doing so, it provides a grounded perspective tempered by real-world outcomes to inform estimates. Its effectiveness rests on its empirical grounding, with past events often offering reliable predictors for future undertakings. Nevertheless, the quality and relevance of historical data at hand are crucial factors. 
A mismatched comparison or outdated data can lead projects astray, underscoring the importance of careful data curation and prudent implementation [6]. Delphi Technique The method draws its name from the ancient Oracle of Delphi, and it orchestrates a harmonious confluence of experts. The Delphi Technique is a method that aims to reach a consensus by gathering anonymous insights and projections from a group of experts [7]. This approach facilitates a symposium of collective wisdom rather than relying on a singular perspective. Through iterative rounds of feedback, the estimates are refined and recalibrated based on the collective input. The Delphi Technique is a structured yet dynamic process that filters out outliers and converges towards a more balanced, collective judgment. It is iterative in nature and emphasizes anonymity to curtail the potential pitfalls of groupthink and influential biases. This offers a milieu where each expert’s voice finds its rightful resonance. However, the Delphi Technique requires meticulous facilitation and patience, as it journeys through multiple rounds of deliberation before arriving at a consensus. AI-Based Methods Machine Learning in SCE Within the rapidly evolving landscape of Software Cost Estimation, Machine Learning (ML) emerges as a formidable harbinger of change [8]. Unshackling from the deterministic confines of traditional methods, ML delves into probabilistic realms, harnessing vast swaths of historical data to unearth hidden patterns and correlations. By training on diverse project datasets, ML algorithms refine their predictive prowess, adapting to nuances often overlooked by rigid, rule-based systems. This adaptability positions ML as a particularly potent tool in dynamic software ecosystems, where project scopes and challenges continually morph. However, the effectiveness of ML in SCE hinges on the quality and comprehensiveness of the training data. Sparse or biased datasets can lead the algorithms astray, underlining the importance of robust data curation and validation. Neural Networks Venturing deeper into the intricate neural pathways of computational modeling, Neural Networks (NN) stand as a testament to the biomimetic aspirations of artificial intelligence. Structured to mimic the neuronal intricacies of the human brain, NNs deploy layered architectures of nodes and connections to process and interpret information. In the realm of Software Cost Estimation, Neural Networks weave intricate patterns from historical data, capturing nonlinear relationships often elusive to traditional models [9], [10]. Their capacity for deep learning, especially with the advent of multi-layered architectures, holds immense promise for SCE’s complex datasets. Yet, the very depth that lends NNs their power can sometimes shroud them in opacity. Their "black box" nature, combined with susceptibility to overfitting, necessitates meticulous training and validation to ensure reliable estimations. Also, the recent discovery of ”Grokking” suggests that this field could yield fascinating new findings [11]. Genetic Algorithms Drawing inspiration from the very fabric of life, Genetic Algorithms (GAs) transpose the principles of evolution onto computational canvases. GAs approach Software Cost Estimation as an optimization puzzle, seeking the fittest solutions through processes mimicking natural selection, crossover, and mutation. 
By initiating with a diverse population of estimation strategies and iteratively refining them through evolutionary cycles, GAs converge towards more optimal estimation models. Their inherent adaptability and explorative nature make them well-suited for SCE landscapes riddled with local optima [12]. However, the stochastic essence of GAs means that their results, while generally robust, may not always guarantee absolute consistency across runs. Calibration of their evolutionary parameters remains crucial to strike a balance between exploration and exploitation. Agile Estimation Techniques Agile methodologies, originally formulated to address the challenges of traditional software development processes, introduced a paradigm shift in how projects are managed and products are delivered. Integral to this approach is the iterative nature of development and the emphasis on collaboration among cross-functional teams. This collaborative approach extends to the estimation processes in Agile. Instead of trying to foresee the entirety of a project’s complexity at its outset, Agile estimation techniques are designed to evolve, adapting as the team gathers more information. Story Points Instead of estimating tasks in hours or days, many Agile teams use story points to estimate the relative effort required for user stories. Story points consider the complexity, risk, and effort of the task. By focusing on relative effort rather than absolute time, teams avoid the pitfalls of under or over-estimating due to unforeseen challenges or dependencies. Over several iterations, teams develop a sense of their ”velocity” — the average number of story points they complete in an iteration — which aids in forecasting [13]. Planning Poker One of the most popular Agile estimation techniques is Planning Poker. Team members, often inclusive of developers, testers, and product owners, collaboratively estimate the effort required for specific tasks or user stories. Using a set of cards with pre-defined values (often Fibonacci sequence numbers), each member selects a card representing their estimate. After revealing their cards simultaneously, discrepancies in estimates are discussed, leading to consensus [14], [15]. The beauty of Planning Poker lies in its ability to combine individual expert opinions and arrive at an estimate that reflects the collective wisdom of the entire team. The process also uncovers potential challenges or uncertainties, leading to more informed decision-making. Continuous Reevaluation A hallmark of Agile estimation is its iterative nature. As teams proceed through sprints or iterations, they continually reassess and adjust their estimates based on new learnings and the actual effort expended in previous cycles. This iterative feedback loop allows for more accurate forecasting as the project progresses [14]. Hybrid Model Approach We aim to present a novel model that incorporates both expert judgment and algorithmic approaches. While considering the expert approach, it is worth noting that it may involve subjective evaluations, possibly exhibiting inconsistencies amongst different experts. Besides, its dependence on the experience and availability of experts has the potential to introduce biases due to cognitive heuristics and over-reliance on recent experiences. On the other hand, an algorithmic approach may require significant expertise to be applied correctly and may focus on certain parameters, such as the number of lines of code, which may not be relevant. 
Therefore, the aim here is to propose a model that is independent of the programming language and considers multiple factors, such as project, hardware, and personnel attributes. Task Discretization In the constantly evolving field of software engineering, the practice of task discretization has become firmly established as a mainstay [16]. This approach stresses the importance of breaking down larger software objectives into manageable, bite-sized units. By acknowledging the inherent discreteness of software components (from screens and APIs to SQL scripts), a methodical breakdown emerges as a practical requirement [17]. Such an approach allows you to define your software in consistent modules, composed of consistent elements. It is crucial to have homogeneous elements for estimation, to enable the estimation team to easily understand what they are estimating and avoid the need to adapt to the elements. Those elements will be referred to as "tasks" throughout the paper. Such an approach guarantees that each component's distinct characteristics are appropriately and independently considered. This method of discretization has several advantages. Addressing tasks at an individual level enhances accuracy, while the granularity it brings forth promotes flexibility, enabling iterative adjustments that accommodate a project's fluid requirements. Additionally, a detailed comprehension of each task's complexities enables the prudent allocation of resources and fine-tuning skills to where they are most necessary. Nevertheless, this level of detail is not without drawbacks. Despite providing accuracy, deconstructing tasks into their constituent parts may result in administrative challenges, particularly in extensive projects. The possibility of neglecting certain tasks, albeit minor, is ever-present. Moreover, an excessively detailed approach can sometimes obscure wider project aims, resulting in decision-making delays, which is often referred to as "analysis paralysis". Dual-Factor Qualification System and Effort Calculation Task Discretization With the delineation of tasks exhibiting homogeneous attributes, it becomes imperative to pinpoint generic determinants for allocating appropriate effort. Upon meticulous scrutiny, two pivotal factors have been discerned: Complexity and Volumetry [1], [18]. Complexity serves as a metric to gauge the requisite technical acumen for task execution. For instance, within a user interface, the incorporation of a dynamic table may warrant the classification of the task as possessing high complexity due to its intricate requirements. Volumetry delineates the volume or quantum of work involved. To illustrate, in the context of a user interface, the presence of an extensive forty-field form might be indicative of a task with significant volumetry due to the sheer magnitude of its components. Both Complexity and Volumetry are integer values in the interval [1, 5]. Now we will define the Effort (E), which is calculated as follows: E = C × V, where C is the Complexity and V is the Volumetry. We utilize multiplication in this calculation in order to establish a connection between high Complexity and high Volumetry.
This enables us to account for the increased risk when both evaluation criteria rise simultaneously, while maintaining accuracy for tasks with lower coefficients. By taking the simple product of C and V over their ranges, we obtain the following possible values for E: [1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 20, 25]

Abacus System

Now that an effort value has been obtained, the corresponding number of days must be identified for each effort value. This stage is critical and requires the intervention of an expert with knowledge of the target architecture and technologies. However, the model requires this crucial resource to intervene only once, when these values are established.

Use of an Algorithm

To establish these values, we propose using an algorithm to enhance accuracy and prevent errors. It can be used to simulate data sets using three distinct models and two starting criteria:

The maximum number of days (which corresponds to an effort of 25)
The step between consecutive day values

We used three distinct models so that the experts and the estimation team can select from different curve profiles, which yield different characteristics, such as precision, risk assessment, and padding size, for the best fit to their requirements. Three mathematical models were hypothesized to describe the relationship: linear, quadratic, and exponential. Each model postulates a different effort-to-days transformation:

The Linear Model postulates a direct proportionality between effort and days.
The Quadratic Model envisages an accelerated growth rate, invoking polynomial mathematics.
The Exponential Model projects an exponential surge, signifying steep escalation for higher effort values.

These models can be adjusted to meet estimation requirements more accurately.
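As a quick sanity check before turning to the abacus-generation code below, the 14 effort values listed above follow directly from the definition E = C × V, with C and V ranging over the integers 1 to 5. A minimal sketch:

Python
# Enumerate all distinct effort values E = C * V for integer C, V in [1, 5]
efforts = sorted({c * v for c in range(1, 6) for v in range(1, 6)})
print(efforts)  # [1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 20, 25]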
Finally, we obtain the following code:

Python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd  # Importing pandas for tabular display

# Fixed effort values
efforts = np.array([1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 20, 25])

# Parameters
Max_days = 25
Step_Days = 0.25

def linear_model(effort, max_effort, max_days, step_days):
    # Direct proportionality between effort and days
    slope = (max_days - step_days) / max_effort
    return slope * effort + step_days - slope

def quadratic_model(effort, max_effort, max_days, step_days):
    # Accelerated (polynomial) growth of days with effort
    scale = (max_days - step_days) / (max_effort + 0.05 * max_effort**2)
    return scale * (effort + 0.05 * effort**2)

def exponential_model(effort, max_effort, max_days, step_days):
    # Steep escalation for higher effort values
    adjusted_max_days = max_days - step_days + 1
    base = np.exp(np.log(adjusted_max_days) / max_effort)
    return step_days + base ** effort - 1

def logarithmic_model(effort, max_effort, max_days, step_days):
    # Additional curve profile included in the code for comparison
    scale = (max_days - step_days) / np.log(max_effort + 1)
    return scale * np.log(effort + 1)

# Rounding to nearest step
def round_to_step(value, step):
    return round(value / step) * step

linear_days = np.array([round_to_step(linear_model(e, efforts[-1], Max_days, Step_Days), Step_Days) for e in efforts])
quadratic_days = np.array([round_to_step(quadratic_model(e, efforts[-1], Max_days, Step_Days), Step_Days) for e in efforts])
exponential_days = np.array([round_to_step(exponential_model(e, efforts[-1], Max_days, Step_Days), Step_Days) for e in efforts])
logarithmic_days = np.array([round_to_step(logarithmic_model(e, efforts[-1], Max_days, Step_Days), Step_Days) for e in efforts])

# Plot
plt.figure(figsize=(10, 6))
plt.plot(efforts, linear_days, label="Linear Model", marker='o')
plt.plot(efforts, quadratic_days, label="Quadratic Model", marker='x')
plt.plot(efforts, exponential_days, label="Exponential Model", marker='.')
plt.plot(efforts, logarithmic_days, label="Logarithmic Model", marker='+')
plt.xlabel("Effort")
plt.ylabel("Days")
plt.title("Effort to Days Estimation Models")
plt.legend()
plt.grid(True)
plt.show()

# Displaying data in table format (use display(df) instead of print in a notebook)
df = pd.DataFrame({
    'Effort': efforts,
    'Linear Model (Days)': linear_days,
    'Quadratic Model (Days)': quadratic_days,
    'Exponential Model (Days)': exponential_days,
    'Logarithmic Model (Days)': logarithmic_days
})
print(df)

Listing 1. Days generation model code, Python

Simulations

Let us now examine a practical example of chart generation. As stated in the code, the essential parameters "Step_Days" and "Max_days" have been set to 0.25 and 25, respectively. The results generated by the models with these parameters are presented below.

Figure 1: Effort to days estimation models — data

Below is a graphical representation of these results:

Figure 2: Effort to days estimation models — graphical representation

The graph shows the different degrees of "compression" among the models, which yield distinct traits, such as better accuracy for low effort values or a stronger association among values.

Specific Use Case in Large Legacy Migration Projects

Now that the model has been described, a specific application will be proposed in the context of a migration project. This model is believed to be well suited to projects of this kind, where teams are confronted with a situation that the standard models presented in the first part do not fit well.

Importance of SCE in Legacy Migration

Migration projects are often driven by their cost.
The need to migrate is typically driven by factors such as:

More frequent regressions and side effects
Difficulty in finding new resources for outdated technologies
Concentration of specialist knowledge in a few people
Complexity in integrating new features
Performance issues

All of the potential causes listed above increase cost and/or risk, and it may become necessary to consider migrating the problematic technological building block(s). Whether the migration goes ahead depends mainly on the cost incurred, which makes an accurate estimate essential [19]. However, it is important to acknowledge that during an organization's migration, technical changes must be accompanied by human and organizational adjustments. Frequently, after defining the target architecture and technologies, the organization lacks experts in these fields, which complicates the "Expert Judgment" approach. Algorithmic approaches do not appear suitable either: they require knowledge and mastery to apply, and they do not necessarily capture all the subtleties of a migration, such as redrawing the boundaries of the components to be migrated. Additionally, the initial number of lines of code is not always a reliable criterion. Finally, AI-based methodologies still seem to be in their formative stages and may be challenging for these organizations to implement and master. That is why our model appears suitable: it enables the existing teams to quantify the effort and then seek advice from an expert in the target technologies to create the estimate, thus obtaining an accurate figure. It is worth noting that this estimate covers only the development itself and excludes the specification stages and the associated infrastructure costs.

Application of the Hybrid Model

We shall outline the entire procedure for implementing our estimation model. The process comprises three phases: Initialization, Estimation, and Finalization.

Initialization

During this phase, the technology building block to be estimated must first be deconstructed into sets of unified tasks. For example, an application with a GWT front end calling an AS400 database could be broken down into two main sets:

Frontend: Tasks are represented by screens.
Backend: Tasks are represented by APIs.

We can then put together the estimation team. It does not need to include experts in the target technology; it should be made up of resources from the existing project, preferably a technical/functional pair, so that each task can be assessed from both complementary perspectives. This team can then start listing the tasks for the main sets identified during the discretization process.

Estimation

We now have a team ready to assign Complexity and Volumetry values to the identified tasks. In parallel with this work, we can begin to set the number of days associated with each effort value. This step may require an expert in the target technologies, together with members of the estimation team, to quantify some benchmark values, on the basis of which the expert can take a critical look and extend the results to the whole abacus. At the end of this phase, we have a days/effort correspondence abacus and a list of tasks, each with an associated effort value.

Finalization

The final step is to convert effort into days using the abacus and obtain a total number of days.
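To make the Finalization step concrete, here is a minimal sketch of the conversion. The abacus values and task list are purely illustrative assumptions (in practice the abacus comes from the expert-validated generation step described above):

Python
# Illustrative abacus: effort value -> days (values are assumptions, not the
# expert-validated figures a real project would use)
abacus = {1: 0.25, 2: 0.5, 3: 0.75, 4: 1.0, 5: 1.25, 6: 1.5, 8: 2.0,
          9: 2.25, 10: 2.5, 12: 3.0, 15: 3.75, 16: 4.0, 20: 5.0, 25: 6.25}

# Illustrative task list: (task name, Complexity, Volumetry)
tasks = [
    ("Customer search screen", 3, 4),
    ("Order history API", 2, 5),
    ("Login screen", 1, 2),
]

total_days = 0.0
for name, complexity, volumetry in tasks:
    effort = complexity * volumetry   # E = C * V
    days = abacus[effort]             # abacus lookup
    total_days += days
    print(f"{name}: effort {effort} -> {days} days")

print(f"Total estimate: {total_days} days")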
Once the list of effort values has been obtained, a risk analysis can be carried out using the following criteria:

The standard deviation of the probability density curve of the effort values
Whether certain "zones" of the components concentrate high effort values
The number of tasks with an effort value greater than 16

Depending on these criteria, specific measures can be taken in the areas concerned.

Results and Findings

Finally, we arrive at the following process, which provides a hybrid formalization between expert judgment and algorithmic analysis. The method seems particularly well suited to the needs of migration projects, drawing on accessible resources and not requiring a high level of expertise.

Figure 3: Complete process of the hybrid model

Another representation, based on the nature of the elements, could be the following:

Figure 4: Complete process of the hybrid model

Conclusion

In conclusion, our model presents a practical and flexible approach for estimating the costs involved in large legacy migration projects. By combining elements of expert judgment with a structured, algorithmic analysis, it addresses the unique challenges that come with migrating outdated or complex systems. It recognizes the importance of accurately gauging effort and cost, considering not just the technical aspects but also the human and organizational shifts required. The three-phase process — Initialization, Estimation, and Finalization — ensures a comprehensive evaluation, from breaking down the project into manageable tasks to conducting a detailed risk analysis. This hybrid model is especially beneficial for teams facing the daunting task of migration, providing a pathway to make informed decisions and prepare effectively for the transition. Through this approach, organizations can navigate the intricacies of migration, ensuring a smoother transition to modern, more efficient systems.

In light of the presented discussions and findings, it becomes evident that legacy migration projects present a unique set of challenges that cannot be addressed by conventional software cost estimation methods alone. The proposed hybrid model serves as a promising bridge between the more heuristic expert-judgment approach and the more structured algorithmic analysis, offering a balanced and adaptive solution. The primary strength of this model lies in its adaptability and its capacity to leverage both institutional knowledge and specific expertise in target technologies. Furthermore, the model's ability to deconstruct a problem into sets of unified tasks and estimate with an appropriate level of granularity ensures its relevance across a variety of application scenarios. While the current implementation of the hybrid model shows potential, future research and improvements can drive its utility even further:

Empirical validation: As with all models, empirical validation on a diverse set of migration projects is crucial. This would not only validate its effectiveness but also refine its accuracy. (We are already working on this.)
Integration with AI: Although AI-based methodologies for software cost estimation are still nascent, their potential cannot be overlooked. Future iterations of the hybrid model could integrate machine learning for enhanced predictions, especially when large datasets from past projects are available.
Improved risk analysis: The proposed risk analysis criteria provide a solid starting point.
However, more sophisticated risk models, which factor in the unforeseen complexities and uncertainties inherent to migration projects, could be integrated into the model.
Tooling and automation: Developing tools that semi-automate the process described here would make the model more accessible and easier for organizations to adopt.

In conclusion, the hybrid model presents a notable advancement in the realm of software cost estimation, especially for legacy migration projects. However, as with all models, it is an evolving entity, and continued refinement will only enhance its applicability and effectiveness.

References

[1] Barry W. Boehm. Software engineering economics. IEEE Transactions on Software Engineering, SE-7(1):4–21, 1981.
[2] Barry W. Boehm, Chris Abts, A. Winsor Brown, Sunita Chulani, Bradford K. Clark, Ellis Horowitz, Ray Madachy, Donald J. Reifer, and Bert Steece. Cost models for future software life cycle processes: COCOMO 2.0. Annals of Software Engineering, 1(1):57–94, 2000.
[3] International Function Point Users Group (IFPUG). Function Point Counting Practices Manual. IFPUG, 2000.
[4] L. H. Putnam. A general empirical solution to the macro software sizing and estimating problem. IEEE Transactions on Software Engineering, 4:345–361, 1978.
[5] R. T. Hughes. Expert judgment as an estimating method. Information and Software Technology, 38(2):67–75, 1996.
[6] Christopher Rush and Rajkumar Roy. Expert judgment in cost estimating: Modelling the reasoning process.
[7] N. Dalkey. An experimental study of group opinion: The Delphi method. Futures, 1(5):408–426, 1969.
[8] Yibeltal Assefa, Fekerte Berhanu, Asnakech Tilahun, and Esubalew Alemneh. Software effort estimation using machine learning algorithm. In 2022 International Conference on Information and Communication Technology for Development for Africa (ICT4DA), pages 163–168, 2022.
[9] A. Venkatachalam. Software cost estimation using artificial neural networks. In Proc. Int. Conf. Neural Netw. (IJCNN-93-Nagoya, Japan), volume 1, pages 987–990, Oct 1993.
[10] R. Poonam and S. Jain. Enhanced software effort estimation using multi-layered feed forward artificial neural network technique. Procedia Computer Science, 89:307–312, 2016.
[11] Alethea Power, Yuri Burda, Harri Edwards, et al. Grokking: Generalization beyond overfitting on small algorithmic datasets. arXiv preprint arXiv:2201.02177, 2022.
[12] B. K. Singh and A. K. Misra. Software effort estimation by genetic algorithm tuned parameters of modified constructive cost model for NASA software projects. International Journal of Computer Applications, 59:22–26, 2012.
[13] K. Hrvoje and S. Gotovac. Estimating software development effort using Bayesian networks. In 2015 23rd International Conference on Software, Telecommunications and Computer Networks, pages 229–233, Split, Croatia, September 16–18, 2015.
[14] M. Cohn. Agile Estimating and Planning. Prentice Hall PTR, 2005.
[15] Saurabh Bilgaiyan, Santwana Sagnika, Samaresh Mishra, and Madhabananda Das. A systematic review on software cost estimation in agile software development. Journal of Engineering Science and Technology Review, 10(4):51–64, 2017.
[16] S. McConnell. Software Estimation: Demystifying the Black Art. Microsoft Press, 2006.
[17] C. Szyperski. Component Software: Beyond Object-Oriented Programming. Addison-Wesley, 2nd edition, 2002.
[18] N. E. Fenton and S. L. Pfleeger. Software Metrics: A Rigorous and Practical Approach. PWS Publishing Co., 1997.
[19] Harry M. Sneed and Chris Verhoef.
Cost-driven software migration: An experience report. Software: Practice and Experience, 2020.
In this fascinating talk, Michael Lloyd introduced the concept of dysfunction mapping, a tool developed over years of trial and error aimed at creating a repeatable way to find, theme, and ultimately solve organizational dysfunction.

Abstract

Dysfunction mapping is a tool developed over years of trial and error, aimed at creating a repeatable way to find, theme, and ultimately solve organizational dysfunction. By following these steps, you can more quickly identify the biggest wins, develop a solid action plan, and measure whether you're really achieving outcomes that matter. It's not a silver bullet, but it can give you some structure to creatively solve problems while also making your value visible and your goals clear.

During the Q&A, Michael answered the following questions, among others:

What about other common iterative change management strategies? Which have you tried, and why have they failed for you? Why do you think they failed in your experience?
Curious why the mapping moves from Right to Left (+2)
A system diagram of influencing factors often contains reinforcing loops – how do we make that work with dysfunction trees that do not allow for loops?
Do you worry when you have identified a "dysfunction" that you could go deeper into and perhaps get more possibilities? I am thinking of "The Dangers of the 5 Whys" and the advantages of systems thinking instead of reductionist cause-effect models.
This seems focused on practices/processes. What is your approach to the foundational issues of mindset, values, behavior, and culture?
Are the symptom cards available on the Honest Agile website?
How do you define a purpose statement for dysfunctions? How do we get the team's buy-in for these statements?
How do you prioritize what to pick up first/next?
I am curious about the clustering of symptoms and finding the real causes. Is it a typical root cause analysis?
Do you identify impacts of the symptoms/dysfunctions or do anything else to understand how to prioritize which to address? Understanding the impact seems to be a missing piece.
Are you sharing all your written-down symptoms with the team? "This all is what you are doing wrong…"
The mindset-values-principles-practices slide could be interpreted as there is only one mindset.
Do you think the Scrum anti-patterns Stefan identified could automatically fill your diagram?
How do we order the implementation of the solutions?
Is it, in your opinion, reasonable to include the DM in the OKRs?
Symptom metrics are mostly output metrics? Could we also measure against the root cause?
How do you create a single Sprint/cycle goal when doing Kanban with, e.g., 4 product goals?

Video: Dysfunction Mapping With Michael Lloyd

Meet Michael Lloyd

Michael is an Agile Coach, Scrum Master, and authentic leader with eight years of experience in enhancing team and organizational performance to increase the frequency of value delivery. As the Head of Global Agility at Honest Agile, his mission is to influence agile practices globally by assisting agile practitioners in addressing real-world challenges. Connect with Michael Lloyd on LinkedIn.
Platform engineering is the creation and management of foundational infrastructure and automated processes, incorporating principles like abstraction, automation, and self-service, to empower development teams, optimize resource utilization, ensure security, and foster collaboration for efficient and scalable software development. In today's fast-paced world of software development, the evolution of "platform engineering" stands as a transformative force, reshaping the landscape of software creation and management. This exploration aims to demystify the intricate realm of platform engineering, shedding light on its fundamental principles, its multifaceted functions, and its pivotal role in streamlining development processes across industries.

Key Concepts and Principles

Platform engineering encompasses several key concepts and principles that underpin the design and implementation of internal platforms. One fundamental concept is abstraction, which involves shielding developers from the complexities of underlying infrastructure through well-defined interfaces. Automation is another crucial principle, emphasizing the use of scripting and tools to streamline repetitive tasks, enhance efficiency, and maintain consistency in development processes. Self-service is pivotal, empowering development teams to independently provision and manage resources. Scalability ensures that platforms can efficiently adapt to varying workloads, while resilience focuses on the system's ability to recover from failures. Modularity encourages breaking down complex systems into independent components, fostering flexibility and reusability. Consistency promotes uniformity in deployment and configuration, aiding troubleshooting and stability. API-first design prioritizes the development of robust interfaces, and observability ensures real-time monitoring and traceability. Lastly, security by design emphasizes integrating security measures throughout the entire development lifecycle, reinforcing the importance of a proactive approach to cybersecurity. Together, these concepts and principles guide the creation of robust, scalable, and developer-friendly internal platforms, aligning with the evolving needs of modern software development.

Diving Into the Role of a Platform Engineering Team

The platform engineering team operates at the intersection of software development, operational efficiency, and infrastructure management. Their primary objective revolves around sculpting scalable and efficient internal platforms that empower developers. Leveraging automation, orchestration, and innovative tooling, these teams create standardized environments for application deployment and management, catalyzing productivity and performance.

Elaborating further on the team's responsibilities, it is essential to highlight their continuous efforts in optimizing resource utilization, ensuring security and compliance, and establishing robust monitoring and logging mechanisms. Their role extends beyond infrastructure provisioning, encompassing the facilitation of collaboration among development, operations, and security teams to achieve a cohesive and agile software development ecosystem.

Building Blocks of Internal Platforms

Central to platform engineering is the concept of an Internal Developer Platform (IDP) – a tailored environment equipped with an array of tools, services, and APIs.
This environment streamlines the development lifecycle, offering self-service capabilities that enable developers to expedite the build, test, deployment, and monitoring of applications. Internal platforms in the context of platform engineering encompass various components that work together to provide a unified and efficient environment for the development, deployment, and management of applications. The specific components may vary depending on the platform's design and purpose, but common components include:

Infrastructure as Code (IaC)
Containerization and orchestration
Service mesh
API gateway
CI/CD pipelines
Monitoring and logging
Security components
Database and data storage
Configuration management
Workflow orchestration
Developer tools
Policy and governance

Benefits of Internal Platforms

Internal platforms in platform engineering offer a plethora of benefits, transforming the software development landscape within organizations. These platforms streamline and accelerate the development process by providing self-service capabilities, enabling teams to independently provision resources and reducing dependencies on dedicated operations teams. Automation through CI/CD pipelines enhances efficiency and ensures consistent, error-free deployments. Internal platforms promote scalability, allowing organizations to adapt to changing workloads and demands. The modularity of these platforms facilitates code reusability, reducing development time and effort. By abstracting underlying infrastructure complexities, internal platforms empower developers to focus on building applications rather than managing infrastructure. Collaboration is enhanced through centralized tools, fostering communication and knowledge sharing. Additionally, internal platforms contribute to improved system reliability, resilience, and observability, enabling organizations to deliver high-quality, secure software at a faster pace. Overall, these benefits make internal platforms indispensable for organizations aiming to stay agile and competitive in the ever-evolving landscape of modern software development.

Challenges in Platform Engineering

Platform engineering, while offering numerous benefits, presents a set of challenges that organizations must navigate. Scalability issues can arise as the demand for resources fluctuates, requiring careful design and management to ensure platforms can efficiently scale. Maintaining a balance between modularity and interdependence poses a challenge, as breaking down systems into smaller components can lead to complexity and potential integration challenges. Compatibility concerns may emerge when integrating diverse technologies, requiring meticulous planning to ensure seamless interactions. Cultural shifts within organizations may be necessary to align teams with the principles of platform engineering, and skill gaps may arise, necessitating training initiatives. Additionally, achieving consistency across distributed components and services can be challenging, impacting the reliability and predictability of the platform. Balancing security measures without hindering development speed is an ongoing challenge, and addressing these challenges demands a holistic and strategic approach to platform engineering that considers technical, organizational, and cultural aspects.
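To illustrate the self-service and abstraction principles behind these building blocks, below is a minimal, hypothetical sketch of the kind of interface an internal developer platform might expose to development teams. The names (EnvironmentRequest, provision_environment) and fields are illustrative assumptions, not the API of any specific product:

Python
# Hypothetical sketch of a self-service provisioning interface in an IDP.
# A real platform would back a call like this with IaC tooling, CI/CD
# pipelines, and policy/governance checks rather than returning a dict.
from dataclasses import dataclass

@dataclass
class EnvironmentRequest:
    service_name: str
    environment: str          # e.g., "dev", "staging", "prod"
    cpu: str = "500m"
    memory: str = "512Mi"
    replicas: int = 2

def provision_environment(request: EnvironmentRequest) -> dict:
    """Translate a developer's high-level request into a deployment spec,
    hiding cluster, networking, and security details behind one call."""
    return {
        "service": request.service_name,
        "environment": request.environment,
        "resources": {"cpu": request.cpu, "memory": request.memory},
        "replicas": request.replicas,
        "observability": {"metrics": True, "logging": True, "tracing": True},
        "security": {"network_policy": "default-deny", "tls": "required"},
    }

if __name__ == "__main__":
    spec = provision_environment(EnvironmentRequest("orders-api", "staging"))
    print(spec)

The point of such an interface is that developers express intent at a high level while the platform enforces standards and guardrails underneath.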
Implementation Strategies in Platform Engineering

The following are the top five implementation strategies:

Start small and scale gradually: Begin with a focused and manageable scope, such as a pilot project or a specific team. This allows for the identification and resolution of any initial challenges in a controlled environment. Once the initial implementation proves successful, gradually scale the platform across the organization.
Invest in training and skill development: Provide comprehensive training programs to ensure that development and operations teams are well-versed in the tools, processes, and concepts associated with platform engineering. Investing in skill development ensures that teams can effectively utilize the platform and maximize its benefits.
Automate key processes with CI/CD: Implement Continuous Integration (CI) and Continuous Deployment (CD) pipelines to automate crucial aspects of the development lifecycle, including code building, testing, and deployment. Automation accelerates development cycles, reduces errors, and enhances overall efficiency.
Cultivate DevOps practices: Embrace DevOps practices that foster collaboration and communication between development and operations teams. This promotes shared responsibility, collaboration, and a holistic approach to software development, aligning with the principles of platform engineering.
Iterative improvements based on feedback: Establish a feedback loop to gather insights and feedback from users and stakeholders. Regularly review performance metrics, user experiences, and any challenges faced during the implementation. Use this feedback to iteratively improve the platform, addressing issues and continuously enhancing its capabilities.

These top five strategies emphasize a phased and iterative approach, coupled with a strong focus on skill development, automation, and collaborative practices. Starting small, investing in training, and embracing a DevOps culture contribute to the successful implementation and ongoing optimization of platform engineering practices within an organization.

Platform Engineering Tools

Various tools aid platform engineering teams in building, maintaining, and optimizing platforms. Examples include:

Backstage: Developed by Spotify, it offers a unified interface for accessing essential tools and services.
Kratix: An open-source tool designed for infrastructure management and streamlining development processes.
Crossplane: An open-source tool automating infrastructure via declarative APIs, supporting tailored platform solutions.
Humanitec: A comprehensive platform engineering tool facilitating easy platform building, deployment, and management.
Port: A platform enabling the building of developer platforms with a rich software catalog and role-based access control.

Case Studies of Platform Engineering

Spotify

Spotify is known for its adoption of a platform model to empower development teams. They use a platform called "Backstage," which acts as an internal developer portal. Backstage provides a centralized location for engineers to discover, share, and reuse services, tools, and documentation. It streamlines development processes, encourages collaboration, and improves visibility into the technology stack.

Netflix

Netflix is a pioneer in adopting a microservices architecture and has developed an internal platform called the Netflix Internal Platform Engineering (NIPE). The platform enables rapid application deployment, facilitates service discovery, and incorporates fault tolerance.
Uber

Uber has implemented an internal platform called "Michelangelo" to streamline machine learning (ML) workflows. Michelangelo provides tools and infrastructure to support end-to-end ML development, from data processing to model deployment.

Salesforce

Salesforce has developed an internal platform known as the "Salesforce Lightning Platform." This platform enables the creation of custom applications and integrates with the Salesforce ecosystem. It emphasizes low-code development, allowing users to build applications with minimal coding, accelerating the development process, and empowering a broader range of users.

Distinguishing Platform Engineering From SRE

While both platform engineering and Site Reliability Engineering (SRE) share goals of ensuring system reliability and scalability, they diverge in focus and approach. Platform engineering centers on crafting foundational infrastructure and tools for development, emphasizing the establishment of internal platforms that empower developers. In contrast, SRE focuses on operational excellence, managing system reliability, incident response, and ensuring the overall reliability, availability, and performance of production systems. Further Reading: Top 10 Open Source Projects for SREs and DevOps Engineers.

Scope: Platform engineering is focused on creating a development-friendly platform and environment; SRE is focused on the reliability and performance of applications and services in production.
Responsibilities: Platform engineers design and maintain internal platforms, emphasizing tools and services for development teams; SREs focus on operational aspects, automating tasks and ensuring the resilience and reliability of production systems.
Abstraction level: Platform engineering abstracts infrastructure complexities for developers, providing a high-level platform; SRE deals with lower-level infrastructure details, ensuring the reliability of the production environment.

DevOps vs. Platform Engineering

DevOps and platform engineering are distinct methodologies addressing different aspects of software development. DevOps focuses on collaboration and automation across the entire software delivery lifecycle, while platform engineering concentrates on providing a unified and standardized platform for developers. The comparison below outlines the differences between DevOps and platform engineering.

Objective: DevOps streamlines development and operations; platform engineering provides a unified and standardized platform for developers.
Principles: DevOps emphasizes collaboration, automation, continuous integration, and continuous delivery; platform engineering emphasizes enabling collaboration, treating the platform as a product, abstraction, standardization, and automation.
Scope: DevOps extends to the entire software delivery lifecycle; platform engineering fosters collaboration between development and operations teams by providing a consistent environment for the entire lifecycle.
Tools: DevOps uses a wide range of tools at different stages of the lifecycle; platform engineering integrates a diverse set of tools into the platform.
Benefits: DevOps brings faster development and deployment cycles and higher collaboration; platform engineering brings an efficient and streamlined development environment, improved productivity, and flexibility for developers.

Future Trends in Platform Engineering

Multi-cloud and hybrid platforms: Platform engineering is expected to focus on providing solutions that seamlessly integrate and manage applications across different cloud providers and on-premises environments.
Edge computing platforms: Platforms will need to address challenges related to latency, connectivity, and management of applications deployed closer to end users.
AI-driven automation: The integration of artificial intelligence (AI) and machine learning (ML) into platform engineering is expected to increase. AI-driven automation can optimize resource allocation, improve predictive analytics for performance monitoring, and enhance security measures within platforms.
Serverless architectures: Serverless computing is anticipated to become more prevalent, leading to platform engineering solutions that support serverless architectures. This trend focuses on abstracting server management, allowing developers to focus solely on writing code.
Observability and AIOps: Observability, including monitoring, tracing, and logging, will continue to be a key focus. AIOps (Artificial Intelligence for IT Operations) will likely play a role in automating responses to incidents and predicting potential issues within platforms.
Low-code/no-code platforms: The rise of low-code/no-code platforms is likely to influence platform engineering, enabling a broader range of users to participate in application development with minimal coding. Platform engineering will need to support and integrate with these development approaches.
Quantum computing integration: As quantum computing progresses, platform engineering may need to adapt to support the unique challenges and opportunities presented by quantum applications and algorithms.
Zero trust security: Zero trust security models are becoming increasingly important. Future platform engineering will likely focus on implementing and enhancing security measures at every level, considering the principles of zero trust in infrastructure and application security.
Stefan Wolpers
Agile Coach,
Berlin Product People GmbH
Daniel Stori
Software Development Manager,
AWS
Alireza Rahmani Khalili
Officially Certified Senior Software Engineer, Domain Driven Design Practitioner,
Worksome