Site Reliability Engineer

Morgan Stanley, Alpharetta, GA

About The Position

In the Technology division, we leverage innovation to build the connections and capabilities that power our Firm, enabling our clients and colleagues to redefine markets and shape the future of our communities. This is a Lead Software Production Management & Reliability Engineering position at Director level, part of the job family responsible for overseeing the production environment, ensuring the operational reliability of deployed software, and implementing strategies to optimize performance and minimize downtime. Since 1935, Morgan Stanley has been known as a global leader in financial services, always evolving and innovating to better serve our clients and our communities in more than 40 countries around the world.

Department Profile

Services Technology is a division within Wealth Management Technology that enhances technology solutions to improve business processes and service delivery. The division leverages a range of tools and technologies to automate processes, increase efficiency, and improve the effectiveness of business services. The Services Technology organization delivers platforms that support core client and advisor experiences. Our teams build and maintain solutions for the Contact Center, digital business automation, workflow orchestration, CRM & Salesforce, and client reporting. Our systems are designed to be innovative and resilient, helping to serve clients more efficiently and enabling seamless operations across the business.

Job Summary

We are looking for a Site Reliability Engineer with a minimum of 5 years of industry experience, preferably in the financial IT community. The position in the WM Product Technology team is focused on delivering exceptional services to both BU and Dev partners to minimize or avoid production outages. The role will focus on production support within WM Product Technology, automating deployments and working with the agile teams to build and support stable and reliable production systems.

The ideal candidate will be passionate about automation and skilled in at least one programming language such as Python, Perl, shell, Ruby, Java, or C#. The candidate should possess a strong understanding of database concepts, job schedulers, MQ, web services, and UNIX/Linux/Windows operating systems, as well as experience with debugging applications. We are looking for a strong leader with excellent communication skills who is committed to continuously improving and delivering results. The candidate should be organized, disciplined, detail-oriented, self-motivated, and delivery-focused.

Requirements

  • 5+ years of experience in a production environment with a solid software development background and understanding of performance tuning, end-to-end troubleshooting, networking fundamentals and appropriate attention to detail
  • Ability to stay focused and resolve production issues in a demanding, high-pressure environment
  • 5+ years’ hands-on experience in designing, developing, and implementing technical solutions, or significant experience in deep technical support
  • Strong experience in scripting languages (shell scripting, Python, Perl, etc.) and cloud-driven development
  • Strong database skills with DB2, Sybase or Oracle
  • Hands-on experience with Autosys or other batch scheduling software
  • Strong experience in Continuous Integration and Continuous Deployment
  • Strong experience with on-demand environments for both virtual machines and containers
  • Knowledge and hands-on experience with monitoring tools like Splunk, IP Soft, Sockeye
  • Practical experience in Agile Methodology (e.g. Scrum)
  • Knowledge or experience with automating deployments using Jenkins and Train
  • Ability to diagnose technical problems, debug, optimize code, and automate routine tasks
  • Hands-on experience in application and database troubleshooting/issue resolution in a fast-paced environment
  • Excellent communication skills and the ability to think outside the box about process improvements
  • Knowledge of cloud-based deployment, security, and networking concepts in Azure and AWS
  • Hands-on experience leveraging generative AI tools to enhance research, automate tasks, and improve productivity
  • Minimum of a BS degree in Computer Science, Engineering or a related field

Nice To Haves

  • Knowledge or experience with algorithms, data structures, complexity analysis and software design
  • Interest in designing, analyzing and troubleshooting large-scale distributed systems

Responsibilities

  • Maintain applications once they are live by measuring and monitoring availability, latency, and overall system health with a focus on business activities, and continuously evaluate cost and toil.
  • Engage in and improve the whole lifecycle of services from inception and design, through deployment, operation, capacity planning and launch reviews.
  • Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity; this includes automation for various other operational needs.
  • Troubleshoot infrastructure issues, review log files, update documentation, and maintain a knowledge base of resolutions
  • Work closely with the application Development team to understand the platform and create tools/utilities to help with production management
  • Work with upstream data providers and downstream consumers, and reduce the number of escalations to development teams
  • Develop scripts and assist with code changes along with operational tasks/activities.
  • Work closely with Application Development to ensure that the support team has excellent knowledge of the application set, and own and maintain the support knowledge base and documents.
  • Use analytical skills to find trends in the environment and drive out problems.
  • Lead efforts to identify areas for improvement to stabilize the plant.
  • Identify risks and work with a sense of urgency, working within a team or independently.
  • Test and tune network, hardware, and software configurations to maximize performance
  • Interface with different teams, such as IT Dev managers and Infrastructure teams, and lead as a Subject Matter Expert (SME) for the supported application(s).
  • Understand the overall business flow of supported application systems and their interfaces with clients
  • Take ownership of and manage production requests, questions, and issues, and perform Root Cause Analysis for outages/incidents
  • Be flexible enough to join the weekend on-call rotation and be available to lead during offshore hours
  • Be accountable for the Production Environments as well as the non-Production Environments and be part of 24/7 production support coverage.