Comcast Engineer 3, Web and Mobile App Support in Philadelphia, Pennsylvania
Comcast brings together the best in media and technology. We drive innovation to create the world's best entertainment and online experiences. As a Fortune 50 leader, we set the pace in a variety of innovative and fascinating businesses and create career opportunities across a wide range of locations and disciplines. We are at the forefront of change and move at an amazing pace, thanks to our remarkable people, who bring cutting-edge products and services to life for millions of customers every day. If you share in our passion for teamwork, our vision to revolutionize industries and our goal to lead the future in media and technology, we want you to fast-forward your career at Comcast.
This position is a member of the Residential Reliability Engineering Support Team and is responsible for developing and maintaining standard operating procedures (SOPs) specific to our Xfinity Home product. The Incident Manager will ensure that all incidents are identified, triaged and resolved within the Service Level Agreement. Additionally, this position will be responsible for ensuring that all root cause analysis is promptly and properly documented for high-severity incidents and delivered to the respective Product owners. This position will interface with Comcast Product, Change, Problem, Release, Engineering, Marketing and Operations Management teams.
- Lead technical investigation and triage of production issues; analyze logs, perform end-to-end investigation including but not limited to network, software and infrastructure issues.
- Document training and triage procedures (including enhancing existing training and triage procedures) and complex application workflows (including APIs and endpoints).
- Draft Residential Engineering production support readiness documentation.
- Actively manage relationships with key stakeholders, markets and resolver groups.
- Respond to service-level issues and work to restore normal service operations as quickly as possible
- Assist in training and developing junior Engineers
- Identify and lead the implementation of creative process and technology solutions within the team
- Provide mentorship and team development opportunities
- Assist in representing Production Support to the organization, ensuring that high availability and the ability to identify customer-facing issues are included in the development or deployment of new products and services.
- Identify and recommend opportunities for "clean-slate" process improvement with regard to incident management, fault monitoring, triage procedures and issue escalation
- Develop procedures for incident triage and management, the creation of metrics and measures, and the management and administration of monitoring tools
- Oversee the timely execution of scheduled and repeatable processes such as periodic system validations, daily triage, and system monitoring and event log management
- Work with architecture, development and engineering teams to identify root cause for recurring incidents and create an action plan for resolution.
- Monitor systems and services for most efficient operation, identifying fault conditions as well as opportunities for further optimization
- Maintain escalation and contact lists for mission critical systems and services
- Consistent exercise of independent judgment and discretion in matters of significance
- Regular, consistent and punctual attendance. Must be able to work nights and weekends, variable schedule(s) as necessary
Bachelor's degree in Networking Engineering, Business or equivalent work experience is required.
Strong understanding of ITIL and Incident Management practices.
Generally requires 5 to 7 years of experience
- 5 years' experience in an Enterprise 24x7 Network Operations Center or Production Support environment.
- Minimum 3 years' experience in Customer Service, Incident Management and Problem Management required.
- Minimum 3 years' experience defining, implementing, and monitoring IT service level processes.
- Technical expertise in network and server administration with hands-on experience.
- Experience working in large (1,000-server), complex operations environments.
- An understanding of Cloud infrastructure (Network and Server architecture).
- Experience with monitoring technologies such as OIV, Splunk, Op5 and the Haystack tools is a plus
Comcast is an EOE/Veterans/Disabled/LGBT employer