-
IT Infrastructure Lead / Network Engineer
[Shanghai, Xuhui District] 2023-04-03 · 32k-38k, 16 months' pay · 10+ years' experience / Bachelor's degree · Professional services | Consulting / Not funded / Fewer than 15 employees
Tags: foreign-owned, Fortune Global 500, fluent English, global role, management, non-compete
Maintain, develop, implement and support the technical aspects of core IT infrastructure in each region. Ensure the operational stability of infrastructure solutions, delivering a robust, resilient and cost-effective IT infrastructure that meets the evolving needs of the organization, while also providing 2nd and 3rd line technical support.
1. Infrastructure strategy and core service delivery
2. End user support; problem, event and incident management
3. External supplier management
Setting up IT infrastructure for new offices and handling office relocations; strong in networking, servers and storage.
-
50k-68k · 10+ years' experience / Bachelor's degree · Finance / Listed company / 150-500 employees
Job Summary
This is a greenfield development opportunity to design, build and implement the next generation derivatives platform. We are committed to modernizing our infrastructure and aligning our operating model to a client-led framework. This is our key strategic project to bring innovation to market participants in trading, clearing and risk management of the derivatives markets.
We are looking for a dynamic and passionate individual to join us as the System Development Lead in the Risk Management System. The incumbent will serve primarily as the Risk Management System architect, driving the overall application design, and will have the opportunity to develop and work with new technology. He/she will be part of the mission-critical platform development team and will work closely with seasoned, enthusiastic technologists across the IT Division, with exposure to various business stakeholders across HKEX.
Job Duties
Responsibilities:
• Manage the system development life cycle and oversee project implementation, including requirement definition, resource estimation, system design and development, functional and technical testing, production migration and documentation.
• Head the internal development team for systems development and testing.
• Design and implement systems with modern technologies, high availability and scalability; investigate and resolve design defects and technical issues.
• Perform quality reviews of software design and code changes.
• Collaborate with teams across IT and business departments.
• Ensure system development and operation comply with the IT development standards and guidelines, and with corporate policies and frameworks.
Requirements:
• Degree in Computer Science, Information Technology, Mathematics, Statistics, Financial Engineering or related disciplines.
• 10+ years of system development experience in large-scale systems, preferably in the financial services industry.
• Strong skills in Java, C++, Python, Perl, RESTful, SQL, React JS, JSX.
• Expert-level experience in system architecture design.
• Extensive development experience with Spring Boot, Linux, data modelling, microservices and containerized architecture platforms.
• Strong knowledge of and experience in cloud technology, with solid experience of in-memory DB and messaging technologies, are essential.
• Experience with DevOps frameworks and tools (e.g. Jira, Git, Jenkins) and automated testing frameworks such as Selenium, Jasmine or JUnit.
• Familiarity with object storage; solid proficiency in databases, NoSQL and data manipulation tools for handling and analyzing large datasets.
• Strong analytical and problem-solving skills and mindset; self-motivated and able to work independently.
• Understanding of quantitative finance concepts, derivatives pricing, risk management models, and/or portfolio optimization is a definite advantage.
• Good communication skills to work and interact with internal IT teams and business stakeholders.
-
19k-30k · 3-5 years' experience / Bachelor's degree · Finance / Listed company / 150-500 employees
Responsibilities:
1) Partner with the application teams on IT solution configuration and review.
2) Interact with internal and external parties on planning and execution.
3) Play a role in incident and problem management, such as impact assessment, troubleshooting, remediation and prevention.
4) Compile operational and technical documents such as knowledge base articles, technical procedures and test reports.
5) Automate server operations.
6) Collaborate with the cyber security, infrastructure engineering and project teams to adopt modern technologies, such as multi-cloud and container orchestration, to enhance manageability.
7) Join the on-call roster to support production incidents and alerts outside office hours; all overtime is offset with compensatory leave:
   a. Mon-Fri, outside office hours: once per month, around 10 calls, averaging 1 hour per call
   b. Saturday, whole day: once every two months, around 8 hours
Requirements:
1) Degree in Computer Science, Information Systems or IT-related disciplines.
2) Minimum 3 years of direct, hands-on Windows administration experience.
3) Advanced knowledge of the Windows platform.
4) Familiarity with VMware ESXi support.
5) Familiarity with PowerShell scripting.
6) Experience managing change implementation on large-scale systems.
7) Familiarity with automation practices and tools (SCCM).
8) Good team player with a passion to learn and aspirations for career development in the financial industry.
-
15k-25k, 14 months' pay · 3-5 years' experience / Bachelor's degree · Lifestyle services, Travel | Transportation / Listed company / 150-500 employees
DragonPass is a global B2B2C Airport services provider developing fintech travel solutions for companies such as Barclays, Visa, MasterCard, RBS, Revolut and many similar companies all around the world. Services include Transport Security Fast Track, Airport lounges and Airport restaurants.
Our main aim is to simplify technical complexity. We help our end customers use a multitude of services, across an array of complexity, as easily as possible. All of this must be done with beautiful UI and UX.
DragonPass has headquarters in Guangzhou, with the international headquarters outside of China in Hale, Cheshire. There are additional offices in London, Sao Paulo, Johannesburg, Singapore, Shenzhen, Beijing and Shanghai. The business is growing very quickly and is looking to recruit individuals with a passion for travel, networking and self-development.
What will you be doing?
• Act as a bridge between the development team and the clients/account management team. Bilingual ability is mandatory for communicating with both.
• Handle calls and emails from clients about any incident or system outage.
• Provide an incident report for any issue and give incident status updates to stakeholders.
• Check error logs from the monitoring system and identify the root cause of the issue.
• Create and manage incident tickets in JIRA.
• Identify the areas or parts of the system/service causing the issue, and the corresponding impact on the customer.
• Create the standard operation support documents in English.
• Implement urgent fixes based on the standard operation support documents, with support from DevOps at the infrastructure level.
• Filter out non-critical issues that can be fixed the next day.
Preferred skills
• One or more years of experience in 24/7 technical support
• Excellent hands-on support experience in AWS/Azure
• Knowledge of programming languages such as Java
• Strong experience in Linux and Terraform
• Proficiency in writing automation scripts in Python
• Experience with monitoring tools: Grafana or any other monitoring tool
• Understanding of full-stack web/mobile, including protocols and web server optimization standards
• Broad understanding of Oracle and MySQL; NoSQL database experience, such as MongoDB
• Proficient in English and Mandarin
-
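Job Responsibilities:
Team introduction: Data AML is ByteDance's machine learning middle platform. It provides training and inference systems for recommendation, advertising, CV, speech and NLP to businesses such as Douyin, Toutiao and Xigua Video. It supplies powerful machine learning compute to the company's internal business units and researches general-purpose, innovative algorithms for the problems those businesses face. It also offers some core machine learning and recommendation system capabilities to external enterprise customers through Volcano Engine. In addition, AML conducts frontier research in areas such as AI for Science and scientific computing.
Topic introduction: Large-scale recommendation systems are increasingly applied to short video, text community, image and other products, and modality information plays a growing role in recommendation systems. ByteDance has found in practice that modality information works well as a generalization feature for recommendation and other business scenarios, and research on end-to-end, ultra-large-scale multimodal recommendation systems has a great deal of room for exploration. Building on algorithm and engineering co-design, we expect to explore further in directions such as multimodal co-training, large models with 7B/13B parameters, and longer-sequence end-to-end modeling. On the engineering side, research directions include the representation of multimodal samples, a high-performance multimodal inference engine based on the PyTorch framework, the construction of a high-performance multimodal training framework, and the application of heterogeneous hardware to multimodal recommendation systems. On the algorithm side, research directions include designing sound co-training structures for recommendation/advertising and multimodal models, Sparse MoE, Memory Network, mixed precision, and so on.
1. Design and develop machine learning system architecture, and tune system performance.
2. Solve technical challenges such as high concurrency, high reliability and high scalability.
3. Cover work across several sub-areas of machine learning systems, including resource scheduling, task orchestration, model training, model inference, model management, dataset management, workflow orchestration and ML for System.
4. Investigate and introduce forward-looking technologies for machine learning systems, for example bringing the latest hardware architectures, heterogeneous computing systems and GPU optimization techniques into production.
5. Research machine learning based methods to analyze and optimize the resource usage of clusters and services.
Job Requirements:
1. Graduating in 2026 or later, currently a PhD student; majors in computer science, software engineering or related fields preferred.
2. Proficient in at least one or two languages such as C/C++/Go/Python/Java in a Linux environment.
3. Solid grasp of distributed systems principles, with experience in the design, development and maintenance of large-scale distributed systems.
4. Excellent logical analysis skills, able to abstract and decompose business logic sensibly; good team spirit.
5. Strong sense of responsibility, with good learning ability, communication skills and self-drive.
6. Good documentation habits, writing and updating workflow and technical documents promptly as required.
Bonus Points:
1. Familiar with Kubernetes architecture, with extensive experience developing cloud-native systems.
2. Familiar with at least one mainstream machine learning framework (TensorFlow/PyTorch/MXNet).
3. Familiar with Django, Flask and related technologies, with backend development experience using them.
4. Experience in one of the following areas: AI Infrastructure, HW/SW Co-Design, High Performance Computing, ML Hardware Architecture (GPU, Accelerators, Networking), Machine Learning Frameworks, ML for System, Distributed Storage.
5. Experience in architecture development for large-scale cloud computing platforms or private cloud products.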
-
Location: China (Remote, with 50% travel to Southeast Asia; candidates from tier-1 or tier-2 cities encouraged to apply)
Role Overview:
As a Building Energy Management and IoT Presales Engineer, you will provide technical expertise during the sales process, designing and presenting customized IoT and energy management solutions tailored to customer needs in Southeast Asia. With at least 8 years of experience in the built environment and IoT sensor implementation, you will ensure solutions align with client infrastructure and train resellers to independently conduct site work. Exceptional communication skills are critical to effectively engage with resellers and clients, requiring frequent travel (50% of the time) for site visits across the region.
Key Responsibilities:
• Collaborate with the Business Development team to design and present customized IoT and energy management solutions, showcasing the value of Neuron’s offerings to clients in Southeast Asia.
• Conduct site visits (50% travel) across Southeast Asian markets (e.g., Thailand, Malaysia, Indonesia) to assess building environments and recommend optimal IoT sensor placements for applications such as energy efficiency, environmental monitoring, or smart building systems.
• Train and upskill resellers to independently perform site assessments and IoT sensor deployments, ensuring they are equipped with technical knowledge and best practices.
• Apply deep knowledge of building energy management principles (e.g., HVAC, energy optimization, structural energy design) to align solutions with client infrastructure.
• Work closely with internal tech and product teams to clarify product boundaries, technical specifications, and integration requirements.
• Lead client onboarding processes, coordinating with relevant teams to provide training and resources for seamless product adoption.
• Deliver sales enablement and training to equip the sales team and resellers with technical knowledge to pitch solutions effectively in Southeast Asian markets.
• Address customer and reseller technical queries during the sales process, leveraging strong communication skills to build confidence in Neuron’s offerings and increase conversion rates.
Qualifications:
• Bachelor’s degree in Building Engineering, Electrical Engineering, Energy Management, or a related field.
• Minimum of 8 years of experience in presales engineering or a related role within the built environment, with strong expertise in building energy management and IoT sensor implementation.
• Proven experience conducting site visits and assessing building infrastructure for IoT and energy management deployments, ideally in Southeast Asian markets.
• Extensive knowledge of IoT sensor technologies, including optimal placement strategies for smart building and energy efficiency applications.
• Exceptional communication skills to train resellers, engage diverse Southeast Asian clients, and translate complex technical concepts into clear, customer-friendly solutions.
• Ability to work collaboratively with sales, product, technical teams, and resellers across regions.
• Willingness to travel 50% of the time for site visits across Southeast Asia (e.g., Thailand, Malaysia, Indonesia).
• Proficiency in Mandarin and English is preferred to engage with resellers, clients, and teams in China and Southeast Asia.
• Experience in a tech or SaaS startup environment is a plus.
• In-depth, hands-on knowledge of industrial communication protocols, including:
  o RS485
  o Modbus (RTU/TCP)
  o BACnet
  o LoRaWAN
• Familiarity with other industrial protocols (e.g., Profibus, Profinet, CAN bus) is a plus.
-
Job description:
• Provide Level 1 support to end users: resolve incidents, diagnose underlying problems using the remote connection toolset, and implement corrective actions.
• Handle calls within product/client phone queues as directed.
• Perform the required troubleshooting on all calls and escalate, as necessary, any call outside the agent’s established technical knowledge boundaries.
• Escalate complex problems to the Remote Support Engineering staff or Field Engineering.
• Ensure customer satisfaction on all completed calls, or verify that the customer has an alternative plan for problem resolution.
• Communicate effectively with members of management and technology support teams.
• Inform supervisors of any work conflicts, dissatisfied customers, or hardware/software malfunctions.
• Work in a shift pattern and be flexible.
Key qualifications:
• Good command of spoken Cantonese; able to read and write English
• Good IT knowledge and background (OS, application software, networks and IT infrastructure)
• Good at troubleshooting application problems
• Good communication skills and customer-service oriented
-
Description
The platform team is seeking an experienced Site Reliability Engineer (SRE) to support the rapid expansion of our business. You need to be highly sensitive to system reliability and keen on identifying and resolving system risks to keep the system working well. In the platform team, you will be involved in provisioning and maintaining infrastructure, proposing solutions for the system, and working online with people from different countries.
Responsibilities:
• Participate in on-call duty to respond to, investigate and resolve system incidents, or handle support tickets for application teams.
• Pay attention to alarms in the monitoring system, provide timely feedback, and solve problems.
• Design, implement, and govern infrastructure to achieve high availability and scalability.
• Evaluate and research technical initiatives with complete plans including documentation, provisioning, testing, and monitoring.
• Build a service quality system and lead the team in quantifying its indicators.
Required Skills and Qualifications:
• Good English communication and writing skills, learning ability, and hands-on skills.
• Proficiency with Azure (Azure resources, network models, and best practices).
• More than 2 years of experience managing AKS/Kubernetes.
• Familiar with Infrastructure as Code, Terraform preferred.
• Familiar with CI/CD automation.
• Familiar with observability technologies such as Prometheus and Grafana.
• Familiar with several of the following middleware: Kafka, MySQL, Mongo, Elasticsearch, and Redis.
Nice to Have:
• CKA or CKAD certification is a plus.
• Certifications related to cloud native or operations and maintenance are a plus.
• Familiar with Java or Go.
-
DataVisor is the world’s leading AI-powered Fraud and Risk Platform that delivers the best overall detection coverage in the industry. With an open SaaS platform that supports easy consolidation and enrichment of any data, DataVisor's solution scales infinitely and enables organizations to act on fast-evolving fraud and money laundering activities in real time. Its patented unsupervised machine learning technology, advanced device intelligence, powerful decision engine and investigation tools work together to provide guaranteed performance lift from day one. DataVisor's platform is architected to support multiple use cases across different business units flexibly, dramatically lowering total cost of ownership compared to legacy point solutions. DataVisor is recognized as an industry leader and has been adopted by many Fortune 500 companies across the globe.
Our award-winning software platform is powered by a team of world-class experts in big data, machine learning, security, and scalable infrastructure. Our culture is open, positive, collaborative, and results driven. Come join us!
The software engineer - infrastructure team is the backbone of DataVisor. Without our distributed and highly robust systems, business would stop. We tackle important challenges: clients require sub-second response times while we find relationships in terabytes of data. We’ve created, and continually improve, our massive cluster infrastructure, allowing highly computationally expensive jobs to run smoothly. We love using and learning a
• 2+ years of experience in infrastructure software engineering or production system operations
• 2+ years of experience with Amazon Web Services or other cloud service providers
• Strong understanding of the Linux environment and system configuration
• Experience with Java and Python
• Experience with Kubernetes and Docker
• Basic knowledge of networking
• Open to working remotely
• Experience with Apache Spark, Cassandra, Flink
• Experience with Grafana, Loki, ELK
-
About the Role
Epsilla is a leading RAGaaS platform for Generative AI, offering high-performance vector databases. As a Backend Engineer at Epsilla, you will build and enhance Epsilla’s product lines, improve workflow efficiency and scalability, integrate new AI developments such as Large Language Models (LLMs) into our platform, and manage our infrastructure. This dynamic role places you at the forefront of distributed systems infrastructure and Generative AI.
About the Interview:
1. 15-minute behavioral interview, sharing your understanding of our platform (https://cloud.epsilla.com)
2. Take-home assessment
3. 45-minute technical discussion
Responsibilities:
- Write well-designed, testable and efficient code.
- Collaborate with founders to conceptualize and implement sub-products and features from start to finish to deliver winning products.
- Participate in the design of technical architecture, ensuring scalability and maintainability.
- Troubleshoot and debug to drive performance optimization.
- Engage with customers and evaluate user feedback to understand core pain points and iterate on feature requests.
- Contribute to customer support, particularly for the products you develop.
- Stay current with the latest AI advancements by investigating alternatives to integrate new capabilities into our platform.
Requirements:
- Degree/Diploma in Computer Science, Engineering or related field.
- Proficiency in C++, TypeScript, and Python/FastAPI.
- Understanding of performance optimization best practices.
- Strong attention to detail and a commitment to delivering work of a high standard.
- Strong communication skills and the ability to work in fast-paced environments.
- Excellent problem-solving skills with a proven ability to overcome challenges.
Nice to Have:
- Experience in AI, Machine Learning, and building RAG workflows.
- Experience with no/low-code platforms.
- Experience in early-stage startup environments.
-
24k-32k, 13 months' pay · 5-10 years' experience / Bachelor's degree · Hardware / Angel round / Fewer than 15 employees
Note: This position is being recruited by Intellisn on behalf of bitHuman, and the role is unrelated to Intellisn’s business operations.
About Us:
bitHuman is a US-based AI company founded in 2023 by seasoned tech entrepreneur Steve Gu. With a rich history in successful tech startups and a deep technical background, Steve has positioned bitHuman at the forefront of developing lifelike AI agents for enterprise use. Under Steve's leadership, alongside a top-tier team of technologists and product developers, bitHuman is dedicated to revolutionizing the way businesses interact with digital environments. Our mission is to enhance human-machine interactions through advanced, human-centric AI technology, driving innovation and optimization in business processes.
Role Summary:
As a Sr. Software Development Engineer, you will join an innovative project team led and envisioned by Steve Gu himself. You will be responsible for building and maintaining the backend infrastructure that supports our AI-driven platforms, developing scalable and stable software solutions using technologies like Python. In this role, you will have the opportunity to directly influence the company's technical direction and product development, working alongside top talents in the industry to create impactful AI applications.
Key Responsibilities:
- Develop and maintain efficient, reusable, and reliable Python code.
- Implement server-side logic to ensure high performance and responsiveness to front-end requests.
- Collaborate with the DevOps team on application deployment and management using Docker and Kubernetes.
- Work closely with cross-functional teams to translate business requirements into technical specifications.
- Mentor junior developers and promote coding best practices.
Requirements:
- 5+ years of experience in backend software engineering or a similar role.
- Strong proficiency in Python, with a solid understanding of its ecosystem.
- Familiarity with Docker, Kubernetes, and other containerization and orchestration tools.
- Experience with RESTful APIs and server-side logic.
- Ability to read and understand technical documentation in English, with basic oral communication skills in English.
- Self-motivated and capable of working independently with minimal supervision.
- Strong problem-solving skills and the ability to work in a fast-paced, dynamic environment.
Why Join bitHuman?
- 100% Remote Work: Enjoy the flexibility of working from anywhere, setting your own schedule to optimize productivity.
- Innovative Projects: Engage with cutting-edge technology that is shaping the future of AI in business.
- Career Growth: Benefit from a commitment to professional development and opportunities for advancement.
Interested candidates are encouraged to apply by submitting a resume and a cover letter detailing their qualifications and interest in the role. Join us at bitHuman, where together we are building the future of AI-enhanced business interactions.
-
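25k-40k, 14 months' pay · 5-10 years' experience / No degree requirement · Marketing services | Consulting, Data services | Consulting / Listed company / 500-2000 employees
Responsibilities:
Work with the product development teams to keep the critical services and key infrastructure of the FreeWheel data platform running stably and reliably.
Duties:
1. Understand the business in depth and continuously improve business SLOs/SLAs.
2. Through continuous, comprehensive data operations (including availability metrics, incident history and resource utilization), find weak points in system capacity, availability and stability, and drive improvement projects to completion.
3. Help build operations tools and platforms, advance operations automation, quantify data, and use code to solve production problems.
4. Take part in incident emergency response, keep refining the monitoring system, improve alert accuracy, and shorten the time needed to locate faults.
5. Accumulate operations best practices, provide guidance on business and infrastructure architecture design and resource selection, and produce standard operations process documents.
Requirements:
1. 5 or more years of relevant work experience; computer science or related majors (communications, electronics, information, automation, etc.) preferred.
2. Familiar with mainstream cloud vendors and services, such as AWS/GCP/Azure/AliCloud.
3. Experience in cloud environment management and optimization, including cost management, security management, operations management and application architecture optimization.
4. Familiar with popular distributed system platforms for big data or message queues: Aerospike, Kafka, Hadoop, YARN, HDFS, HBase, Druid, or other NoSQL systems.
5. Actively embraces the "Infrastructure as Code" approach, with substantial practical experience; familiar with the related vendor and open-source solutions, such as CloudFormation/Terraform.
6. Experience designing and using operations platforms, for example having designed or helped develop an operations management platform covering resource management, K8s management, configuration management, etc.
7. Substantial hands-on experience with a range of basic cloud services, including but not limited to VPC, Subnets, Security Group, EC2, S3, IAM, Route 53 and Security Hub.
8. Deep understanding of the Linux operating system, with command of a range of open-source solutions and the corresponding skills: Kubernetes/Container/Nginx/Ansible/Prometheus/Grafana/ELK.
9. Familiarity with Golang is preferred.
10. Proactive, with a strong sense of responsibility and strong execution; good at reflecting and summarizing, with strong abilities in learning, problem analysis and driving issues to resolution.
11. Basic English listening and speaking skills and strong reading and writing skills; able to integrate quickly into an English-language working environment.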


