-
IT Infrastructure Lead / Network Engineer
[Shanghai · Xuhui District] 2023-04-03 · 32k-38k × 16 months' salary · 10+ years of experience / Bachelor's degree · Professional Services | Consulting / Unfunded / Fewer than 15 employees · Foreign-owned, Fortune Global 500, fluent English, global role, management, non-compete. Maintaining, developing, implementing, and supporting the technical aspects of core IT infrastructure in each region. Ensuring the operational stability of infrastructure solutions; delivering a robust, resilient, and cost-effective IT infrastructure that meets the evolving needs of the organization while also providing 2nd- and 3rd-line technical support. 1. Infrastructure strategy and core service delivery 2. End-user support; problem, event, and incident management 3. External supplier management. Setting up IT infrastructure for new offices and handling office relocations; strong skills in networking, servers, and storage.
-
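Responsibilities: 1. Design and develop the inference architecture and products of the machine learning system, supporting the product businesses of the Volcano Ark large-model platform and the machine learning platform; 2. Design and optimize the online architecture built around deep-model inference tasks, making full use of heterogeneous compute (GPU, CPU, and other heterogeneous hardware), storage (various cloud storage services), and networking (VPC, RDMA) resources; build stability and observability systems for a multi-tenant environment; and deliver large-scale online systems with high concurrency and high throughput; 3. Productize the inference system, building a stable, observable public-cloud inference platform with a first-class user experience. Requirements: 1. Proficiency in one or two languages such as Go, Java, or Python in a Linux environment; 2. A solid computer science foundation and strong programming skills, familiarity with common algorithms and data structures, and good coding habits; 3. Familiarity with at least one mainstream machine learning framework (TensorFlow / PyTorch or an in-house framework); 4. Familiarity with the Kubernetes architecture and ecosystem, extensive hands-on experience developing cloud-native machine learning systems, and a deep understanding of, and practical experience with, online service governance and deployment architecture; 5. A command of distributed systems principles and experience designing, developing, and maintaining large-scale distributed systems; 6. Excellent logical analysis skills and the ability to abstract and decompose business logic sensibly; 7. A strong sense of responsibility, good learning ability, communication skills, and self-motivation, with the ability to respond and act quickly; 8. Good documentation habits, writing and updating workflow and technical documents promptly as required. Nice to have: 1. Experience building and deploying the engineering architecture of online GPU inference systems; familiarity with common online inference optimization techniques (batching, quantization, distributed inference, etc.) and with the GPU and large-model software/hardware stack; 2. Familiarity with public-cloud inference product architectures, a deep understanding of user personas and user stories in this field, and a passion for building *** products; 3. Experience in one of the following areas: CUDA, RDMA, AI Infrastructure, HW/SW Co-Design, High Performance Computing, ML Hardware Architecture (GPU, Accelerators, Networking), ML for System, Distributed Storage.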
-
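Responsibilities: 1. Design and develop resource scheduling for the machine learning system, supporting the product businesses of the Volcano Ark large-model platform and the machine learning platform; 2. Optimally orchestrate and schedule heterogeneous compute (GPU, CPU, and other heterogeneous hardware), storage (various cloud storage services), and networking (VPC, RDMA) resources across multiple data centers and clusters; under strict multi-tenant isolation, meet the scheduling needs of workloads such as offline training and online inference while using overall resources sensibly and to the fullest. Requirements: 1. Proficiency in one or two languages such as Go, Java, or Python in a Linux environment; 2. A solid computer science foundation and strong programming skills, familiarity with common algorithms and data structures, and good coding habits; 3. Familiarity with at least one mainstream machine learning framework (TensorFlow / PyTorch or an in-house framework); 4. Familiarity with the Kubernetes architecture and ecosystem and with container technologies such as Docker/Containerd/Kata, plus extensive hands-on experience developing cloud-native machine learning systems; 5. A command of distributed systems principles and experience designing, developing, and maintaining large-scale distributed systems; 6. Excellent logical analysis skills and the ability to abstract and decompose business logic sensibly; 7. A strong sense of responsibility, good learning ability, communication skills, and self-motivation, with the ability to respond and act quickly; 8. Good documentation habits, writing and updating workflow and technical documents promptly as required. Nice to have: 1. Hands-on experience with co-located online/offline resource scheduling in large-scale clusters, a source-code-level understanding of the scheduler implementation in one or more open-source projects such as K8S/Volcano/Yarn/Mesos, and familiarity with containerization, lightweight virtual machines, and related technologies; 2. Familiarity with common scheduling algorithms; a deep understanding of, and hands-on experience with, one or more scheduling problems such as multi-tenant quota governance, preemption, elasticity, fragmentation, tidal scheduling, co-location, and QoS; strong analytical and modeling skills for solving complex problems; experience with GPU scheduling; 3. Experience in one of the following areas: CUDA, RDMA, AI Infrastructure, HW/SW Co-Design, High Performance Computing, ML Hardware Architecture (GPU, Accelerators, Networking), ML for System, Distributed Storage.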
-
Responsibilities: 1. Responsible for research and development of the underlying infrastructure, requirements analysis, and technical planning of blockchain wallets. 2. Work from business and product requirements, including but not limited to wallet transfers, NFT trading, DeFi, and DEX, to complete research and development of the wallet Go SDK. 3. Participate in the architecture design, upgrading, transformation, and performance optimization of wallet supporting services. Work with the team lead to conduct technical research and overcome technical difficulties in systems and projects. 4. Research cutting-edge blockchain wallet technologies, such as secure multiparty computation (MPC), as well as emerging popular public blockchains and ecosystems. 5. Responsible for customizing blockchain full node clients to support business requirements, and for deploying, upgrading, and maintaining full nodes. Requirements: 1. Bachelor's degree or above in computer science, software engineering, cryptography, or a related field. 2. Familiar with common data structures and algorithms. 3. Solid proficiency in Golang programming (a must) and in mainstream Golang frameworks. 4. Interested in underlying blockchain technologies and familiar with blockchain projects such as Bitcoin, Ethereum, and Cosmos. 5. Hardworking, eager to learn, and passionate about the crypto industry. Nice to haves: 1. Background in the blockchain industry and familiarity with cryptography is a plus. 2. Sufficient blockchain knowledge and experience in wallet product development. 3. Familiarity with Rust/C++/Java/JS is a plus. Location: Shenzhen, Hong Kong
-
· Creating, maintaining, and improving automation frameworks/infrastructure that test APIs, UI, and infrastructure · Developing innovative tools to boost our testing efficiency, debug failures, and make it easy to communicate results with high-level reports · Monitoring, analyzing, and reporting test automation results Requirements · 4+ years of experience in the software industry, with a passion for quality processes · Ability to design, develop, and maintain test automation frameworks · Good experience with Python/Java, AWS, REST APIs, CI/CD, and Jenkins · Strong problem-solving, critical thinking, verbal, and written communication skills · Experience in testing and automating APIs, UI, and infrastructure · Experience with the software development lifecycle, test methodologies, and tools · Bachelor’s degree in computer science/applications or a similar field
-
Key Responsibilities Be the expert on our staking and infrastructure products, serving as the subject matter expert on product features, answering technical cryptocurrency-related questions, and demonstrating how InfStones can fit into clients' overall strategy. Serve as the primary business development contact for a group of InfStones' highest-value institutional clients, co-owning the relationship with our Institutional Coverage sales teams from the top of the funnel through on-platform support. Build and maintain strong relationships with new and existing InfStones institutional clients, driving successful integrations with the institutional product suite and impact on the overall business. Provide a best-in-class experience for our clients through effective stakeholder management, decisive prioritization, and efficient execution across institutional teams. Effectively triage, manage, and prioritize incoming client requests. Manage requests through to resolution, including delegating across teams of subject matter experts. Qualifications Overseas education or work experience, and the ability to use English as a working language. Bachelor's degree in Economics, Finance, Management, or another business-related field. Minimum of 2 years of relevant experience in an institutional client partner/services/advisory role in the financial services or technology industry. Excellent English communication skills to operate across multiple departments, stakeholders, and clients. Passion for the crypto/blockchain industry. Flexible and adaptable to meet the evolving needs of a high-growth and fast-paced organization. Experience with relationship or account management for institutional clients, liaising with executive leadership, management, and/or senior operations contacts. Preferred Qualifications (Nice to Have) Clear understanding of blockchain infrastructure. Relevant experience in crypto or start-ups.
-
Job Description Solid knowledge of blockchain, consensus, node maintenance, etc. Result-driven. Able to provide a workable solution to clients regarding the given product/platform. Willing to communicate and enjoys solving issues raised by both clients and teammates. Be the expert on our staking and infrastructure products. Educate customers about blockchain infrastructure and how to use our platform. Collaborate across business, product, marketing, and engineering teams to drive the desired business outcomes for our customers. Own, manage, and report using CRM systems: ensure the system is up to date and that all relevant metrics are recorded. Qualifications B.S. in Computer Science, Software Engineering, or another CS-related field. Experienced with command lines (Bash/Zsh), cloud services, and operating systems. 3+ years’ experience with inbound, outbound, and client-facing requests, positioning SaaS or PaaS offerings. Knowledge of the crypto/blockchain industry is a must. Natural curiosity and an eagerness to learn new technologies. Excellent listening, verbal, and written communication skills. Basic data analytics skills and the ability to use relevant tools effectively to present insights. Self-starter attitude and the ability to research new ideas autonomously. Preferred Qualifications (Nice to Have) Experience with at least one programming language, especially Go, Rust, or a Web3 language like Solidity. Relevant experience in start-ups. Experience with Agile management tools and CRM software, including configuring and developing dashboards and reports.
-
Job Title: Technician/Senior Technician (IT Support) Department: Function Hub, HKUST(GZ) Job ID: Job Posting Details Formally established in June 2022, the Hong Kong University of Science and Technology (Guangzhou) (HKUST(GZ)) is a cooperatively run university between the Chinese mainland and the Hong Kong Special Administrative Region. HKUST(GZ) has obtained approval from the Ministry of Education (MoE) and has become the first legally independent educational institution co-established by the Mainland and Hong Kong since the announcement and implementation of the “Outline Development Plan for the Guangdong-Hong Kong-Macao Greater Bay Area” and the “Overall Plan for Deepening Globally Oriented Comprehensive Co-operation amongst Guangdong, Hong Kong and Macao in Nansha of Guangzhou”. With a spirit of pioneering innovation, HKUST(GZ) charts new territories in cross-disciplinary education and explores new frontiers in pedagogy, aiming to serve as a role model of integrated mainland-Hong Kong educational development and to become a world-famous high-level university, endeavoring to nurture future-oriented, high-level, and innovative talent. Function Hub is a cross-disciplinary research platform covering all research fields of science and engineering as well as their applications. Our vision is to unlock the potential of basic elements in the hard and natural sciences, and to seek advanced and sustainable solutions to real-world problems, thus benefitting mankind and the advancement of humanity. The Hub primarily comprises four thrust areas: Advanced Materials (AMAT), Earth, Ocean and Atmospheric Sciences (EOAS), Microelectronics (MICS), and Sustainable Energy and Environment (SEE). We are currently seeking an experienced IT Engineer to join our dynamic team. The ideal candidate will have a strong background in IT infrastructure, network administration, and system maintenance.
Duties 1. Install, configure, and maintain hardware and software systems 2. Manage and maintain the hub’s IT infrastructure, including servers, networks, and databases 3. Provide technical support to faculty and staff members 4. Purchase and maintain IT equipment and other assets 5. Support the buildup and overall management of laboratories 6. Perform other duties and responsibilities as assigned by supervisor(s) Qualification Requirements 1. Bachelor’s degree in computer science or a related field 2. 3 years of experience in IT engineering or a related field is preferred 3. Good analytical and critical thinking skills, excellent problem-solving skills 4. Able to multi-task, detail-minded, with good communication skills 5. Ability to work independently and as part of a team
-
Responsibilities: · Design and build the infrastructure of SaaS cloud products, making the system massively scalable, highly available, and easily maintainable · Responsible for the architecture design of applications in the life science industry · Collaborate with Product Management and Development team members on technical design and problem solving to come up with compelling solutions · Participate in DevOps work, automatically deploying the products to the cloud · Work in a passionate team environment within a highly successful, fast-growing company. Requirements: · 7+ years of commercial software development in Java or Python · Rich architecture and platform-building experience, especially with SaaS multi-tenant architecture · Current hands-on development experience with open-source technologies: Spring, Spring Boot, Django, Celery, PyTest, MySQL, Git, Jenkins, Linux · Experience with DevOps, automated deployment, and continuous delivery · Familiar with VPC, EC2, S3, Auto Scaling, Load Balancing, Container Service, ELK, SQS, CDN, Vagrant, Terraform, etc. · Good English communication skills, both oral and written · BS or above in computer science/engineering or equivalent Nice to Have: · Experience with AWS infrastructure, Alibaba Cloud, and Salesforce platform development · Experience in frontend development, especially development on WeChat · Knowledge of the CRM or life science domain · Experience in DevOps development, proficient in Terraform and Ansible
-
15k-25k × 14 months' salary · 3-5 years of experience / Bachelor's degree · Lifestyle Services, Travel | Mobility / Listed company / 150-500 employees DragonPass is a global B2B2C airport services provider developing fintech travel solutions for companies such as Barclays, Visa, MasterCard, RBS, Revolut, and many similar companies around the world. Services include Transport Security Fast Track, airport lounges, and airport restaurants. Our main aim is to simplify technical complexity: we help our end customers use a multitude of services of varying complexity as easily as possible. All of this must be done with beautiful UI and UX. DragonPass has headquarters in Guangzhou, with its international headquarters outside China in Hale, Cheshire. There are additional offices in London, Sao Paulo, Johannesburg, Singapore, Shenzhen, Beijing, and Shanghai. The business is growing very quickly and is looking to recruit individuals with a passion for travel, networking, and self-development. What will you be doing? The 24/7 technical support engineer will act as a bridge between the development team and the clients/account management team. Bilingual proficiency is mandatory to communicate with both the development team and the clients/account management team. The 24/7 technical engineer will handle calls/emails from clients about any incident or system outage. The 24/7 technical engineer will provide an incident report for any issue and incident status updates for stakeholders. Check error logs from the monitoring system and identify the root cause of the issue. Create and manage incident tickets in JIRA. Identify the areas/parts of the system/service causing the issue and the corresponding impact on customers. Create standard operating support documents in English. Implement urgent fixes based on the standard operating support documents, with support from DevOps at the infrastructure level. Filter out non-critical issues that can be fixed the next day.
Preferred skills One or more years of experience in 24/7 technical support Excellent hands-on support experience with AWS/Azure Knowledge of programming languages such as Java Strong experience with Linux and Terraform Proficiency in writing automation scripts in Python Experience with monitoring tools such as Grafana Understanding of full-stack web/mobile, including protocols and web server optimization standards Broad understanding of Oracle and MySQL; experience with NoSQL databases such as MongoDB Proficient in English and Mandarin
-
Job description: Provide Level 1 support to end users, resolving incidents and diagnosing underlying problems with a remote connection toolset, and implement corrective actions. Handle calls within product/client phone queues as directed. Perform the required troubleshooting on all calls, and escalate, as necessary, any calls outside the agent’s established technical knowledge boundaries. Escalate complex problems to the Remote Support Engineering staff or Field Engineering. Ensure customer satisfaction on all completed calls, or verify that the customer has an alternative plan for problem resolution. Communicate effectively with members of management and technology support teams. Inform supervisors of any work conflicts, dissatisfied customers, or hardware/software malfunctions. Work in shifts and be flexible. Key qualifications: Good command of spoken Cantonese; able to read and write English Good IT knowledge and background (OS, application software, networks, and IT infrastructure) Good at troubleshooting application problems Good communication skills and customer-service oriented
-
Description The platform team is seeking an experienced Site Reliability Engineer (SRE) to support the rapid expansion of our business. You need to be highly sensitive to system reliability and keen on identifying and resolving system risks to keep the system working well. In the platform team, you will be involved in provisioning and maintaining infrastructure, proposing solutions for the system, and working online with people from different countries. Responsibilities: • Participate in on-call duty to respond to, investigate, and resolve system incidents, or handle support tickets for application teams. • Pay attention to alarms in the monitoring system, provide timely feedback, and solve problems. • Design, implement, and govern infrastructure to achieve high availability and scalability. • Evaluate and research technical initiatives with complete plans, including documentation, provisioning, testing, and monitoring. • Build a service quality system and lead the team in quantifying its indicators. Required Skills and Qualifications: • Good English communication and writing skills, learning ability, and hands-on skills. • Proficiency with Azure (Azure resources, network models, and best practices). • More than 2 years of experience managing AKS/Kubernetes. • Familiar with Infrastructure as Code, Terraform preferred. • Familiar with CI/CD automation. • Familiar with observability technologies like Prometheus and Grafana. • Familiar with several of the following middleware: Kafka, MySQL, Mongo, Elasticsearch, and Redis. Nice to Have: • A CKA or CKAD certificate is a plus. • Certificates related to cloud native or operations and maintenance are a plus. • Familiar with Java or Go.
-
DataVisor is the world’s leading AI-powered Fraud and Risk Platform that delivers the best overall detection coverage in the industry. With an open SaaS platform that supports easy consolidation and enrichment of any data, DataVisor's solution scales infinitely and enables organizations to act on fast-evolving fraud and money laundering activities in real time. Its patented unsupervised machine learning technology, advanced device intelligence, powerful decision engine, and investigation tools work together to provide guaranteed performance lift from day one. DataVisor's platform is architected to support multiple use cases across different business units flexibly, dramatically lowering total cost of ownership compared to legacy point solutions. DataVisor is recognized as an industry leader and has been adopted by many Fortune 500 companies across the globe. Our award-winning software platform is powered by a team of world-class experts in big data, machine learning, security, and scalable infrastructure. Our culture is open, positive, collaborative, and results-driven. Come join us! The software engineer - infrastructure team is the backbone of DataVisor. Without our distributed and highly robust systems, business would stop. We tackle important challenges: clients require sub-second response times while we find relationships in terabytes of data. We’ve created, and continually improve, our massive cluster infrastructure, allowing highly computationally expensive jobs to run smoothly. We love using and learning a 2+ years of experience in infrastructure software engineering or production system operations 2+ years of experience with Amazon Web Services or other cloud service providers Strong understanding of the Linux environment and system configuration Experience with Java and Python Experience with Kubernetes and Docker Basic knowledge of networking Open to working remotely Experience with Apache Spark, Cassandra, Flink Experience with Grafana, Loki, ELK
-
About the Role Epsilla is a leading RAGaaS platform for Generative AI, offering high-performance vector databases. As a Backend Engineer at Epsilla, you will build and enhance Epsilla’s product lines, improve workflow efficiency and scalability, integrate new AI developments, including Large Language Models (LLMs), into our platform, and manage our infrastructure. This dynamic role places you at the forefront of distributed systems infrastructure and Generative AI. About the Interview: 1. 15-minute behavioral interview, sharing your understanding of our platform (https://cloud.epsilla.com) 2. Take-home assessment 3. 45-minute technical discussion Responsibilities: - Write well-designed, testable, and efficient code. - Collaborate with founders to conceptualize and implement sub-products and features from start to finish to deliver winning products. - Participate in the design of technical architecture, ensuring scalability and maintainability. - Troubleshoot and debug to drive performance optimization. - Engage with customers and evaluate user feedback to understand core pain points and iterate on feature requests. - Contribute to customer support, particularly for the products you develop. - Stay current with the latest AI advancements by investigating alternatives to integrate new capabilities into our platform. Requirements: - Degree/Diploma in Computer Science, Engineering, or a related field. - Proficiency in C++, TypeScript, and Python/FastAPI. - Understanding of performance optimization best practices. - Strong attention to detail and the ability to deliver work of a high standard. - Strong communication skills and the ability to work in fast-paced environments. - Excellent problem-solving skills with a proven ability to overcome challenges. Nice to Have: - Experience in AI, Machine Learning, and building RAG workflows. - Experience with no/low-code platforms. - Experience in early-stage startup environments.