Summary of Experience in Building High-Performance Internet Web Systems

Release time: 2019-08-02 14:15:24 Author: Hua Jingyu
This article introduces how to build a high-performance Internet web system and summarizes the relevant experience and common techniques. Readers who need it can use it as a reference.

As the Internet has developed, applications of every kind have emerged one after another, and their user bases often run to hundreds of millions. Knowing how to build a high-performance, highly reliable application system is therefore crucial for every developer. This article summarizes what I have learned and some of the methods I use at work, in the hope that it gives other readers a reference so they can quickly find a solution when they meet similar problems in future development. My main language is Java, so unless stated otherwise, everything below assumes Java.

Key to high performance

To achieve high performance, I summarized three points:

  1. Caching
    • DNS cache
    • Database cache
    • Distributed cache
  2. Splitting
    • Business splitting
    • Database splitting
  3. Asynchrony
    • Asynchronous network I/O
    • Asynchronous disk I/O
    • Message queues

The items listed are only some common scenarios for each of the three points. Wherever a performance bottleneck appears, keeping these three points in mind will lead to a solution most of the time. The sections below describe how they apply to different parts of the overall architecture.

Stateless service

A stateless object can be understood simply as an object without fields. A model/entity object, for example, is not stateless, because it contains fields. In a typical MVC setup, the **Controller and **Service classes are stateless: they contain only methods. Some framework objects are stateful, such as the Action in the Struts2 framework, and partly for that reason Struts2 is used less these days. With stateless objects it becomes possible to build stateless services: because no object on the request path holds state, every request is independent, and such an architecture makes the service easy to scale out.

Stateless services still inevitably run into state, the most common case being the session. Because HTTP requests are stateless, cookies and sessions are used together to recognize that multiple HTTP requests belong to the same user. If the session lives in the memory of one server, the user's requests must keep returning to that server, which gets in the way of scaling. There are generally two ways to solve this:

  • Store the session in cookies
  • Use a distributed session service

The first approach stores all of the session information in the cookie, and the server reads it back using the corresponding algorithm; this information is usually encrypted. The second approach stores the session in a distributed database or distributed cache, usually Redis or Memcached. With this approach, scaling the service depends on the capacity of the third-party database or cache. Taobao has components of this kind, and the open source world has distributed session implementations based on Memcached and Redis.
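As a rough illustration of the cookie approach, the sketch below keeps the session payload in the cookie value and appends an HMAC so the server can detect tampering without holding any state. It is only a sketch under assumptions: the class name `CookieSession` and the token layout are hypothetical, and it signs the payload rather than encrypting it, so the content stays readable to the client. A real deployment that needs confidentiality would encrypt as well.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;

// Hypothetical helper: stores session data in a cookie value and signs it,
// so any server holding the secret can verify it without shared memory.
class CookieSession {
    private final byte[] secret;

    CookieSession(String secret) {
        this.secret = secret.getBytes(StandardCharsets.UTF_8);
    }

    // Returns "base64url(payload).base64url(hmac)".
    String encode(String payload) {
        byte[] data = payload.getBytes(StandardCharsets.UTF_8);
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        return enc.encodeToString(data) + "." + enc.encodeToString(sign(data));
    }

    // Returns the payload, or null if the cookie is malformed or was modified.
    String decode(String cookie) {
        int dot = cookie.indexOf('.');
        if (dot < 0) {
            return null;
        }
        try {
            Base64.Decoder dec = Base64.getUrlDecoder();
            byte[] data = dec.decode(cookie.substring(0, dot));
            byte[] mac = dec.decode(cookie.substring(dot + 1));
            // Constant-time comparison of expected and received signatures.
            return MessageDigest.isEqual(sign(data), mac)
                    ? new String(data, StandardCharsets.UTF_8)
                    : null;
        } catch (IllegalArgumentException e) {
            return null; // not valid Base64
        }
    }

    private byte[] sign(byte[] data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return mac.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because every server instance that knows the secret can verify the cookie, no instance has to remember the session, which is what keeps the service stateless.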

Stateless services use splitting and caching

Business split

Statelessness lets the application tier scale horizontally, but when a single application grows too large and bloated it has to be split. Vertical splitting divides the application by business: an e-commerce system, for example, can be split into an order system, a points system, and so on. After the split, each system is easier to develop and easier to scale. Once the system is large, traffic differs from business to business. The buyer system, for instance, certainly carries far more traffic than the seller system, and in that case machines can be added to the buyer system alone.

In addition to splitting into different systems by business, the application itself can be split into layers, generally an application layer, a logic layer, and an atomic layer. The application layer assembles data and business logic. The logic layer holds reusable logic. The atomic layer operates on the database directly and contains the basic data operations.

Whatever form the split takes, the resulting systems are physically separate, so communication between them becomes the most important problem after splitting.

RPC

There were many ways for systems to communicate before the current RPC frameworks, such as RMI and WebService, but RPC frameworks are now the mainstream because they are more convenient, more efficient, and cross-platform. Almost every large company has its own RPC framework, such as Taobao's HSF and 58's SCF, and there are many excellent open source frameworks: Dubbo, gRPC, Thrift, and so on. Many large companies in China use Dubbo, including Jingdong and Dangdang.

MQ

RPC calls are generally used for tightly coupled, synchronous calls. MQ, as an asynchronous means of communication, is also widely used across businesses. Common choices are ActiveMQ, RabbitMQ, Kafka, and RocketMQ. The first two are generally regarded as enterprise-class products that support a wide range of features and specifications. The latter two are Internet-grade, with higher throughput and performance, at the expense of many MQ features. MQ is generally used where eventual consistency is acceptable. Take user registration and the granting of credits: once registration succeeds, the system returns success to the front end immediately and publishes a registration-succeeded message to MQ. The credit service subscribes to the registration event and grants the credits when it consumes the message.

The biggest advantages of MQ are peak clipping and decoupling. In an RPC-style synchronous call, if A calls B within the same logic, then A and B must be scaled together. With messaging, A sends a message to B; if B cannot process it right away, B can catch up after A's peak has passed. It does not matter if B cannot keep up with A's sending rate in the short term.
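The registration example can be sketched with an in-process queue standing in for a real message broker. This is a minimal illustration, not a broker client: `RegistrationDemo`, `register`, and `consumeBatch` are hypothetical names, and a `LinkedBlockingQueue` plays the role that Kafka or RocketMQ would play in production.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// In-process stand-in for a message queue: registration (A) publishes events,
// the credit service (B) consumes them at its own pace.
class RegistrationDemo {
    final BlockingQueue<String> registeredEvents = new LinkedBlockingQueue<>();
    final AtomicInteger creditsGranted = new AtomicInteger();

    // A: finish registration, publish the event, return without waiting for B.
    boolean register(String userId) {
        // ... persist the user here ...
        return registeredEvents.offer(userId);
    }

    // B: handle up to maxBatch pending events; returns how many were handled.
    int consumeBatch(int maxBatch) {
        int handled = 0;
        while (handled < maxBatch && registeredEvents.poll() != null) {
            creditsGranted.incrementAndGet(); // grant credits for this event
            handled++;
        }
        return handled;
    }
}
```

If five users register during a peak and B only handles two per batch, the remaining three events simply wait in the queue until B's next batch, which is the peak-clipping effect described above.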

Database splitting

Most projects see their data volume grow from small to large, so database splitting is also handled differently at different stages, according to the amount of data.

Read/write separation is the first thing most applications do when they hit a database performance bottleneck. In most Internet applications, reads account for more than 90% of the traffic, so a one-master, multiple-slave setup is used: the master handles writes and the slaves handle reads. This master-slave mode has its own problems. Some data must be fresh, that is, it has to be readable immediately after it is written. Because master-slave synchronization is asynchronous, log-based replication, there is a window in which the data is inconsistent. In such cases the read must be forced to the master to guarantee correct data. Pay attention to this during development.
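The routing rule can be sketched as follows. The class `ReadWriteRouter` and the `forceMaster` flag are hypothetical names for illustration; data source names are plain strings here, whereas real code would return a connection or `DataSource`.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical router: writes and read-after-write queries go to the master;
// all other reads are spread over the slaves round-robin.
class ReadWriteRouter {
    private final String master;
    private final List<String> slaves;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(String master, List<String> slaves) {
        this.master = master;
        this.slaves = slaves;
    }

    // forceMaster is set by callers that must see data they have just written.
    String route(boolean isWrite, boolean forceMaster) {
        if (isWrite || forceMaster || slaves.isEmpty()) {
            return master;
        }
        // floorMod keeps the index valid even after the counter overflows.
        int i = Math.floorMod(next.getAndIncrement(), slaves.size());
        return slaves.get(i);
    }
}
```

A caller that has just written an order and needs to display it immediately would pass `forceMaster = true`, avoiding the replication lag window.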

Vertical partitioning splits different businesses into different databases, which reduces the pressure on any single database and improves overall performance. The main concern is business boundaries: there will be a table that seems to fit equally well in database A and database B. Deciding this takes experience, and it is not worth agonizing over, because however well the initial division is done, as the application iterates there will always be more tables without clear boundaries. The same is true of business module splitting.

Horizontal partitioning is generally known as sharding. The fields of one table can be divided into different tables, or the rows of one table can be distributed across shards based on a hash or a business field. This generally requires a DAL framework such as TDDL, Cobar, or Mycat. The main purpose is to make the split invisible to the programmer, as if only a single database were being used. Current DAL frameworks do not fully achieve this, however, especially for cross-database transactions, which generally have to be handled some other way.
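A routing rule based on a business field might look like the sketch below. The layout of 4 databases with 8 tables each, and the names `order_db_N` and `t_order_N`, are assumptions made up for the example; this is not how TDDL, Cobar, or Mycat are configured.

```java
// Hypothetical routing rule: 4 databases x 8 tables per database,
// chosen by user id so that one user's rows always land in the same table.
class ShardRouter {
    static final int DB_COUNT = 4;
    static final int TABLES_PER_DB = 8;

    // Maps a user id to a physical table such as "order_db_1.t_order_5".
    static String locate(long userId) {
        // floorMod keeps the slot non-negative even for negative ids.
        int slot = (int) Math.floorMod(userId, (long) (DB_COUNT * TABLES_PER_DB));
        int db = slot / TABLES_PER_DB;
        int table = slot % TABLES_PER_DB;
        return "order_db_" + db + ".t_order_" + table;
    }
}
```

For example, `ShardRouter.locate(13)` yields `order_db_1.t_order_5`, and so does `locate(45)`, because 45 modulo 32 is also 13. A fixed modulo like this makes later resharding expensive, which is one reason DAL frameworks hide the rule behind configuration.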

Cross-database transactions/distributed transactions

Cross-database transactions are generally resolved through eventual consistency: ACID is not required at every moment, so there is a time window in which the data is inconsistent, but at some later point the data always reaches a consistent state. There are many solutions, but the core principle is the same: completion through compensation.
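One possible shape of compensation is sketched below, assuming a pending-task record written together with the first local change. Two maps stand in for two separate databases, and all names (`TransferDemo`, `pay`, `compensate`) are hypothetical. Real systems also need idempotent tasks and durable task storage, which this sketch leaves out.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Two maps stand in for two separate databases. The debit commits locally;
// the points credit is recorded as a pending task and replayed later.
class TransferDemo {
    final Map<String, Integer> accountDb = new HashMap<>();
    final Map<String, Integer> pointsDb = new HashMap<>();
    final Queue<String[]> pending = new ArrayDeque<>();
    boolean pointsDbAvailable = true; // toggled to simulate an outage

    // Step 1: local change on accountDb plus a task record for step 2.
    void pay(String user, int amount) {
        accountDb.merge(user, -amount, Integer::sum);
        pending.add(new String[] {user, String.valueOf(amount)});
    }

    // Step 2: a compensation job replays pending tasks; returns tasks applied.
    int compensate() {
        int applied = 0;
        while (pointsDbAvailable && !pending.isEmpty()) {
            String[] task = pending.poll();
            pointsDb.merge(task[0], Integer.parseInt(task[1]), Integer::sum);
            applied++;
        }
        return applied;
    }
}
```

While the points database is unavailable, the account has been debited but no points exist yet: that is the inconsistency window. Once `compensate()` succeeds, the two stores agree again.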

Cache usage

There is a famous saying in computing: "any problem in computer science can be solved by adding another level of indirection". A cache is one such intermediate layer.

Caches are used extremely widely, almost everywhere you can imagine. Here we discuss the usual case of caching database data.

Caches come in two kinds, local and remote. Generally speaking, if one kind of cache is enough, use only one: caching is good, but keeping cache entries updated and deleted correctly is very troublesome. Caches can also be divided into read caches (most scenarios) and write caches (generally for scenarios with low data-safety requirements).

A read cache works like this: when data is read from the database it is also written to the cache, so the next read can be served directly from the cache, which greatly reduces the pressure on the database. It sounds simple, but there are many possible architectures, each with advantages and disadvantages, which are worth studying in detail.
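A minimal local read cache can be sketched with `ConcurrentHashMap`. The class name `ReadCache` is hypothetical, and the `loader` function stands in for a database query. Expiry, size limits, and a remote cache such as Redis are deliberately left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Read cache in front of a slow loader (the loader stands in for a database query).
class ReadCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    ReadCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Return the cached value, loading and caching it on a miss.
    V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    // Call after the underlying row is updated or deleted.
    void invalidate(K key) {
        cache.remove(key);
    }
}
```

The `invalidate` method is where the maintenance cost shows up: every code path that changes the underlying row has to remember to call it, or readers will see stale data.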

A write cache writes data to the cache first and persists it later, after some interval. This also improves efficiency. The drawback is that some data is lost if the machine goes down, so it suits only scenarios with low data-safety requirements.
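The write-first, persist-later idea can be sketched as below. `WriteBehindCache` is a hypothetical name and a plain `Map` stands in for the database; in practice `flush()` would be called by a scheduled task.

```java
import java.util.HashMap;
import java.util.Map;

// Write cache: writes land in memory first and are flushed to the store in batches.
class WriteBehindCache {
    private final Map<String, Integer> dirty = new HashMap<>();
    private final Map<String, Integer> store; // stands in for the database

    WriteBehindCache(Map<String, Integer> store) {
        this.store = store;
    }

    // Fast path: only memory is touched.
    synchronized void put(String key, int value) {
        dirty.put(key, value);
    }

    // Called periodically; returns the number of entries written to the store.
    synchronized int flush() {
        int n = dirty.size();
        store.putAll(dirty);
        dirty.clear();
        return n;
    }
}
```

Anything still in `dirty` when the process dies is lost, which is the data-loss risk mentioned above. Repeated writes to the same key between flushes collapse into one store write, which is where the efficiency gain comes from.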

Caches are fast, but besides the trouble of keeping them up to date, memory is relatively expensive hardware. So apart from hot data, a cache generally holds only an index of the data or the main fields needed for list display; the full, complete data has to be served by other means.

Static pages

In most scenarios the data does not change for a certain period, or when it does, only a small part of the page changes, so the unchanged part can be extracted and made static. The pages of Jingdong Mall, for example, are static. Once a page is static, the data no longer has to be fetched from the cache or database and assembled into a page on every request; the request returns the static page directly, and performance improves greatly.

Besides the commonly used methods above, there are many other important ones:

  • CDN acceleration
  • DNS cache
  • Page caching
  • Use distributed storage
  • Write programs using multiple threads