The Internet is a network of networks. The World Wide Web is a system, a means of accessing information over the Internet.
互联网是由多个网络相互连接而成的网络。万维网则是一个系统，一种在互联网上访问信息的手段。

The WWW uses the Hypertext Transfer Protocol (HTTP) to access this information.
万维网利用超文本传输协议（HTTP）来访问这些信息。

The Internet is the global network of networks of computers. It consists of computers, cables and wireless connections, and it is governed by the Internet Protocol (IP), which deals with addressing and delivering data packets. The World Wide Web, also known as the Web, is one service (a set of software) running on the Internet: a collection of web pages, files and folders connected through hyperlinks and URLs. Put simply, the Internet is the infrastructure (the "hardware" part) and the Web is a software service built on top of it. Therefore, the Web relies on the Internet to run, but not vice versa. Besides the WWW, other examples include VoIP and email, which have their own protocols and also run on the Internet.
互联网是全球性的计算机网络网络。互联网是由计算机、电缆和无线连接组成的网络，受互联网协议（IP）管理，该协议处理数据和数据包。万维网（World Wide Web），也称为 Web，是运行在互联网上的一组软件。Web 是通过超链接和 URL 连接的网页、文件和文件夹的集合。互联网是硬件部分，而 Web 是软件部分。因此，Web 依赖互联网运行，但反之则不然。除了 WWW 之外，其他示例还包括 VoIP 和 Mail，它们拥有自己的协议并在互联网上运行。

C.1.2 Describe how the web is constantly evolving
C.1.2 描述网络如何不断演变

C.1.3 Identify the characteristics of the following:
C.1.3 识别以下各项的特征：

Identify the characteristics of the following: HTTP and HTTPS
识别以下 HTTP 和 HTTPS 的特性：

hypertext transfer protocol (HTTP)
超文本传输协议（HTTP）

HTTP is the set of rules for transferring files on the World Wide Web. It is the underlying protocol used by the Web. It defines how messages are formatted and transmitted, and what actions web servers and browsers should take in response to various commands.
HTTP 是万维网上用于传输文件的规则集。HTTP 是万维网使用的底层协议，定义了消息的格式化/传输方式，以及网络服务器和浏览器应采取的响应命令的操作。

Application layer protocol from the Internet Protocol suite to transfer and exchange hypermedia
应用层协议，属于互联网协议套件，用于传输和交换超媒体
Request-response protocol based on client-server model
基于客户端-服务器模型的请求-响应协议
user agent (e.g. web browser) requests a resource from a server via a URL, and the web server returns a response
用户代理（如 web browser）通过 URL 向服务器请求某些资源，网络服务器给出响应
different HTTP request methods, e.g. for retrieving or submitting data (GET and POST)
不同的 HTTP 请求方法，例如用于检索或提交数据（GET 和 POST）
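To make the request-response cycle concrete, here is a minimal sketch. It builds the raw text of an HTTP/1.1 GET request and parses the status line of a response. The host `example.com`, the path `/index.html` and the sample response are placeholders. A real client would send these bytes over a TCP connection, for example with Python's `http.client`.

```python
def build_get_request(host: str, path: str) -> str:
    """Build the raw text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, resource, version
        f"Host: {host}",          # the Host header is mandatory in HTTP/1.1
        "Connection: close",
        "",                       # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(response: str) -> tuple[str, int, str]:
    """Split the first line of a response into (version, status code, reason)."""
    status_line = response.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


request = build_get_request("example.com", "/index.html")
sample_response = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>...</html>"
print(request)
print(parse_status_line(sample_response))  # ('HTTP/1.1', 200, 'OK')
```

A POST request has the same shape. It uses `POST` in the request line, and the submitted data follows the blank line as the message body.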

hypertext transfer protocol secure (HTTPS)
超文本传输安全协议（HTTPS）

Based on HTTP 基于 HTTP
Adds a security layer using SSL or its successor, TLS
添加一个额外的 SSL 或 TLS 安全层
ensures authentication of website by using digital certificates
通过使用数字证书确保网站的身份验证
ensures integrity and confidentiality through encryption of communication
通过通信加密确保完整性和保密性

A secure version of HTTP that uses SSL/TLS to authenticate the website with digital certificates, and encrypts the communication to ensure the integrity and confidentiality of data.
HTTP 的安全版本同样利用 SSL 技术，通过数字证书验证网站身份。采用加密手段确保数据的完整性和机密性。
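The authentication step can be seen in the client-side defaults of Python's `ssl` module. This is a minimal sketch that only inspects settings and opens no connection. A context created with `ssl.create_default_context()` requires the server to present a certificate signed by a trusted authority, and it checks that the certificate matches the host name.

```python
import ssl

# Client-side TLS context with Python's secure defaults (no network access needed).
context = ssl.create_default_context()

# Authentication: the server must present a valid certificate from a trusted CA...
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
# ...and that certificate must match the host name being visited.
print(context.check_hostname)  # True
```

To actually connect, the context would wrap a TCP socket, e.g. `context.wrap_socket(sock, server_hostname="example.com")`. After that, HTTP messages travel inside the encrypted TLS channel.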

Identify the characteristics of the following: HTML, URL, XML, XSLT
识别以下各项的特征：HTML、URL、XML、XSLT

hypertext mark-up language (HTML)
超文本标记语言（HTML）

standard language for web pages
网页标准语言
uses elements enclosed by tags to markup a document
使用由标签包围的元素来标记文档

Characteristics: 特点：

consists of tags 由标签组成
tags are used to describe structure of website
标签用于描述网站的结构

URL -- Uniform Resource Locator
URL —— Uniform Resource Locator

unique string that identifies a web resource
标识网络资源的唯一字符串
reference to a web resource
对网络资源的引用
Also specifies the protocol to be used to access this resource
同时指定用于访问该资源的协议
The first part is the protocol identifier (e.g. http or https), the second part is the resource name (the domain name or IP address of the server, which specifies the location on the web), and the third part is the path name
第一部分是协议标识符（如 http 或 https），第二部分是资源标识符（指定资源在 Web 上的位置），第三部分是路径名称。
URLs may have query parameters
URL 可能有查询参数

https://www.google.co.uk/search?q=fleet+insurance&gl=uk&ei=6M5IXZvWHuffz7sP4auykAY&start=10&sa=N&ved=0ahUKEwib6qy5g-3jAhXn73MBHeGVDGIQ8NMDCIQC&biw=1920&bih=888
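The parts of such a URL can be pulled apart with Python's standard `urllib.parse` module. This sketch uses a shortened version of the URL above (some tracking parameters removed). It shows the protocol identifier, the resource name, the path and the decoded query parameters.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened form of the example URL above.
url = ("https://www.google.co.uk/search?q=fleet+insurance&gl=uk"
       "&start=10&sa=N")

parts = urlsplit(url)
print(parts.scheme)   # protocol identifier: https
print(parts.netloc)   # resource name (host): www.google.co.uk
print(parts.path)     # path: /search

# Query parameters are name=value pairs separated by '&'; '+' decodes to a space.
params = parse_qs(parts.query)
print(params["q"])      # ['fleet insurance']
print(params["start"])  # ['10']
```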

Extensible mark-up language (XML)
可扩展标记语言（XML）

markup language with a set of rules defining how to encode a document
一种标记语言，包含定义如何对文档进行编码的一组规则
human-readable 人类可读
similar to HTML in using tags
类似于 HTML，使用标签
used for representation of arbitrary data structures
用于表示任意数据结构
XML is used to describe data.
XML 用于描述数据。
Allows designers to create their own customised tags
允许设计师创建自己的自定义标签

XML is a metalanguage that gives meaning to data so that other applications can use it. It is extensible: it can accommodate new tags beyond those already in HTML's set.
XML 是一种元语言，能够赋予数据意义供其他应用程序使用，并可扩展以容纳超出 HTML 现有标签集之外的新标签。
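The following sketch shows what "custom tags that describe data" means in practice. The XML document is hypothetical: `<catalogue>`, `<product>`, `<name>` and `<price>` are not predefined tags, they were invented for this example. Python's standard `xml.etree.ElementTree` is then used as the "other application" that reads the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document with made-up, self-describing tag names.
xml_text = """<catalogue>
  <product sku="A1">
    <name>Pen</name>
    <price currency="GBP">1.50</price>
  </product>
  <product sku="B2">
    <name>Notebook</name>
    <price currency="GBP">3.25</price>
  </product>
</catalogue>"""

root = ET.fromstring(xml_text)
for product in root.findall("product"):
    # Any program that knows the tag names can extract the data.
    name = product.findtext("name")
    price = float(product.findtext("price"))
    print(product.get("sku"), name, price)
```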

XSLT – Extensible Stylesheet Language Transformations
XSLT – 可扩展样式表语言转换

XSLT is a standard way to describe how to transform an XML document into another XML document with a different structure. It is usually used to convert XML into HTML. However, it can also convert XML into other types of document that the browser recognizes.
XSLT 是一种标准方式，用于描述如何将 XML 文档的结构更改为具有不同结构的 XML 文档。它通常用于将 XML 转换为 HTML，但也可以将其转换为浏览器可识别的另一种类型的文档。

XSL (eXtensible Stylesheet Language) is a styling language for XML.
XSL（eXtensible Stylesheet Language）是一种用于 XML 的样式表语言。

XSLT stands for XSL Transformations.
XSLT 代表 XSL Transformations。

used to transform XML document into various other types of document
用于将 XML 文档转换为各种其他类型的文档

Identify the characteristics of the following: JavaScript and CSS
识别以下 JavaScript、CSS 的特性

JavaScript

JavaScript is a scripting language that enables developers to make interactive sites. With JavaScript you can interact with the HTML document (through the Document Object Model, DOM), which makes dynamic content possible.
JavaScript 是一种脚本语言，使开发者能够创建交互式网站。使用 JavaScript 可以（通过文档对象模型 DOM）与 HTML 文档交互，从而生成动态内容。

interpreted programming language
解释型编程语言

one of the core technologies of most websites, together with HTML and CSS
与 HTML 和 CSS 一起，是大多数网站的核心技术之一

high-level and dynamically typed (variables have no declared types); therefore relatively easy for beginners
高级语言且为动态类型（变量无需声明类型）；因此对初学者相对容易

allows developers to dynamically manipulate the content of HTML documents
允许动态操作 HTML 文档的内容

CSS – Cascading Style Sheets
CSS – 层叠样式表

style sheet language to describe the presentation of a mark-up document, usually HTML
用于描述标记文档（通常是 HTML）呈现方式的样式表语言

used to create better designed websites
用于创建设计更佳的网站

intended to separate content (HTML) from presentation (CSS)
旨在通过 HTML 和 CSS 实现内容与表现的分离

It uses selectors to target particular elements of a document, and gives them properties that define things ranging from font color to page position
它使用选择器（selectors）来描述文档中的特定元素，并为这些元素赋予属性（properties），这些属性定义了从字体颜色到页面位置的各种样式特征

C.1.4 Identify the characteristics of the following: URI and URL
C.1.4 识别以下各项的特征：URI 和 URL

uniform resource identifier (URI)
统一资源标识符（URI）

The power of a link in the Web is that it can point to any document (or, more generally, resource) of any kind in the universe of information. This requires a global space of identifiers. These Universal Resource Identifiers are the primary element of Web architecture.
Web 中链接的强大之处在于，它能指向信息宇宙中任何类型的文档（或更广义的资源）。这需要一个全局标识符空间，这些通用资源标识符构成了 Web 架构的主要元素。

A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource. Basically, it is a general term for an identifier of a resource. It identifies a ‘thing’, but does not necessarily give any information about where to find it.
统一资源标识符（URI）是标识抽象或物理资源的紧凑字符序列。基本上，它是资源标识符的统称。它标识一个"事物"，但不一定提供任何关于如何定位该事物的信息。

A URI can be further classified as a locator, a name, or both. The term “Uniform Resource Locator” (URL) refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (e.g., its network “location”)
URI 可进一步分类为定位符（locator）、名称（name）或两者兼具。术语"统一资源定位符"（URL）特指 URI 的子集，这类标识符不仅能够识别资源，还能通过描述其主要访问机制（例如其网络"位置"）提供定位该资源的方法。

A URL is the global address of documents and pages/resources on the World Wide Web
URL 是万维网上文档和页面/资源的全球地址
Difference: a URI only identifies a network resource, whereas a URL also helps locate that resource and defines the mechanism for retrieving it over the web. Every URL is therefore a URI, but not every URI is a URL.
区别：URI 仅标识网络资源，而 URL 还能帮助定位该资源，并定义了如何通过 web 检索该资源的机制。因此，每个 URL 都是 URI，但并非每个 URI 都是 URL。
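The difference shows up when both kinds of identifier are split with Python's `urllib.parse.urlsplit`. This is a small sketch. The URN `urn:isbn:0451450523` names a book by its ISBN but carries no host or access mechanism. The URL, a made-up address on `example.com`, says how (`https`) and where (host and path) to fetch a resource.

```python
from urllib.parse import urlsplit

# A URN only names the resource: there is no host to contact.
urn = urlsplit("urn:isbn:0451450523")
# A URL also gives the access mechanism (scheme) and location (host, path).
url = urlsplit("https://example.com/books/0451450523")

print(urn.scheme, repr(urn.netloc))        # urn ''
print(url.scheme, url.netloc, url.path)    # https example.com /books/0451450523
```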

Note: network resources are files that can be plain Web pages, other text documents, graphics, or programs.
注意：网络资源可以是普通网页、其他文本文档、图形或程序文件。

The first part of a URL is called the protocol identifier and indicates to the browser which protocol to use. The second part is called the resource name and specifies the IP address or domain name of where the resource is located on the World Wide Web.
URL 的第一部分称为协议标识符（protocol identifier），用于向浏览器指示应使用哪种协议。第二部分称为资源名称（resource name），它指定了资源在万维网上所处位置的 IP 地址或域名。

C.1.5 Describe the purpose of a URL
C.1.5 描述 URL 的用途

As described above, a URL is the global address of documents and pages/resources on the World Wide Web. The first part of a URL is called the protocol identifier and indicates to the browser which protocol to use. The second part is called the resource name and specifies the IP address or domain name of where the resource is located on the web.
如上所述，URL 是万维网上文档和页面/资源的全球地址。URL 的第一部分称为 protocol identifier（协议标识符），用于指示浏览器使用何种协议。第二部分称为 resource name（资源名称），指定了资源在万维网上所处位置的 IP 地址或域名。

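The purpose of a URL is to tell the browser which server to contact, and to tell the server which web page to return or what to search for (via query parameters).
URL 的用途是告诉浏览器要联系哪台服务器，并告诉服务器要返回哪个网页，或要搜索什么内容（通过查询参数）。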

The URL contains the name of the protocol to be used to access a file resource
URL 中包含要用于访问文件资源的协议名称

A URL (Uniform Resource Locator), as the name suggests, provides a way to locate a resource on the web
URL（Uniform Resource Locator），顾名思义，提供了一种在网络上定位资源的方法

C.1.6 Describe how a domain name server functions
C.1.6 描述域名服务器如何运作

DNS lets you type names such as google.com into your web browser and translates them into Internet Protocol (IP) addresses such as 70.42.251.42, which computers use to identify each other on the network
DNS 允许您在网络浏览器中输入诸如 google.com 之类的名称，并将其转换为互联网协议（IP）地址（如 70.42.251.42），供计算机在网络中相互识别。

Maps domain names into IP Addresses
将域名映射为 IP 地址
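The user's resolver sends a query to a DNS server, giving a domain name and asking for the corresponding IP address
用户的解析器向 DNS 服务器发送查询，提供域名并请求对应的 IP 地址
The local DNS server tries to resolve the query, e.g. from its cache
本地 DNS 服务器尝试解析该查询，例如从其缓存中查找
If it cannot, it asks other servers (root, top-level domain and authoritative name servers) and returns the answer to the resolver
如果无法解析，它会向其他服务器（根服务器、顶级域服务器和权威域名服务器）查询，并将结果返回给解析器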
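The steps above can be sketched as a toy iterative resolver. Everything here is an assumption for illustration. The three dictionaries stand in for a root server, a top-level domain server and an authoritative server, and their names are invented. The address 192.0.2.10 comes from a range reserved for documentation. Real resolvers send queries over the network (UDP/TCP port 53) and expire cached answers after a time-to-live.

```python
# Hypothetical, hard-coded "servers" (real DNS queries go over the network).
ROOT = {"com.": "tld-com"}                               # root -> TLD server
TLD = {"tld-com": {"example.com.": "ns-example"}}        # TLD -> authoritative server
AUTHORITATIVE = {"ns-example": {"www.example.com.": "192.0.2.10"}}

cache: dict[str, str] = {}


def resolve(name: str) -> str:
    """Resolve a fully qualified domain name (ending in '.') to an IPv4 address."""
    if name in cache:                            # 1. answer from the local cache
        return cache[name]
    labels = name.rstrip(".").split(".")
    tld = labels[-1] + "."                       # e.g. 'com.'
    zone = ".".join(labels[-2:]) + "."           # e.g. 'example.com.'
    tld_server = ROOT[tld]                       # 2. ask a root server
    auth_server = TLD[tld_server][zone]          # 3. ask the TLD server
    address = AUTHORITATIVE[auth_server][name]   # 4. ask the authoritative server
    cache[name] = address                        # remember the answer for next time
    return address


print(resolve("www.example.com."))  # 192.0.2.10
```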

C.1.7 Identify the characteristics of: IP, TCP and FTP
C.1.7 识别以下协议的特征：IP、TCP 和 FTP

internet protocol (IP) 网际协议（IP）

The Internet Protocol (IP) is responsible for addressing packets and routing them, so that each packet reaches its destination
互联网协议（IP）负责为数据包编址并进行路由，使每个数据包都能到达目的地

It is responsible for adding the header to data packets
其职责是向数据包添加头部
These headers contain the IP Addresses of the sender and receiver
这些头部包含发送方和接收方的 IP 地址
as well as other information used when relaying packets between routers, such as the time-to-live (TTL)
以及在路由器之间中继数据包时使用的其他信息，例如生存时间（TTL）
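To show what "adding a header" means, here is a sketch that packs a minimal 20-byte IPv4 header with Python's `struct` module. The sender and receiver addresses are documentation-only addresses. Fields such as identification, flags and options are left at zero. In practice the operating system builds this header, not the application.

```python
import ipaddress
import struct


def ipv4_checksum(header: bytes) -> int:
    """Internet checksum: ones' complement of the ones'-complement sum of 16-bit words."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:                       # fold any carry back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src: str, dst: str, payload_len: int, ttl: int = 64) -> bytes:
    """Pack a minimal 20-byte IPv4 header (no options) for a TCP payload."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,                        # version 4, header length 5 x 32-bit words
        0,                                   # type of service
        20 + payload_len,                    # total length: header + data
        0,                                   # identification
        0,                                   # flags / fragment offset
        ttl,                                 # time to live, decremented by each router
        6,                                   # protocol number 6 = TCP
        0,                                   # checksum placeholder
        ipaddress.IPv4Address(src).packed,   # sender's IP address
        ipaddress.IPv4Address(dst).packed,   # receiver's IP address
    )
    checksum = ipv4_checksum(header)
    # The checksum occupies bytes 10-11 of the header.
    return header[:10] + struct.pack("!H", checksum) + header[12:]


header = build_ipv4_header("192.0.2.1", "198.51.100.7", payload_len=100)
print(header.hex())
```

A receiving router can recompute the checksum over the whole header. If the header is undamaged the result is 0.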

transmission control protocol (TCP)
传输控制协议（TCP）

Transmission control protocol (TCP) breaks down data into packets known as TCP segments.
传输控制协议（TCP）将数据分解为称为 TCP 段的数据包。

It is responsible for error-checking the segments, detecting segments that are lost or corrupted so that they can be resent, and reassembling the segments in the correct order at the destination
负责对数据段进行差错检查，检测丢失或损坏的数据段以便重传，并在目的地按正确顺序重新组装数据段
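The segment / check / reassemble cycle can be sketched in a few lines. This is a simplified model, not real TCP. CRC-32 from `zlib` stands in for TCP's 16-bit checksum, and only corrupted segments are reported for resending. Real TCP also uses acknowledgements and timers to detect segments that never arrive.

```python
import random
import zlib
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int        # offset of this segment's first byte in the stream
    data: bytes
    checksum: int   # CRC-32, standing in for TCP's 16-bit checksum


def make_segments(data: bytes, size: int) -> list[Segment]:
    """Break a byte stream into numbered segments of at most `size` bytes."""
    return [Segment(i, data[i:i + size], zlib.crc32(data[i:i + size]))
            for i in range(0, len(data), size)]


def reassemble(segments: list[Segment]) -> tuple[bytes, list[int]]:
    """Check each segment, reorder the good ones by sequence number and
    return (reassembled data, sequence numbers that must be resent)."""
    good, resend = [], []
    for seg in segments:
        if zlib.crc32(seg.data) == seg.checksum:
            good.append(seg)
        else:
            resend.append(seg.seq)
    good.sort(key=lambda seg: seg.seq)
    return b"".join(seg.data for seg in good), resend


message = b"Hello from the transport layer!"
segments = make_segments(message, 8)
random.shuffle(segments)                # segments may arrive out of order
data, resend = reassemble(segments)
print(data == message, resend)          # True []
```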

File transfer protocol (FTP)
文件传输协议（FTP）

File Transfer Protocol (FTP): an FTP client initiates a connection with a remote computer running FTP server software. After that, the client can choose to send and/or receive files. Clients identify an FTP server by its IP address or host name.
文件传输协议（FTP） FTP 客户端启动与运行 FTP 服务器软件的远程计算机的连接。之后，客户端可以选择发送和/或接收文件。客户端通过其 IP 地址或主机名识别 FTP 服务器。

Used for transferring files from one computer to another using FTP server software
用于使用 FTP 服务器软件将文件从一台计算机传输到另一台计算机

C.1.8 Outline the different components of a web page.
C.1.8 概述网页的不同组件。

A web page can contain a variety of components. The basic structure of an HTML document is:
一个网页可以包含多种组件。HTML 文档的基本结构是：

`head`

This is not visible on the page itself, but contains important information about the page in the form of metadata.
这在页面本身不可见，但以元数据的形式包含有关它的重要信息。

`title`

The title goes inside the head and is usually displayed in the browser's title bar or tab.
标题位于 head 部分，通常显示在浏览器的标题栏或标签页上。

`meta` tags `meta` 标签

There are various types of meta tags, which can give search engines information about the page, but are also used for other purposes, such as to specify the charset used.
元标签有多种类型，既可为搜索引擎提供页面信息，也可用于其他用途，例如指定所使用的字符集。

`body`

The main part of the page document. This is where all the (visible) content goes in.
页面文档的主体部分。所有（可见的）内容都放置在此处。
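The four parts above fit together as in the minimal page below. The page contents are made up for illustration. This sketch uses Python's standard `html.parser` to print the page's element tree, which shows `title` and `meta` nested inside `head`, with the visible content in `body`.

```python
from html.parser import HTMLParser

# A hypothetical minimal HTML document.
PAGE = """<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>My first page</title>
  </head>
  <body>
    <h1>Welcome</h1>
    <p>Visible content goes in the body.</p>
  </body>
</html>"""


class OutlineBuilder(HTMLParser):
    """Record each start tag, indented by its nesting depth."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0
        self.outline: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.outline.append("  " * self.depth + tag)
        if tag != "meta":          # <meta> is a void element: it has no closing tag
            self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1


parser = OutlineBuilder()
parser.feed(PAGE)
print("\n".join(parser.outline))
```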

Some other typical components:
其他一些典型组件：

Navigation bar 导航栏

Usually a collection of links that helps users navigate the website; it sits at the top of the page, or is collapsed into a “hamburger” menu on mobile.
通常是位于页面顶部的链接集合，用于网站导航，或在移动端显示为汉堡菜单。

Hyperlinks 超链接

A hyperlink is a reference to another web page.
超链接是指向另一个网页的链接。

Table Of Contents 目录

Might be contained in a sidebar and is used for navigation and orientation within the website.
可能位于侧边栏中，用于网站内的导航和定位。

Banner 横幅

Area at the top of a web page linking to other big topic areas.
网页顶部的区域，链接到其他主要主题区域。

Sidebar 侧边栏

Usually used for a table of contents or navigation bar.
通常用于目录或导航栏。

C.1.9 Explain the importance of protocols and standards on the web.
C.1.9 解释协议和标准在网络上的重要性。

Protocols are a set of rules and procedures that the sender and receiver must adhere to to ensure coherent data transfer. Examples are TCP, IP, HTTP and FTP.
协议是发送方和接收方必须遵守的一套规则和程序，以确保数据传输的连贯性。例如 TCP、IP、HTTP 和 FTP。

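A standard is a set of technical specifications that has been agreed upon (e.g. the HTML and CSS specifications). Protocols and standards are important because they let hardware and software from different developers communicate and interoperate. Without them, a page might display correctly in one browser but not in another, or two computers might be unable to exchange data at all.
标准是经各方一致认可的一套技术规范（例如 HTML 和 CSS 规范）。协议和标准之所以重要，是因为它们使不同开发者的硬件和软件能够通信和互操作。没有它们，网页可能只能在某一种浏览器中正确显示，或者两台计算机可能根本无法交换数据。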

C.1.10 Describe the different types of web page
C.1.10 描述不同类型的网页

Personal Pages: A personal web page is created by an individual for his/her own personal need.
个人主页：个人网页是由个人出于自身需求而创建的。

Blogs: Blogs are basically discussion or informational sites where others can leave comments/input
博客：博客本质上是允许访客留言/提供反馈的讨论型或信息发布型网站

Search Engine Pages: A search engine results page (SERP) is the listing of results returned by a search engine in response to a keyword query. The results normally include a list of items with titles, a reference to the full version, and a short description showing where the keywords have matched content within the page.
搜索引擎页面：搜索引擎结果页面（SERP）是搜索引擎针对关键词查询返回的结果列表。这些结果通常包含带标题的条目列表、完整版本的引用链接，以及显示关键词在页面内容中匹配位置的简短描述。

Forums: a place, meeting, or medium where ideas and views on a particular issue can be exchanged.
论坛：一个场所、会议或媒介，在此可以交换针对特定问题的观点和看法。

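Wikis: websites whose content can be edited collaboratively by their users, e.g. Wikipedia; an example of a collective intelligence project
Wiki：可由用户协作编辑内容的网站，例如 Wikipedia；是集体智慧项目的一个例子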

C.1.11 Explain the differences between a static web page and a dynamic web page
C.1.11 解释静态网页和动态网页之间的区别

HTML on its own can only produce static pages. Static pages look the same and behave in the same manner each time they are loaded into a browser (they do not change based on user interaction).
HTML 只能生成静态页面，静态页面每次加载到浏览器中时外观和行为始终保持一致（它们不会根据用户交互而改变）

Web pages with JavaScript can change their appearance over time (e.g. a different image each time the page is loaded) or in response to a user's actions (e.g. typing, mouse clicks and other input methods). On the server side, a request to view a page can dynamically populate the different sections of the site according to a template file.
含有 JavaScript 的网页可以改变其外观：随时间变化（例如，每次加载页面时显示不同的图像），或响应用户操作（如键入、鼠标点击及其他输入方式）。查看页面的请求将根据模板文件动态填充网站的不同部分。

A dynamic web page is one that can show different content to its viewers each time it is visited, and that can change based on interactions with the user. Client-side scripting and server-side scripting are the two ways of making pages dynamic. With client-side scripting, the page and its scripts are downloaded to the browser, and the page changes in response to your actions on it. With server-side scripting, the page is generated on the server each time it is requested. Examples include login and sign-up pages, application and submission forms, enquiry pages and shopping carts. Dynamic web pages are created using languages such as PHP (server side) and JavaScript (mainly client side).
动态网页是一种每次访问时都能向用户展示不同内容、并能根据与用户的交互而变化的网页。客户端脚本和服务器端脚本是实现动态网页的两种方式。使用客户端脚本时，页面及其脚本被下载到浏览器中，页面会根据用户在页面上的操作而变化。使用服务器端脚本时，每次请求页面都会在服务器上生成该页面。典型示例包括登录和注册页面、申请和提交表单、查询页面以及购物车页面。动态网页使用 PHP（服务器端）和 JavaScript（主要在客户端）等语言创建。

C.1.12 Explain the functions of a browser
C.1.12 解释浏览器的功能

Explain the functions of a browser
解释浏览器的功能

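Allows the user to enter a URL to request a resource
允许用户输入 URL 以请求资源
Renders the returned document (HTML, CSS, JavaScript) in a readable format for the user
将返回的文档（HTML、CSS、JavaScript）渲染为可读格式呈现给用户
Most keep a browsing history, offer back/forward navigation and bookmarks, and let users add extra tools (extensions)
大多数浏览器会保留浏览历史，提供后退/前进导航和书签功能，并允许用户添加额外工具（扩展程序）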

C.1.13 Evaluate the use of client-side scripting and server-side scripting in web pages
C.1.13 评估客户端脚本与服务器端脚本在网页中的应用

Client-side Environment 客户端环境

The client-side environment used to run scripts is usually a browser. The processing takes place on the end user's computer. The source code is transferred from the web server to the user's computer over the Internet and run directly in the browser.
用于运行脚本的客户端环境通常是浏览器。处理过程发生在最终用户的计算机上。源代码通过互联网从网络服务器传输到用户的计算机，并直接在浏览器中运行。

The scripting language needs to be enabled on the client computer. Sometimes if a user is conscious of security risks they may switch the scripting facility off. When this is the case a message usually pops up to alert the user when script is attempting to run.
必须在客户端计算机上启用脚本语言。有时，如果用户意识到安全风险，他们可能会关闭脚本功能。当出现这种情况时，通常会弹出一条消息，在脚本试图运行时提醒用户。

Server-side Environment 服务器端环境

The server-side environment that runs a scripting language is a web server. A user's request is fulfilled by running a script directly on the web server to generate dynamic HTML pages. This HTML is then sent to the client browser. It is usually used to provide interactive web sites that interface to databases or other data stores on the server.
运行脚本语言的服务器端环境是 web 服务器。用户的请求通过直接在 web 服务器上运行脚本来生成动态 HTML 页面，从而得到满足。该 HTML 随后被发送至客户端浏览器。该技术通常用于提供与服务器端数据库或其他数据存储进行交互的交互式网站。

This is different from client-side scripting where scripts are run by the viewing web browser, usually in JavaScript. The primary advantage to server-side scripting is the ability to highly customize the response based on the user's requirements, access rights, or queries into data stores.
这与客户端脚本不同，客户端脚本通常由查看网页的浏览器（使用 JavaScript）运行。服务器端脚本的主要优势在于能够根据用户需求、访问权限或对数据存储的查询，高度定制响应内容。
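The idea of the server building a customized page for each request can be sketched as a plain function. Everything here is hypothetical: the function name, the page layout and the data. In a real site the user name and orders would come from a database query, and the web framework would call such a function for each request. `html.escape` stops user-supplied text from being interpreted as HTML.

```python
from html import escape


def render_page(username: str, orders: list[str]) -> str:
    """Build the HTML for one user's request, as a server-side script would."""
    items = "".join(f"<li>{escape(order)}</li>" for order in orders)
    return (f"<html><body><h1>Welcome back, {escape(username)}</h1>"
            f"<ul>{items}</ul></body></html>")


# Two users requesting the "same" page receive different HTML.
print(render_page("Ana", ["Book", "Pen"]))
print(render_page("Ben", []))
```

The browser only ever receives the finished HTML. The data access and logic stay on the server.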

C.1.14 Describe how web pages can be connected to underlying data sources
C.1.14 描述网页如何连接到底层数据源

HTML is a markup language: basically a set of tags such as <html> and <body>, which is used together with CSS and JavaScript to present a website. All of this happens on the client's system, i.e. the computer of the user browsing the website.
HTML 是标记语言，本质上是一组标签，如 <html>、<body>，用于结合 CSS 和 JavaScript 共同呈现网站。所有这些都发生在客户端系统上，即浏览网站的用户的计算机上。

Connecting to a database, however, happens on an entirely different level: on the server, which is where the website is hosted.
现在，连接到数据库（database）发生在完全不同的层面。这发生在服务器（server）上，也就是网站托管的地方。

So, in order to connect to the database and perform various data-related actions, you have to use server-side scripts written in languages such as PHP, JSP or ASP.NET.
因此，为了连接到数据库并执行各种与数据相关的操作，您必须使用服务器端脚本，如 PHP、JSP、ASP.NET 等。

Here is a snippet that connects to a database using PHP's MySQLi extension:
下面是一段使用 PHP 的 MySQLi 扩展连接数据库的代码片段：

```php
$db = mysqli_connect('hostname', 'username', 'password', 'databasename');
```

This single line is enough to get you started. You can combine such code with HTML tags to create an HTML page that displays data from the database. For example:
这行单行代码足以让你入门，你可以将此类代码与 HTML 标签结合使用，创建出能够展示数据驱动型页面的 HTML 页面。例如：

```php
<?php
// Connect to the database (replace the placeholder credentials)
$db = mysqli_connect('hostname', 'username', 'password', 'databasename');
if (!$db) {
    die('Connection failed: ' . mysqli_connect_error());
}
?>
<html>
<!-- ... -->
<body>
<?php
$query = "SELECT * FROM `mytable`";
$result = mysqli_query($db, $query);

while ($row = mysqli_fetch_assoc($result)) {
    // Display your data on the page, escaped for HTML
    // ('name' is a hypothetical column of `mytable`)
    echo '<p>' . htmlspecialchars($row['name']) . '</p>';
}

mysqli_close($db);
?>
</body>
</html>
```

《选项 C 网络科学 - 计算机科学初高中课程》

C.1.15 Describe the function of the common gateway interface (CGI)
C.1.15 描述公共网关接口（CGI）的功能

CGI is a method used to exchange data between the server and the web browser. CGI is a set of standards where a program or script can send data back to the web server where it can be processed.
CGI 是一种用于在服务器和网络浏览器之间交换数据的方法。CGI 是一套标准，允许程序或脚本将数据发送回网络服务器进行处理。

Common Gateway Interface (CGI) offers a standard protocol for web servers to execute programs that run like console applications (also called command-line interface programs) on the server and generate web pages dynamically. Such programs are known as CGI scripts or simply as CGIs. The specifics of how the script is executed are determined by the server. In the common case, a CGI script executes at the time a request is made and generates HTML.
通用网关接口（Common Gateway Interface，CGI）为 Web 服务器提供了一种标准协议，使其能够执行类似于在服务器上运行的命令行界面程序（即控制台应用程序），从而动态生成网页。这类程序被称为 CGI 脚本或简称为 CGI。服务器具体如何执行脚本由其自身决定。通常情况下，CGI 脚本会在收到请求时立即执行并生成 HTML 内容。

CGI is the part of the Web server that can communicate with other programs running on the server. With CGI, the Web server can call up a program, while passing user-specific data to the program (such as what host the user is connecting from, or input the user has supplied using HTML form syntax). The program then processes that data and the server passes the program's response back to the Web browser.
CGI 是 Web 服务器的一部分，能够与服务器上运行的其他程序进行通信。通过 CGI，Web 服务器可以调用某个程序，同时将用户特定的数据传递给该程序（例如用户连接的主机信息，或用户通过 HTML 表单语法输入的参数）。该程序处理完数据后，服务器会将程序的响应结果传回给 Web 浏览器。
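A CGI program itself is an ordinary program. The web server passes request data to it in environment variables such as `REQUEST_METHOD` and `QUERY_STRING`, and reads the program's standard output as the response. Here is a minimal sketch in Python. The `name` parameter and the page text are made up for illustration.

```python
import os
from html import escape
from urllib.parse import parse_qs


def cgi_response() -> str:
    """Build a CGI response from the request data the web server placed in
    environment variables (REQUEST_METHOD, QUERY_STRING)."""
    method = os.environ.get("REQUEST_METHOD", "GET")
    params = parse_qs(os.environ.get("QUERY_STRING", ""))
    name = params.get("name", ["stranger"])[0]      # hypothetical form field
    body = f"<html><body><p>Hello, {escape(name)}! ({method})</p></body></html>"
    # Headers first, then a blank line, then the document itself.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The web server reads whatever the script writes to standard output.
    print(cgi_response(), end="")
```

For a request such as `/cgi-bin/hello.py?name=Ada`, the server would set `QUERY_STRING` to `name=Ada` before starting the script.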

C.1.16 Evaluate the structure of different types of web pages (examples seen in past paper include blogs, forums, etc.)
C.1.16 评估不同类型网页的结构（历年试卷中的示例包括 blogs、forums 等）

Past paper Questions 历年真题

Describe how the web is constantly evolving
阐述 Web 是如何持续演进的

The beginnings of the web (Web 1.0 , Web of content)
万维网的起源（Web 1.0，内容网络）

The World Wide Web started around 1990/91 as a system of servers connected over the Internet that deliver static documents. These are formatted as Hypertext Markup Language (HTML) files, which support links to other documents as well as multimedia such as graphics, video or audio. In the early web, these documents consisted mainly of static information and text; multimedia was added later. Some experts describe this as a “read-only web”, because users mostly searched for and read information, with little user interaction or content contribution.
万维网始于 1990/91 年左右，当时是作为通过互联网连接的服务器系统，用于传输静态文档，这些文档格式化为超文本标记语言（HTML）文件，支持链接到其他文档，同时也支持图形、视频或音频等多媒体内容。在互联网初期，这些文档主要由静态信息和文本构成，多媒体内容后来才逐渐加入。一些专家将此描述为"只读网络"，因为用户主要进行信息搜索和阅读，而用户互动或内容贡献极少。

Web 2.0 – “Web of the Users”
Web 2.0 —— "用户之网"

However, the web started to evolve towards delivering more dynamic documents, enabling user interaction and even allowing users to contribute content. The appearance of blogging platforms such as Blogger in 1999 gives a rough time mark for the birth of Web 2.0. Continuing the model from before, this would be the evolution to a “read-write” web. This opened new possibilities and led to new concepts such as blogs, social networks and video-streaming platforms. Web 2.0 can also be seen from the perspective of websites themselves becoming more dynamic and feature-rich. For instance, improved design, JavaScript and dynamic content loading could be considered Web 2.0 features.
然而，网络开始向更具动态性的文档交付演变，实现了用户交互甚至内容贡献。1999 年 Blogger 等博客平台的出现，标志着 Web 2.0 的诞生。延续之前的模式，这标志着网络向"读写"模式的进化。这开启了新的可能性，催生了博客、社交网络或视频流媒体平台等新概念。从网站自身进化的角度来看，Web 2.0 也可以被视作网站变得更加动态化和功能丰富化。例如改进的设计、JavaScript 和动态内容加载等技术都可视为 Web 2.0 的特征。

Web 3.0 – “Semantic Web”
Web 3.0——"语义网"

The Internet, and thus the World Wide Web, is constantly developing and evolving in new directions. While the changes described as Web 2.0 are clear to us today, the definition of Web 3.0 is not yet settled. Continuing the read to read-write description from earlier, it might be argued that Web 3.0 would be the “read-write-execute” web. One interpretation is that the web enables software agents to work with documents by using semantic markup. This allows for smarter searches and the presentation of relevant data that fits the context. This is why Web 3.0 is sometimes called the semantic executive web.
互联网以及万维网正在不断发展和演变，朝着新的方向前进。尽管如今我们对 Web 2.0 所描述的变革已了然于心，但 Web 3.0 的定义尚未最终确定。延续之前从"只读"到"读写"的描述形式，可以说 Web 3.0 将是"读写-执行"网络。对此的一种解释是，网络通过使用语义标记（semantic markup）使软件代理（software agents）能够处理文档。这使得搜索更加智能化，并能呈现与上下文相关的数据。这就是为什么 Web 3.0 有时被称为语义执行网络（semantic executive web）。

But what does this mean?
但这意味着什么？

It’s about user input becoming more meaningful, more semantic, by users giving tags or other kinds of data to their document, that allow software agents to work with the input, e.g. to make it more searchable. The idea is to be able to better connect information that is semantically connected.
其核心在于通过用户为文档添加标签或其他类型的数据，使输入内容更具意义和语义价值，从而让软件代理能够有效处理这些输入，例如提升可搜索性。该理念旨在更好地连接具有语义关联的信息。

Later developments 后续发展

However, it might also be argued that Web 3.0 is what some people call the Internet of Things: connecting everyday devices to the Internet to make them smarter. In some ways this also fits the read-write-execute model, as it allows users to control real-life actions on a device over the Internet. Either way, the web keeps evolving.
然而，也有人认为 Web 3.0 就是某些人所说的物联网（Internet of Things），其本质是将日常设备连接到互联网，使其更加智能。从某种角度来看，这也符合读写执行模型，因为它允许用户通过互联网控制设备上的现实操作。无论哪种方式，网络都在持续进化。

Describe and compare Web 1.0, 2.0 and 3.0
描述并比较 Web 1.0、2.0 与 3.0

Web 1.0 is static web pages.
Web 1.0 是静态网页。

Web 2.0 consists of dynamic web pages driven by user-created content, such as Facebook and YouTube.
Web 2.0 是由用户生成的内容（如 Facebook 和 Youtube）驱动的动态网页。

Web 3.0 is not clearly defined, but it is the idea that the web will become more omnipresent (internet-enabled phones, fridges, cars, etc.) and more intelligent (e.g. your email alerts you that an event described in an email conflicts with your calendar, and suggests alternative dates). There are 3 ways this might happen:
Web 3.0 尚无明确定义，但其核心理念是网络将变得更加普适化（通过互联网手机、冰箱、汽车等设备实现万物互联）和智能化（例如电子邮件会主动识别你在邮件中描述的日程安排与日历冲突，并自动推荐替代日期）。这一愿景可能通过三种方式实现：

1) Expanded application programming interfaces (APIs) from websites such as Facebook, that will allow this kind of functionality.
1）来自 Facebook 等网站的扩展应用程序编程接口（APIs），将允许此类功能的实现。

2) Mashups - combining separate systems to provide more intelligent help (e.g. Google Maps could suggest restaurants based on your and other people's recommendations elsewhere, and suggest a good day to visit based on your calendar).
2) 混搭应用（Mashups）——将不同系统整合以提供更智能的帮助。（谷歌地图会根据你和其他人在其他平台的推荐建议餐厅，并根据你的日历推荐最佳到访日期）。

3) The semantic web - web pages are encoded with data (invisible to the user) to add this functionality. This can be done with a (formal) ontology system or an informal folksonomy (tagging by users). Currently available formats for embedding such semantic data in pages include RDFa and microformats.
3）语义网：网页通过编码数据（对用户不可见）来实现这一功能。这可以通过（正式的）本体系统或非正式的大众分类法（由用户标记）完成。目前可用于在网页中嵌入此类语义数据的格式包括 RDFa 和 microformats。

C.2 Searching the Web C.2 网络搜索

C.2.1 Define the term search engine
C.2.1 定义术语 search engine

A search engine is a program that allows a user to search for information normally on the web.
搜索引擎是一种允许用户通常在网络上搜索信息的程序。

A search engine is accessed through a browser on the user’s computer.
搜索引擎通过用户计算机上的浏览器进行访问。
The list of contents/results returned to the user is known as search engine results page (SERP)
返回给用户的内容/结果列表被称为搜索引擎结果页面（SERP）

C.2.2 Distinguish between the surface web and the deep web
C.2.2 区分表层网络和深层网络

Surface Web 表层网络

The surface web is the part of the web that can be reached by a search engine. For this, pages need to be static and fixed, so that they can be reached through links from other sites on the surface web. They also need to be accessible without special configuration. Examples include the publicly accessible pages of Google, Facebook, YouTube, etc.
表层网络是搜索引擎能够访问的网络部分。为此，页面需要是静态且固定的，以便通过表层网络中其他站点的链接进行访问。它们还必须无需特殊配置即可访问。示例包括 Google、Facebook、Youtube 等。

Pages that are reachable (and indexed) by a search engine
能够被搜索引擎访问（并被索引）的页面

Pages that can be reached through links from other sites in the surface web
表层网络中其他站点链接可抵达的页面

Pages that do not require special access configurations
无需特殊访问配置的页面

Deep web 深网

The deep web is the part of the web that is not searchable by normal search engines. Reasons for this include proprietary content that requires authentication or VPN access, e.g. private social media, emails; commercial content that is protected by paywalls (paid subscriptions), e.g. online newspapers, academic research databases; personal information that is protected, e.g. bank information, health records; and dynamic content. Dynamic content is usually the result of a query, where data is fetched from a database.
深网是普通搜索引擎无法搜索到的那部分网络。其原因包括需要身份验证或 VPN 访问的专有内容（如私人社交媒体、电子邮件）；受付费墙（付费订阅）保护的商业内容（如在线新闻报纸、学术研究数据库）；受保护的个人信息（如银行信息、健康记录）；以及动态内容。动态内容通常是某些查询的结果，数据从数据库中提取而来。

Pages not reachable by search engines
搜索引擎无法访问的页面

Substantially larger than the surface web
远大于表层网络

Common characteristics: 共同特征：

Dynamically generated pages, e.g. through queries, JavaScript, AJAX, Flash
动态生成的页面，例如通过查询、JavaScript、AJAX、Flash 生成
Password-protected pages, e.g. emails, private social media
受密码保护的页面，例如电子邮件、私人社交媒体
Pages behind paywalls, e.g. online newspapers, academic research databases
付费墙后的页面，例如在线新闻报纸、学术研究数据库
Protected personal information, e.g. health records
受保护的个人信息，例如健康记录
Pages without any incoming links
没有任何传入链接的页面

C.2.3 Outline the principles of searching algorithms used by search engines
C.2.3 概述搜索引擎使用的搜索算法原理

The best-known search algorithms are PageRank and the HITS algorithm, but it is important to know that most search engines include various other factors as well, e.g.:
最知名的搜索算法是 PageRank 和 HITS 算法，但需要注意的是，大多数搜索引擎还会考虑各种其他因素，例如：

the time that a page has existed
页面存在的时间

the frequency of the search keywords on the page
页面上搜索关键词的频率

other unknown factors (undisclosed)
其他未知因素（undisclosed）
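As an illustration of just one of these factors, keyword frequency, here is a sketch of an inverted index that ranks pages by how often the query words appear. The three pages and their text are hypothetical. Real search engines combine this kind of signal with link analysis (such as PageRank and HITS below) and many undisclosed factors.

```python
from collections import Counter, defaultdict

# Hypothetical mini-collection of crawled pages.
pages = {
    "a.html": "fleet insurance for small fleets",
    "b.html": "car insurance and fleet insurance quotes",
    "c.html": "history of the web",
}

# Inverted index: word -> {page: number of occurrences on that page}
index: dict[str, Counter] = defaultdict(Counter)
for url, text in pages.items():
    for word in text.lower().split():
        index[word][url] += 1


def search(query: str) -> list[str]:
    """Rank pages by the total frequency of the query words (one factor only)."""
    scores: Counter = Counter()
    for word in query.lower().split():
        scores.update(index.get(word, Counter()))
    return [url for url, _ in scores.most_common()]


print(search("fleet insurance"))  # ['b.html', 'a.html']
```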

For the following description, the terms “inlinks” and “outlinks” are used. Inlinks are links that point to the page in question, i.e. if page W has an inlink, there is a page Z containing the URL of page W. Outlinks are links on the page in question that point to other pages, i.e. if page W has an outlink, W contains the URL of another page, e.g. page Z.
对于以下描述，使用了术语"inlinks"和"outlinks"。入链是指向所讨论页面的链接，即如果页面 W 有一个入链，则存在一个页面 Z 包含页面 W 的 URL。出链指向与当前页面不同的其他页面，即如果页面 W 有一个出链，则该链接是其他页面（例如页面 Z）的 URL。

PageRank algorithm PageRank 算法

PageRank works by counting the number and quality of inlinks of a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.
PageRank 通过统计网页入站链接的数量和质量来粗略评估该网站的重要性。其基本假设是，越重要的网站可能会从其他网站获得更多链接。

As mentioned, it is important to note that many other factors are also considered. For instance, the anchor text of a link can matter more than its PageRank score.
如前所述，必须指出的是还有许多其他因素会被考虑。例如，链接的锚文本可能比其 PageRank 得分更重要。

Pages are given a score (rank) based on the pages that link to them (inlinks)
页面会被赋予一个分数（排名），该分数基于其他页面链接到它们的数量（入链数量）
Rank determines the order in which pages appear
排名决定了页面显示的顺序
Incoming links add value to a page
入站链接为页面增添价值
The importance of an inlink depends on the PageRank (score) of the linking page, i.e. its Page Authority
反向链接的重要性取决于链接页面的 PageRank（分数），即其 Page Authority
PageRank counts the links to each page and determines which pages are most important
PageRank 会统计指向每个页面的链接数量，进而确定哪些页面最为重要
Links from relevant sites carry more weight than links from unrelated sites.
来自相关网站的链接比不相关网站的链接更具分量。
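The points above can be sketched as the simplified PageRank power iteration. The four-page link graph is hypothetical. The damping factor 0.85 is the value commonly cited for the original algorithm. Each page shares its current rank equally among its outlinks, so a link from a high-ranked page passes on more value. Pages with no outlinks spread their rank over all pages.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank by power iteration.

    `links` maps each page to the pages it links to (its outlinks)."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start with equal scores
    for _ in range(iterations):
        new = {p: (1 - d) / n for p in pages}   # small base score for every page
        for page, outlinks in links.items():
            targets = outlinks or pages         # a page with no outlinks shares with all
            share = d * rank[page] / len(targets)
            for target in targets:
                new[target] += share            # each inlink passes on part of the rank
        rank = new
    return rank


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page C, with three inlinks, comes out on top. Page D has no inlinks, so it keeps only the base score.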

HITS algorithm HITS 算法

Based on the idea that keywords are not all that matters: some sites might be more relevant even if they don't contain the most keywords. It introduces the idea of two different types of page: authorities and hubs.
基于关键词并非决定一切的理念；有些网站即使不包含最多的关键词，也可能更具相关性。它引入了不同类型页面的概念，即 authorities 和 hubs。

Authorities: A page is called an authority if it contains valuable information and is truly relevant for the search query. It is assumed that such a page has a high number of in-links.
权威页面：如果一个页面包含有价值的信息，并且与搜索查询真正相关，则称其为权威页面。假设此类页面具有大量的入链。

Hubs: These are pages that are relevant for finding authorities. They contain useful links towards them. It is therefore assumed that these pages have a high number of out-links.
枢纽页面（Hubs）：这类页面与发现权威页面相关，其包含指向权威页面的有效链接。因此，这些页面被认为拥有大量出站链接。

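The algorithm is based on mathematical graph theory: a page is represented by a vertex and a link between two pages by a directed edge. The hub and authority scores of all pages are stored as vectors and computed from the graph's adjacency matrix.
该算法基于数学图论，其中页面由顶点表示，页面间的链接由有向边表示；所有页面的枢纽分数和权威分数以向量形式保存，并根据图的邻接矩阵计算。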

HITS attempts to computationally determine hubs and authorities on a particular topic by analysing a relevant subgraph of the web. It is based on mutually recursive facts: hubs point to lots of authorities, and authorities are pointed to by lots of hubs.
通过分析网络的相关子图，以计算方式确定特定主题的枢纽页面和权威页面的尝试。基于相互递归的事实：枢纽页面指向大量权威页面，权威页面被众多枢纽页面所指向。（HITS 算法）
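
The mutual recursion above can be sketched as an iteration: a page's authority score is the sum of the hub scores of the pages linking to it, and its hub score is the sum of the authority scores of the pages it links to, with both score vectors normalised after each step. This is a minimal sketch of the classic HITS iteration under assumptions: the small hypothetical link graph stands in for the query-specific subgraph, and a fixed 50 iterations replaces a proper convergence test.

```python
import math


def hits(links, iterations=50):
    """Hub and authority scores by the HITS iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum of the hub scores of pages linking IN to q.
        auth = {p: 0.0 for p in pages}
        for p, outs in links.items():
            for q in outs:
                auth[q] += hub[p]
        # Hub update: sum of the authority scores of pages p links OUT to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Normalise both vectors so the scores do not grow without bound.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth


# Hypothetical subgraph: H1 and H2 are link lists pointing at A1 and A2.
links = {
    "H1": ["A1", "A2"],
    "H2": ["A1", "A2"],
    "X": ["A1"],
}
hub, auth = hits(links)
print("best authority:", max(auth, key=auth.get))
print("best hub:", max(hub, key=hub.get))
```

H1 and H2 become the strongest hubs because they point at both authorities, while A1 becomes the strongest authority because every hub points at it.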

C.2.4 Describe how a web-crawler functions
C.2.4 描述网络爬虫的工作原理

A web crawler, also known as a web spider, web robot or simply bot, is a program that browses the web in a methodical and automated manner. For each page it finds, a copy is downloaded and indexed. In this process it extracts all links from the given page and then repeats the same process for all found links. This way, it tries to find as many pages as possible.
网络爬虫，也被称为网络蜘蛛、网络机器人或简称机器人，是一种以系统化和自动化方式浏览网络的程序。对于它找到的每个页面，都会下载并索引一个副本。在此过程中，它会从给定页面提取所有链接，然后对所有找到的链接重复相同的过程。通过这种方式，它试图找到尽可能多的页面。
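
The crawl loop just described can be sketched as a breadth-first search over a queue of URLs. To keep the example runnable offline, the web is a hypothetical in-memory dictionary and `fetch_links` is a hypothetical stand-in for downloading a page and extracting its links; a real crawler would also store and index each page, respect robots.txt, and throttle its requests.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> list of URLs found on that page.
WEB = {
    "http://a.example/": ["http://a.example/about", "http://b.example/"],
    "http://a.example/about": ["http://a.example/"],
    "http://b.example/": ["http://c.example/", "http://a.example/about"],
    "http://c.example/": [],
}


def fetch_links(url):
    """Stand-in for 'download the page and extract all of its links'."""
    return WEB.get(url, [])


def crawl(seeds, max_pages=100):
    frontier = deque(seeds)  # URLs waiting to be visited
    seen = set(seeds)        # URLs already queued, so no page is visited twice
    indexed = []             # order in which pages were "downloaded and indexed"
    while frontier and len(indexed) < max_pages:
        url = frontier.popleft()  # FIFO queue gives breadth-first order
        indexed.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return indexed


print(crawl(["http://a.example/"]))
```

The `seen` set is what stops the crawler from looping forever between pages that link to each other, such as the home page and the about page here.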

Limitations:

Crawlers might look at the metadata contained in the head of web pages, but this depends on the crawler
它们可能会查看网页头部包含的元数据，但这取决于具体的爬虫
A crawler might not be able to read pages with dynamic content, because crawlers are often very simple programs
爬虫可能无法读取包含动态内容的页面，因为它们是结构非常简单的程序

Robots.txt

Stops bots from using up bandwidth
阻止机器人占用带宽

Saves bandwidth: less time is spent crawling the site
节省带宽，减少网站爬取时间

Issue: A crawler consumes resources, and a site might not wish some of its pages to be “crawled”. For this reason “robots.txt” files were created, in which a site states which pages crawlers may visit and which they should not.
问题：爬虫程序会消耗资源，且网站可能不希望某些页面被“爬取”。为此人们创建了“robots.txt”文件，网站通过该文件声明爬虫可以访问哪些页面、不可以访问哪些页面。

A file containing rules that specify which pages on a website must not be crawled by search engine bots
一个包含指定网站中禁止被搜索引擎机器人抓取页面的组成部分的文件
The file is placed in the root directory of the site
文件被放置在网站的根目录中
The standard for robots.txt is called “Robots Exclusion Protocol”
robots.txt 的标准被称为“Robots Exclusion Protocol”
Rules can target one specific web crawler or apply to all crawlers
可以针对特定的专用网络爬虫，或适用于所有爬虫程序
Not all bots follow this standard (malicious bots, malware): compliance is voluntary, so “illegal” bots can simply ignore robots.txt
并非所有网络爬虫都遵循此标准（恶意爬虫、恶意软件）->"非法"爬虫可无视 robots.txt 文件
It is still considered better to include a robots.txt than to leave it out
仍建议包含 robots.txt 文件，而非直接省略
It keeps bots away from the less “noteworthy” content of a website
它能阻止爬虫访问网站中不太“值得关注”的内容
so more time is spent indexing the important/relevant content of the website
将更多时间用于索引网站的重要/相关内容
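
Python's standard library includes a parser for this format, `urllib.robotparser.RobotFileParser`, which a well-behaved crawler can consult before each request. The sketch below parses a hypothetical robots.txt from a string instead of downloading it from a site, and checks a few URLs for two made-up crawler names (`MyCrawler` and `BadBot`).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, as it might be served at https://www.example.com/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse lines directly instead of read()

checks = [
    ("MyCrawler", "https://www.example.com/index.html"),
    ("MyCrawler", "https://www.example.com/private/report.html"),
    ("BadBot", "https://www.example.com/index.html"),
]
for agent, url in checks:
    # can_fetch() applies the most specific matching User-agent group.
    print(agent, url, parser.can_fetch(agent, url))
```

Note that nothing forces a crawler to call `can_fetch()`: a malicious bot simply skips this check, which is why robots.txt only works with cooperative crawlers.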

C.2.5 Discuss the relationship between data in a meta tag and how it is accessed by a web-crawler
C.2.5 讨论元标签中的数据与网络爬虫如何访问这些数据之间的关系

Students should be aware that this is not always a transitive relationship.
学生应该意识到这并不总是一个传递关系。

Meta Keywords Attribute - A series of keywords you deem relevant to the page in question.
Meta Keywords Attribute - 一系列你认为与当前页面相关的关键词。

Title Tag - This is the text you'll see at the top of your browser. Search engines view this text as the "title" of your page.
Title Tag - 你将在浏览器顶部看到的文本。搜索引擎将此文本视为你页面的"标题"。

Meta Description Attribute - A brief description of the page.
Meta Description Attribute - 页面的简要描述。

Meta Robots Attribute - An indication to search engine crawlers (robots or "bots") as to what they should do with the page.
Meta Robots Attribute - 向搜索引擎爬虫（robots 或 "bots"）发出的关于应如何处理页面的指示。

In the past, the meta keywords tag could be spammed full of keywords, sometimes not even relevant to the content on the page, so this tag is now mostly ignored by search engines. The meta description can sometimes be shown in the results, but it is not a factor in the actual ranking.
过去 meta 关键词标签常被滥用堆砌大量关键词，有时甚至与页面内容无关。这一标签现在基本已被搜索引擎所忽略。meta 描述标签有时会显示在搜索结果中，但并不是实际排名的影响因素。

Robots Tag

Robots Tag: This is very important. It tells crawlers what to do with the page, for example not to index it (noindex) or not to follow its links (nofollow). You can address all crawlers or name the specific crawlers the instruction applies to.
Robots 标签：该标签非常重要。它告诉爬虫应如何处理页面，例如不要索引该页面（noindex）或不要跟踪其中的链接（nofollow）。您可以针对所有爬虫，也可以指定该指令所适用的特定爬虫。

Answer depends on different crawlers, but generally speaking:
答案因不同爬虫程序而异，但总体来说：

The title tag, not strictly a meta-tag, is what is shown in the results, through the indexer
标题标签（严格来说并非元标签）是通过索引器显示在搜索结果中的内容
The description meta-tag provides the indexer with a short description of the page, and this can also be displayed in the SERPs
description 元标签为索引器提供页面的简短描述，该描述也可能显示在 SERPS（搜索引擎结果页）中
The keywords meta-tag provides, well, keywords about your page
keywords 元标签提供了……嗯，关于你页面的关键词
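
How a crawler uses these tags varies by crawler, but a minimal sketch of the idea is to parse the page head, pull out the title and the `name`/`content` pairs of the meta tags, and honour the robots directives. The page below and the `HeadReader` class are hypothetical; only `html.parser.HTMLParser` comes from Python's standard library.

```python
from html.parser import HTMLParser

# Hypothetical page as downloaded by a crawler.
PAGE = """<!DOCTYPE html>
<html>
<head>
  <title>Dog Insurance Explained</title>
  <meta name="description" content="A plain-English guide to insuring your dog.">
  <meta name="keywords" content="dog insurance, pet insurance">
  <meta name="robots" content="noindex, nofollow">
</head>
<body><p>...</p></body>
</html>"""


class HeadReader(HTMLParser):
    """Collects the <title> text and the name/content pairs of <meta> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


reader = HeadReader()
reader.feed(PAGE)

# Split "noindex, nofollow" into individual directives.
robots = {d.strip() for d in reader.meta.get("robots", "").split(",") if d.strip()}
print("title:", reader.title)
print("description:", reader.meta.get("description"))
print("may index:", "noindex" not in robots)
print("may follow links:", "nofollow" not in robots)
```

A crawler that respects the robots tag would leave this page out of its index and not queue its links, even though it had to download the page to read the tag.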

C.2.6 Discuss the use of parallel web-crawling
C.2.6 讨论并行网络爬虫技术的应用

As the web grows, so does the time it takes to download its pages
网络规模的扩大增加了下载页面所需的时间

To keep this manageable, “it becomes imperative to parallelize the crawling process” (Stanford)
为了使这一点合理，“必须并行化爬取过程”（Stanford）

Advantages 优点

Scalability: as the web grows, a single process cannot handle everything; multithreaded processing can solve this problem
可扩展性（Scalability）：随着网络的发展，单个进程无法处理所有任务。多线程处理（Multithreaded processing）可以解决这个问题

Network load dispersion: as the web is geographically dispersed, dispersing crawlers disperses the network load
网络负载分散：由于网络在地理上是分散的，分散爬虫可以分散网络负载

Network load reduction (scalability, efficiency and throughput)
网络负载减少（可扩展性、效率和吞吐量）

Issues of parallel web crawling
并行网络爬虫的问题

Overlapping: parallel web crawlers might index the same page multiple times
重叠问题：并行网络爬虫可能多次索引同一页面

Quality: If a crawler wants to download ‘important’ pages first, this might not work in a parallel process
质量：如果网络爬虫想要优先下载'重要'页面，这在并行进程中可能无法实现

Communication bandwidth: parallel crawlers need to communicate for the reasons above (to avoid overlap and to maintain quality), and with many processes this communication might take significant bandwidth.
通信带宽：并行爬虫出于前述原因（避免重叠、保证质量）需要相互通信，当进程数量很多时，这可能会占用大量的通信带宽。

If parallel crawlers request the same page (or pages on the same server) frequently over a short time, they will overload the server
如果并行爬虫程序在短时间内频繁请求同一页面，就会使服务器超载
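
One common way to address overlap and server overload is to partition the URL space by host, so that every URL on a given host is handled by exactly one crawler; links that belong to another partition are handed over to their owner, which is where the communication cost comes from. The sketch below assumes this host-based partitioning, uses threads to stand in for separate crawler processes, and crawls a hypothetical in-memory web in synchronous rounds; `owner`, `crawl_partition` and `parallel_crawl` are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit
import zlib

# Hypothetical in-memory web (URL -> links on that page); no real downloads.
WEB = {
    "http://a.example/": ["http://a.example/about", "http://b.example/"],
    "http://a.example/about": ["http://a.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/", "http://a.example/about"],
    "http://c.example/": ["http://b.example/"],
}


def owner(url, n_workers):
    """Map every URL on the same host to the same worker.

    zlib.crc32 is used instead of hash() because Python randomises string
    hashes per process, which would break a setup with several processes.
    """
    host = urlsplit(url).hostname or ""
    return zlib.crc32(host.encode()) % n_workers


def crawl_partition(urls):
    """One worker 'downloads' its assigned URLs and returns their links."""
    return {url: WEB.get(url, []) for url in urls}


def parallel_crawl(seeds, n_workers=3):
    seen = [set() for _ in range(n_workers)]  # each worker dedups its own hosts
    frontiers = [[] for _ in range(n_workers)]
    for url in seeds:
        w = owner(url, n_workers)
        if url not in seen[w]:
            seen[w].add(url)
            frontiers[w].append(url)
    crawled = []
    messages = 0  # links that had to be sent to a different worker
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while any(frontiers):
            # All workers crawl their current frontier at the same time.
            results = list(pool.map(crawl_partition, frontiers))
            frontiers = [[] for _ in range(n_workers)]
            for worker, found in enumerate(results):
                for url, links in found.items():
                    crawled.append(url)
                    for link in links:
                        target = owner(link, n_workers)
                        if target != worker:
                            messages += 1  # communication cost
                        if link not in seen[target]:
                            seen[target].add(link)
                            frontiers[target].append(link)
    return crawled, messages


crawled, messages = parallel_crawl(["http://a.example/"])
print(len(crawled), "pages crawled,", messages, "links exchanged")
```

Because each host belongs to a single worker, no page is downloaded twice and no server is hit by several crawlers at once; the price is the `messages` counter, which grows with the number of cross-host links.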

Discuss the use of parallel web crawling
讨论并行网络爬取的使用

A crawler is a program that downloads and stores Web pages, often for a Web search engine. Roughly, a crawler starts off by placing an initial set of URLs, S0, in a queue, where all URLs to be retrieved are kept and prioritized. From this queue, the crawler gets a URL (in some order), downloads the page, extracts any URLs in the downloaded page, and puts the new URLs in the queue. This process is repeated until the crawler decides to stop. Collected pages are later used for other applications, such as a Web search engine or a Web cache. As the size of the Web grows, it becomes more difficult to retrieve the whole or a significant portion of the Web using a single process. Therefore, many search engines often run multiple processes in parallel to perform the above task, so that download rate is maximized (reference http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.8408&rep=rep1&type=pdf )
爬虫是一种下载并存储网页的程序，通常用于网络搜索引擎。简而言之，爬虫首先将一组初始 URL 放入队列中（队列用于存放所有待检索的 URL 并按优先级排序）。爬虫从队列中获取某个 URL（按特定顺序），下载对应页面，提取页面中的所有 URL，并将新发现的 URL 加入队列。该过程循环执行直至爬虫停止工作。收集的页面后续可用于其他应用场景，如构建网络搜索引擎或网页缓存。随着网络规模扩大，单一进程已难以抓取整个网络或其主要部分。因此，多数搜索引擎会并行运行多个进程以最大化下载速率（参考文献 http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.8408&rep=rep1&type=pdf ）。

Why search engines take the quality approach (dated)
为何搜索引擎采取质量优先策略（已过时）

According to a study released in October 2000, the directly accessible "surface web" consists of about 2.5 billion pages, while the "deep web" (dynamically generated web pages) consists of about 550 billion pages, 95% of which are publicly accessible [LVDSS00].
根据 2000 年 10 月发布的一项研究，可直接访问的"表面网络"（surface web）包含约 25 亿个页面，而"深层网络"（deep web，即动态生成的网页）包含约 5500 亿个页面，其中 95%是公开可访问的[LVDSS00]。

By comparison, the Google index released in June 2000 contained 560 million full-text-indexed pages [Goo00]. In other words, Google — which, according to a recent measurement [HHMN00], has the greatest coverage of all search engines — covers only about 0.1% of the publicly accessible web, and the other major search engines do even worse.
相比之下，Google 在 2000 年 6 月发布的索引包含了 5.6 亿个全文索引页面[Goo00]。换句话说，根据最新测量数据显示[HHMN00]，覆盖范围最广的搜索引擎 Google 仅覆盖了公开可访问网络中约 0.1%的内容，而其他主要搜索引擎的表现甚至更差。
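
Assuming, as the study's figures suggest, that the publicly accessible web is the surface web plus the 95% of the deep web that is public, the 0.1% figure and the "three orders of magnitude" gap can be checked:

$$
\text{public web} \approx \underbrace{2.5\times10^{9}}_{\text{surface}} + \underbrace{0.95\times 550\times10^{9}}_{\text{public deep web}} = 5.25\times10^{11}\ \text{pages}
$$

$$
\text{coverage} \approx \frac{5.6\times10^{8}}{5.25\times10^{11}} \approx 1.07\times10^{-3} \approx 0.1\%, \qquad \frac{1}{1.07\times10^{-3}} \approx 940 \approx 10^{3}
$$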

Increasing the coverage of existing search engines by three orders of magnitude would pose a number of technical challenges, both with respect to their ability to discover, download, and index web pages, as well as their ability to serve queries against an index of that size. (For query engines based on inverted lists, the cost of serving a query is linear to the size of the index.) Therefore, search engines should attempt to download the best pages and include (only) them in their index.
将现有搜索引擎的覆盖范围扩大三个数量级将带来一系列技术挑战，既涉及发现、下载和索引网页的能力，也涉及针对该规模索引提供查询服务的能力。（对于基于倒排列表的查询引擎而言，处理查询的成本与索引规模呈线性关系。）因此，搜索引擎应优先下载优质网页并仅将其纳入索引。

Mercator is an extensible, multithreaded, high-performance web crawler [HN99, Mer00]. It is written in Java and is highly configurable. Its default download strategy is to perform a breadth-first search of the web, with the following three modifications:
Mercator 是一款可扩展、多线程、高性能的网络爬虫[ HN99, Mer00]。它采用 Java 语言编写且具备高度可配置性。其默认下载策略是对网络进行广度优先搜索，并进行了以下三项修改：

It downloads multiple pages (typically 500) in parallel. This modification allows us to download about 10 million pages a day; without it, we would download well under 100,000 pages per day.
它以并行方式下载多个页面（通常为 500 个）。这一改进使我们每天能够下载约 1000 万个页面；若没有这一改进，我们每天的下载量将远低于 10 万页面。

Only a single HTTP connection is opened to any given web server at any given time. This modification is necessary due to the prevalence of relative URLs on the web (about 80% of the links on an average web page refer to the same host), which leads to a high degree of host locality in the crawler's download queue. If we were to download many pages from the same host in parallel, we would overload or even crash that web server.
在任何给定时间，仅向任何特定 Web 服务器开启单一 HTTP 连接。由于网络中相对 URL 的普遍存在（平均网页约 80%的链接指向同一主机），该调整是必要的，这会导致爬虫下载队列中高度集中的主机本地性。若我们并行下载同一主机的多个页面，将会使该 Web 服务器过载甚至崩溃。

If it took t seconds to download a document from a given web server, then Mercator will wait for 10t seconds before contacting that web server again. This modification is not strictly necessary, but it further eases the load our crawler places on individual servers on the web. We found that this policy reduces the rate of complaints we receive while crawling.
如果从某个网络服务器下载文档耗时 t 秒，那么 Mercator 将等待 10t 秒后才会再次联系该服务器。这项修改并非绝对必要，但它能进一步减轻我们的爬虫对单个网络服务器造成的负载。我们发现该策略有效降低了我们在抓取过程中收到的投诉率。
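
The two politeness rules above (one connection per host, then a pause of 10t after a t-second download) can be sketched as a per-host schedule. This assumes, as Mercator's parallel downloading suggests, that different hosts are fetched concurrently, so only the per-host rule delays a request; the URLs and download times are hypothetical, and `schedule` is an illustrative helper, not Mercator's actual code.

```python
from urllib.parse import urlsplit

POLITENESS_FACTOR = 10  # wait 10t after a download that took t seconds


def schedule(downloads):
    """Return (url, start_time) pairs for a polite crawler.

    downloads: list of (url, seconds_the_download_takes), in queue order.
    Assumes different hosts are fetched in parallel, so a request is only
    delayed by earlier requests to the same host.
    """
    next_allowed = {}  # host -> earliest time we may contact it again
    plan = []
    for url, t in downloads:
        host = urlsplit(url).hostname
        start = next_allowed.get(host, 0.0)
        plan.append((url, start))
        # One connection at a time: the host is busy for t seconds,
        # then left alone for POLITENESS_FACTOR * t seconds.
        next_allowed[host] = start + t + POLITENESS_FACTOR * t
    return plan


downloads = [
    ("http://a.example/1", 0.5),
    ("http://a.example/2", 0.5),
    ("http://b.example/1", 2.0),
    ("http://a.example/3", 0.2),
]
for url, start in schedule(downloads):
    print(f"{start:5.1f}s  {url}")
```

A slow server (large t) is automatically visited less often, which is exactly what keeps the crawler from overloading it.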

C.2.7 Outline the purpose of web-indexing in search engines
C.2.7 概述网络索引在搜索引擎中的目的。

Search engines index websites in order to respond to search queries with relevant information as quickly as possible. For this reason, they store information about indexed web pages (e.g. keywords, title or description) in their databases. This way, search engines can quickly identify pages relevant to a search query.
搜索引擎对网站进行索引，以便尽可能快速地用相关信息响应搜索查询。为此，搜索引擎会在其数据库中存储已索引网页的相关信息，例如关键词、标题或描述。通过这种方式，搜索引擎能够快速识别与搜索查询相关的页面。
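
The data structure that usually makes this lookup fast is an inverted index (the "inverted lists" mentioned earlier): for each word, the set of pages that contain it. A minimal sketch, assuming a tiny hypothetical set of already-crawled pages and a simple AND query; real indexes also store word positions and the weights used for ranking.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages: page id -> page text.
PAGES = {
    "p1": "Dog insurance for older dogs",
    "p2": "Car rental and car insurance",
    "p3": "Training your dog",
}


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Inverted index: each word maps to the set of pages containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the pages containing every word of the query (AND query)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(PAGES)
print(sorted(search(index, "dog insurance")))
```

Answering a query now means intersecting a few precomputed sets instead of scanning every stored page, which is why indexing happens before any query arrives.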

Indexing has the additional purpose of giving a page a certain weight, as described in the search algorithms above (e.g. PageRank). This way, indexed pages can be ranked when results are returned for a search.
索引的另一个目的是根据搜索算法中的描述，赋予页面一定的权重。这样一来，搜索结果在被索引后即可进行排名。

C.2.8 Suggest how developers can create pages that appear more prominently in search engine results
C.2.8 建议开发者如何创建在搜索引擎结果中更显眼的页面。

Naturally, this overlaps with what a website developer should do to get the site high in the SERPs (search engine results pages)
自然，网站开发者为使网站在 serps（搜索引擎结果页面）中排名靠前所需采取的措施存在一定的重叠

On Page 页面上的

Relevancy: does your site provide the information the user is searching for? User experience (UX) is becoming a big part of this, as it is hard to manipulate, and it will likely play a much bigger role in the future. UX is reflected in signals such as the time a user stays on the site and the bounce rate. Many factors play a role in UX: load speed, easy navigation (no broken links), correct spelling, high-quality and factually correct content, a structured layout, use of images/video, and page design (colors, images, video, infographics and formatting) so that the page is easy to scan for relevant information. The idea is to get users to stay on your site (to make it "sticky"). If a user lands on your site after a search and leaves after a few seconds, or even before the page loads (slow loading), that is a very BIG signal to Google that it should not have served up that result.
相关性指你的网站是否提供了用户搜索的信息。用户体验(UX)正成为重要因素，因其无法被操纵且未来将发挥更大作用。用户体验指标（用户停留时间/跳出率）。影响用户体验的多个因素包括：加载速度、易用导航（无失效链接）、拼写正确、内容优质真实、结构化布局、图片/视频应用、页面设计（色彩搭配、图片视频信息图及排版格式）便于快速浏览关键信息。核心目标是提高用户粘性。若用户通过搜索结果进入网站后快速离开（甚至在页面加载完成前因速度过慢而关闭），这向谷歌传递了强烈信号——该结果不应被展示。

Off Page

Backlinks from other websites: the more authoritative the linking site, the better (for example, the Huffington Post). The site that links to yours should also be relevant. For example, if you are selling dog insurance, a link from a respected dog charity website would be a very big boost, whereas a link from a car rental site would have little impact because it is irrelevant.
来自其他网站的反向链接，网站权威性越高越好（例如《赫芬顿邮报》）。链接到您网站的网站也应具有相关性。例如，如果您销售宠物狗保险，来自权威慈善狗狗网站的链接将产生巨大提升；而来自租车服务网站的链接则几乎无效，因为完全无关。

Social media marketing (Facebook, etc.). Be a leader in your field and comment on relevant, authoritative forums or blogs. Other users may then share your content via social bookmarking sites.
社交媒体营销（Facebook 等）。成为所在领域的领导者，并在相关的权威论坛或博客上发表评论。其他用户可能会通过社交书签网站分享你的内容。

This is an area in which you can manipulate the search results. If Google discovers this, your website will be dropped from its index, so you need to ensure that any links are natural, for example links to an authoritative article or infographic on your website.
这是一个你可以操纵搜索结果的区域。如果 Google 发现这一点，你的网站将被移出索引。因此需要确保所有链接都是自然链接到你网站上的权威文章信息图表。

C.2.9 Describe the different metrics used by search engines.
C.2.9 描述搜索引擎使用的不同指标。

The process of making pages appear more prominently in search engine results is called SEO. There are many different techniques, considered in section C.2.11. This field is a big aspect of web marketing, as search engines do not disclose exactly how they work, which makes it hard for developers to optimise pages perfectly.
让页面在搜索引擎结果中显得更突出的过程被称为 SEO。C.2.11 章节探讨了许多不同的技术。这个领域是网络营销的重要方面，由于搜索引擎不公开其具体运作原理，使得开发人员难以完美优化页面。

To rank websites, Google uses many, many metrics; below are a few of the important ones.
Google 在对网站进行排名时会使用众多评估指标，以下是其中几个重要的考量因素。

Top Metrics 关键指标

On Page 页面上的

Make sure your site can be crawled and thus indexed: avoid Flash, and provide a sitemap and a good website architecture
确保你的网站能被爬取并因此被索引，避免使用 Flash，提供站点地图和良好的网站架构

The title: create a title tag with your key phrase at or near the beginning. The title should be crafted to get users to click on your website when it is displayed in the search results, and it must reflect the content of your site
标题在标题标签的开头或附近加入关键词短语。标题应精心设计，以便在搜索结果中显示时吸引用户点击您的网站。标题必须准确反映您网站的内容

Content: content will always be important. It must be high quality, any information must be factual, and the home page should have at least 1000 words
内容始终至关重要，必须确保内容质量上乘，所提供的信息必须基于事实，主页内容至少需要 1000 字

Freshness of content 内容新鲜度

Mobile Friendly 移动友好

Page load speed under 3 seconds
页面加载速度低于 3 秒

If a link is broken, the server returns HTTP response code 404 (Not Found) to the browser. The web designer should detect this and provide a help page with user navigation.
如果链接失效，服务器会向浏览器返回 HTTP 404 响应代码（Not Found）。网页设计师应检测到这一点，并提供带有用户导航的帮助页面。

Text formatting (use of h1, h2, bold, etc.)
文本格式设置（使用 h1、h2、bold 等）

HTTPS

Do keyword research to find out what users actually search for, and build pages for these terms
进行关键词研究以了解用户实际搜索的内容，并为这些关键词创建页面

These are only a fraction of the factors Google uses. More recently, it has given a very slight ranking boost to sites that use HTTPS
这些只是谷歌会使用的部分因素，最近他们对采用 HTTPS 的网站给予了极小幅度的排名提升

C.2.10 Explain why the effectiveness of a search engine is determined by the assumptions made when developing it.
C.2.10 解释为什么搜索引擎的有效性取决于开发时所做出的假设。

The search engine must serve up results that are relevant to what users search for. Before Google used PageRank, search engines relied mainly on title tags and keyword tags. These could easily be manipulated (stuffed with the keywords you wish to rank for) to get a site onto page 1. Google devised the PageRank algorithm, which assumes that important pages receive more links from other pages, and it played a big part in Google's search algorithm.
搜索引擎必须提供与用户搜索内容相关的结果，谷歌采用了 Page Rank 算法。在此之前，搜索引擎仅使用标题标签和关键词标签。这些标签很容易被操纵（通过堆砌您希望排名的关键词）以使您的网站登上搜索结果首页。谷歌设计了一种名为 Page Rank 的检查算法，该算法在其搜索算法中发挥了重要作用。

Avoid indexing spam sites (sites with duplicated/copied content). Detect sites that use black hat techniques and remove them from the index
避免索引垃圾网站（重复复制内容）。检测使用 Black Hat 技术的网站并从索引中移除

Don't repeatedly re-process static sites (sites that do not change); crawl authoritative, frequently changing news sites (fresh content) more often
不要反复处理静态网站（不更新的网站）；更频繁地爬取权威且经常更新（含新鲜内容）的新闻网站

Respect robots.txt files
遵守 robots.txt 文件

Determine sites that change on a regular basis and cache these.
确定定期更新的站点并对其进行缓存。

The spider should not overload servers by continually hitting the same site.
爬虫不应通过持续不断地访问同一网站而使服务器过载。

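The algorithm must be able to avoid spider traps (for example, dynamically generated pages, such as endless calendars, that produce an infinite number of URLs)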
该算法必须能够避免 spider traps

Ignore paid-for links (this can be difficult)
忽略付费链接（可能很难）

Ignore exact-match anchor text if it is being used to rank for keywords/search terms (a backlink profile should look natural to the search engine)
如果完全匹配锚文本被用于提升关键词/搜索词的排名，则忽略之（在搜索引擎看来，反向链接档案应显得自然）
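
Spider traps are usually handled with simple defensive limits on the URLs a crawler will accept. The thresholds below (URL length, path depth, repeated path segments, pages per host) and the helper names `looks_like_trap` and `should_crawl` are hypothetical values and names chosen for illustration, not any search engine's actual settings.

```python
from urllib.parse import urlsplit

# Hypothetical limits for illustration only.
MAX_URL_LENGTH = 200
MAX_PATH_DEPTH = 8
MAX_SEGMENT_REPEATS = 2
MAX_PAGES_PER_HOST = 1000

pages_per_host = {}


def looks_like_trap(url):
    """Heuristic checks for URLs that are typical of spider traps."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(url) > MAX_URL_LENGTH:
        return True
    if len(segments) > MAX_PATH_DEPTH:
        return True  # e.g. /2024/01/02/03/... generated by a calendar
    # e.g. /a/b/a/b/a/b/... produced by broken relative links
    if any(segments.count(s) > MAX_SEGMENT_REPEATS for s in segments):
        return True
    return False


def should_crawl(url):
    if looks_like_trap(url):
        return False
    host = urlsplit(url).hostname
    if pages_per_host.get(host, 0) >= MAX_PAGES_PER_HOST:
        return False  # cap how much of the crawl any single site can absorb
    pages_per_host[host] = pages_per_host.get(host, 0) + 1
    return True


for url in ["http://shop.example/products/42",
            "http://shop.example/a/b/a/b/a/b",
            "http://cal.example/2024/01/02/03/04/05/06/07/08"]:
    print(url, "crawl" if should_crawl(url) else "skip")
```

The per-host cap also helps with the earlier point about not overloading servers, since no single site can keep the crawler busy indefinitely.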

C.2.11 Discuss the use of white hat and black hat search engine optimization.
C.2.11 讨论白帽（white hat）和黑帽（black hat）搜索引擎优化的应用。

BLACK HAT 黑帽

Definition: Black hat SEO is, in simple words, any technique for getting the top positions or higher rankings in major search engines such as Google, Yahoo and Bing that breaks the rules of the search engines' guidelines (see, for example, Google's guidelines).
定义：黑帽 SEO（Black hat SEO）简而言之，是指违反搜索引擎指南规则（例如 Google 的指南）来在 Google、Yahoo、Bing 等主流搜索引擎中获取高位排名或更高排名的技术手段。

Keyword stuffing 关键词堆砌

This worked at one time. Today you still need the keywords/search terms in your title and page content, but you must ensure that you do not overuse the keywords/phrases, as that will trip a search engine filter.
这在过去一度有效，如今你仍需要在标题和页面内容中使用关键词/搜索术语，但需确保不过度使用关键词/短语，否则会触发搜索引擎过滤器。
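
Keyword stuffing is often discussed in terms of keyword density: the share of a page's words taken up by the target phrase. Search engines do not publish a threshold, so the sketch below only shows how such a measure could be computed; `keyword_density` is an illustrative helper and the two sample texts are made up.

```python
import re


def keyword_density(text, phrase):
    """Percentage of the words in text that belong to occurrences of phrase."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    # Count every position where the phrase starts.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return 100.0 * hits * n / len(words)


stuffed = ("Cheap dog insurance. Dog insurance quotes. Best dog insurance for "
           "dog insurance buyers who want dog insurance.")
natural = ("Our guide explains how dog insurance works, what it covers and "
           "how to compare policies before you buy.")
print(round(keyword_density(stuffed, "dog insurance"), 1))
print(round(keyword_density(natural, "dog insurance"), 1))
```

The stuffed text devotes well over half of its words to the phrase, while the natural text mentions it once; the first pattern is what stuffing filters are designed to catch.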

PBN (private blog network)

Google (currently) favors older sites with history. In this approach you buy an expired domain with good metrics, build it up and add links to your own sites, giving them a boost in ranking. This works, but it is costly to set up and you need to use aliases, etc.
Google（目前）更青睐拥有历史的老站点。这种策略是购买一个指标良好的过期域名，进行建设并添加指向自己网站的链接，从而提升排名。该方法有效，但设置成本高昂且需使用 alias 等工具。

Paid For Links 付费链接

Similar to PBNs, the aim is to get good-quality links from high-authority sites. Have a look at Fiverr, where you can buy such links. This is difficult for Google to detect, and it is also very effective.
与 PBN 类似，其目标是从高权威网站获取优质链接。可以查看 Fiverr 平台购买此类链接。这种方法对谷歌来说难以检测，同时也非常有效。

Syndicated / Copied Content
聚合/复制内容

Rather than creating good-quality content, the site uses content copied from other sites, which may be altered using automated techniques. Google is now much better at detecting this; see the Panda update.
不要自己创作优质内容，而是使用从其他网站复制的内容，这些内容可能会通过自动化技术进行修改。谷歌在检测此类行为方面已经更加高效，具体请参考 PANDA Update。

Over Use of Key Words in Anchor Text
锚文本中关键词的过度使用

The anchor text tells Google what your site is about (for example, "fleet insurance"), but if you overuse it or your backlinks look unnatural, you will be penalized; see the Penguin update. Before Penguin, this was very effective for getting ranked.
锚文本会告诉谷歌您的网站内容主题例如"fleet insurance"，但如果过度使用或您的反向链接看起来不自然，您将会被处罚，请参考 Penguin 算法。在 Penguin 算法推出前，这种策略对提升排名非常有效。

Web 2.0 Links Web 2.0 链接

Build a website on a platform such as Tumblr for the sole purpose of sending links to your money site
例如，在 Tumblr 上建立一个网站，其唯一目的是向您的盈利网站（money site）发送链接

WHITE HAT 白帽

Guest Blogging 客座博客

The process of writing a blog post for someone else’s blog is called guest blogging
为他人博客撰写博文的过程被称为 guest blogging

Link Baiting 链接诱饵

Create an outstanding article or infographic that other sites may want to use. If it includes a link to your site, you gain more backlinks as a result (natural acquisition of backlinks, as opposed to paid)
创建一篇令人惊叹的文章信息图，供其他网站使用，如果在文章中包含指向您网站的链接，您将因此获得更多反向链接（自然获取反向链接，而非付费方式）

Quality Content 优质内容

Search engines evaluate the content of a web page, so a page with more useful information might get a higher ranking. This makes it more valuable in the index, and other web pages might link to it if its content is of a high standard.
搜索引擎会评估网页的内容，因此网页包含的信息越多，其排名可能越高。这将使其在索引中更有价值，如果内容质量高，其他网页可能会链接到你的网页。

Site optimization Design
网站优化设计

Good menu navigation. Proper use of title tags and header tags, adding images with keyword alt attributes, and internal linking, again with keyword anchor text. Create a sitemap to get the site crawled and to tell the spiders how often to visit the site.
良好的菜单导航。正确使用 title 标签和标题（header）标签，为图像添加包含关键词的替代文本（alt），并再次使用关键词锚文本进行内部链接。创建站点地图以便网站被抓取，并告知爬虫访问网站的频率。
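
A sitemap is an XML file following the sitemaps.org protocol: a `urlset` of `url` entries, each with a `loc` and optional fields such as `lastmod` and `changefreq` (the "how often to visit" hint, which crawlers treat as a suggestion at most). A minimal sketch that generates one with Python's standard library; the URLs, dates and the `build_sitemap` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """pages: list of (url, lastmod 'YYYY-MM-DD', changefreq) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
        ET.SubElement(url, "changefreq").text = changefreq  # a hint, not a command
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    ("https://www.example.com/", "2024-01-15", "daily"),
    ("https://www.example.com/about", "2023-06-01", "yearly"),
])
print(sitemap_xml)
```

The output would normally be saved as sitemap.xml in the site's root directory, next to robots.txt.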

A good User Experience (UX)
良好的用户体验（UX）

This is a broad term and overlaps with some other areas mentioned, for example page load speed. The purpose is to ensure that when a user clicks through to your site, they stay rather than immediately clicking back to the SERPs. Google values this as a quality signal, since its main purpose is to provide the user with relevant results.
这是一个宽泛的术语，与提到的其他领域（例如页面加载速度）存在部分重叠。其目的是确保当用户点击进入你的网站后，他们会停留下来而不会立即点击返回搜索结果页面（SERPs）。谷歌对此感到满意，因为这是一个质量信号，其主要目的是为用户提供相关结果。

Page Site Load Speed 页面加载速度

Fast-loading pages give the user a good experience; aim for under 3 seconds
快速加载的页面能为用户带来良好体验，目标是在 3 秒以内完成加载

Freshness 新鲜度

Provide fresh content on a regular basis.
定期提供新鲜内容。

Google (like other search engines) is continually fighting the black hat techniques that webmasters employ to rank high in the SERPs. Investigate these 2 major algorithm updates: Panda and Penguin. A good example of a current black hat practice is the use of PBNs.
谷歌（其他搜索引擎也是如此）持续打击网站管理员用来在 SERPs 中获取高排名的黑帽技术。研究这两大算法更新——Panda 和 Penguin。当前黑帽手段的一个典型例子是使用 PBN。

Students should investigate PBNs, Panda and Penguin, followed by a quick discussion of what Google was targeting and how PBNs are currently being used effectively to rank sites higher (if caught, you may wake up one morning to find that your website(s) have been de-indexed from Google).
让学生调查 PBN、熊猫(Panda)和企鹅(Penguin)算法，快速讨论谷歌所针对的目标，以及 PBN 目前如何被有效用于提升网站排名（若被发现，某天早晨醒来你会发现你的网站已被谷歌除名）。

C.2.12 Future challenges to search engines as the web continues to grow
C.2.12 随着网络持续发展搜索引擎将面临的未来挑战

Search engines must be fast enough to crawl the exploding volume of new Web pages in order to provide the most up-to-date information. As the number of pages on the Web grows, so will the number of results search engines return. So, it will be increasingly important for search engines to present results in a way that makes it quick and easy for users to find exactly the information they’re looking for. Search engines have to overcome both these challenges.
搜索引擎必须足够快速地爬取激增的新网页数量，以提供最新的信息。随着网络上网页数量的增长，搜索引擎返回的结果数量也会增加。因此，搜索引擎以快速、简便的方式呈现结果，让用户准确找到所需信息将变得越来越重要。搜索引擎必须克服这两大挑战。

Improvements in the search interface, for example voice search
搜索界面的改进示例 Voice Search
Use of natural language processing will also become more prevalent. Today, the search engine takes a set of keywords as the input and returns a list of rank-sorted links as the output. This will slowly fade and the new search framework will have questions as the input and answers as the output. The nascent form of this new framework is already available in search engines like Google and Bing.
自然语言处理技术的应用也将变得更加普遍。如今，搜索引擎以一组关键词作为输入，返回按排名排序的链接列表作为输出。这种方式将逐渐消失，新的搜索框架将以问题作为输入，答案作为输出。这种新框架的雏形已在 Google 和 Bing 等搜索引擎中初现端倪。
Semantic search using machine learning: see RankBrain. RankBrain is designed to help interpret search queries better and, behind the scenes, translate them effectively in order to find the best pages for the searcher.
基于机器学习的语义搜索参见 RankBrain。RankBrain 旨在帮助更好地解读这些查询，并在幕后有效地将其转换，从而为搜索者找到最佳页面。
Personalized search: because mobile is becoming the primary form of consumption, future search engines will try to use powerful sensing technologies such as the accelerometer, digital compass, gyroscope and GPS. Google recently bought a company called Behavio, which predicts what a user might do next by using the information acquired from the different sensors on the user’s phone.
个性化搜索由于移动设备正逐渐成为主要的消费形式，未来的搜索引擎将尝试使用加速计、数字罗盘、陀螺仪和 GPS 等强大的传感技术。谷歌最近收购了一家名为 Behavio 的公司，该公司通过使用用户手机上的不同传感器获取的信息来预测用户下一步可能做什么。

C.3 Distributed approaches to the web
C.3 网络的分布式方法

C.3.1 Define the terms: mobile computing, ubiquitous computing, peer-2-peer network, grid computing
C.3.1 定义以下术语：移动计算（mobile computing）、普适计算（ubiquitous computing）、点对点网络（peer-2-peer network）、网格计算（grid computing）

What is Grid Computing? 什么是网格计算？

The grid will enable the public to exploit data storage and computing power over the Internet, analogous to the electric power utility (a ubiquitous commodity).
这种网格将使公众能够通过互联网利用数据存储和计算能力，类似于电力设施（一种无处不在的商品）。

A connected system of computer and communication nodes to provide an abstract high-performance computing/storage resource
一种由计算机和通信节点组成的互联系统，旨在提供抽象的高性能计算/存储资源
A style of computing that dynamically pools IT resources together for use based on resource need. It allows organizations to provision and scale resources as needs arise, thereby preventing the underutilization of resources
一种根据资源需求动态整合 IT 资源以供使用的计算模式。它使组织能够按需调配和扩展资源，从而防止资源未充分利用。

What are the key resources that we can share on a grid network of computers?
在计算机的网格网络中，我们可以共享哪些关键资源？

1. Central processing unit (CPU): A CPU is a microprocessor that performs mathematical operations and directs data to different memory locations. Computers can have more than one CPU.
2. Memory: In general, a computer's memory is a kind of temporary electronic storage. Memory keeps relevant data close at hand for the microprocessor. Without memory, the microprocessor would have to search and retrieve data from a more permanent storage device such as a hard disk drive.
内存：通常来说，计算机内存是一种临时性的电子存储设备。内存将相关数据保存在微处理器（microprocessor）附近，便于快速访问。如果没有内存，微处理器将不得不从更永久性的存储设备（如硬盘驱动器 hard disk drive）中搜索和检索数据。
3. Storage: In grid computing terms, storage refers to permanent data storage devices like hard disk drives or databases.
存储：在网格计算术语中，存储指的是硬盘驱动器（hard disk drives）或数据库（databases）等永久数据存储设备。

Normally, a computer can only operate within the limitations of its own resources. There's an upper limit to how fast it can complete an operation or how much information it can store. Most computers are upgradeable, which means it's possible to add more power or capacity to a single computer, but that's still just an incremental increase in performance.
通常，计算机只能在自身资源的限制范围内运行。其完成操作的速度或存储信息的量都存在上限。大多数计算机都可以升级，这意味着可以为单台计算机增加更多性能或容量，但这仍然只是性能的渐进式提升。

Grid computing systems link computer resources together in a way that lets someone use one computer to access and leverage the collected power of all the computers in the system. To the individual user, it's as if the user's computer has transformed into a supercomputer.
网格计算系统以某种方式将计算机资源链接在一起，使得用户能够通过一台计算机访问并利用系统中所有计算机的集体计算能力。对于个人用户而言，这就像是他们的计算机已经转变为一台超级计算机。
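
The idea of pooling resources can be sketched as split, distribute, combine: a large job is cut into independent work units, each unit runs on a different node, and the partial results are merged. In the sketch below, threads stand in for grid nodes and counting primes is an arbitrary example job; a real grid would ship work units to separate machines through grid middleware, and in CPython threads would not actually speed up this CPU-bound work. `count_primes` and `run_on_grid` are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor


def count_primes(lo, hi):
    """One work unit: count the primes n with lo <= n < hi (trial division)."""
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_on_grid(total, n_nodes):
    """Split [0, total) into one chunk per node, run them, and add up the results."""
    if total <= 0:
        return 0
    step = -(-total // n_nodes)  # ceiling division so every number is covered
    chunks = [(lo, min(lo + step, total)) for lo in range(0, total, step)]
    with ThreadPoolExecutor(max_workers=n_nodes) as nodes:
        partial_counts = nodes.map(lambda chunk: count_primes(*chunk), chunks)
        return sum(partial_counts)


print(run_on_grid(10_000, n_nodes=4))
```

To the user, `run_on_grid` looks like a single call on one computer, which mirrors the point above that the grid makes many machines appear as one supercomputer.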

Grid Computing Lexicon 网格计算术语表

Interoperability: The ability for software to operate within completely different environments. For example, a computer network might include both PCs and Macintosh computers. Without interoperable software, these computers wouldn't be able to work together because of their different operating systems and architecture.
互操作性：软件在不同环境中运行的能力。例如，计算机网络可能同时包含 PC 和 Macintosh 计算机。如果没有互操作性软件，这些计算机将无法协作，因为它们使用不同的操作系统和架构。

Open standards: A technique of creating publicly available standards. Unlike proprietary standards, which can belong exclusively to a single entity, anyone can adopt and use an open standard. Applications based on the same open standards are easier to integrate than ones built on different proprietary standards.
开放标准：一种创建公开可用标准的技术。与专有标准（可专属于单一实体）不同，任何人都可以采用和使用开放标准。基于相同开放标准的应用程序比基于不同专有标准的应用程序更易于集成。

At least one computer, usually a server, handles all the administrative duties for the system. Many people refer to this kind of computer as a control node. Other application and Web servers (both physical and virtual) provide specific services to the system.
至少有一台计算机（通常为服务器）负责处理系统的所有管理职责。许多人将这类计算机称为控制节点。其他应用服务器和 Web 服务器（包括物理服务器和虚拟服务器）则为系统提供特定服务。

Cluster: A group of networked computers sharing the same set of resources.
Cluster: 共享同一组资源的一组联网计算机。

What is ubiquitous Computing?
什么是 Ubiquitous Computing？

A popular theme in science fiction stories set in the future is ubiquitous computing. In this future, computers have become so small and pervasive that they are in practically everything. You might have computer sensors in your floor that can monitor your physical health, computers in your car that can assist you when you drive to work, and computers practically everywhere tracking your every move.
在描绘未来的科幻故事中，一个常见的主题是普适计算。在这个未来图景中，计算机变得如此小巧且无处不在，几乎渗透到所有事物中。你的地板可能嵌入了计算机传感器，能够监测你的身体健康状况；汽车里的计算机系统可以在通勤时为你提供驾驶辅助；而遍布各处的计算机设备几乎实时追踪着你的一举一动。

It's a vision of the future that is both exhilarating and frightening. On the one hand, computer networks would become so robust that we'd always have a fast, reliable connection to the Internet. You could communicate with anyone you choose no matter where you were with no worries about interruption in service. But on the other hand, it would also become possible for corporations, governments or other organizations to gather information about you and keep tabs on you wherever you go.
这是一个既令人振奋又令人恐惧的未来图景。一方面，计算机网络将变得无比强大，我们始终能获得快速可靠的互联网连接。无论身处何地，你都可以与任何人自由沟通，无需担心服务中断。但另一方面，企业、政府或其他组织也将有能力收集你的信息，无论你走到哪里都能持续追踪你的动态。

We've seen steps toward ubiquitous computing over the last decade. Municipal Wi-Fi projects and 4G technologies like LTE and WiMAX have extended network computing far beyond the world of wired machines. You can purchase a smartphone and access petabytes of information on the World Wide Web in a matter of seconds. Sensors in traffic stoplights and biometric devices can detect our presence. It may not be long before nearly everything we come into contact with has a computer or sensor inside it.
过去十年间，我们见证了普适计算的发展进程。市政 Wi-Fi 项目与 LTE 和 WiMAX 等 4G 技术已将网络计算能力扩展到远超有线设备覆盖范围。人们购买智能手机后，数秒内即可访问万维网上的 PB 级信息。交通信号灯中的传感器和生物识别设备能感知人类存在。或许用不了多久，我们接触的几乎所有东西内部都会装有计算机或传感器。

Long Term Evolution, or LTE, is a 4G wireless broadband standard that replaces previous technologies like WiMax and 3G. LTE is faster than 3G but not as fast as 5G, the current wireless standard.
Long Term Evolution（LTE）是一种取代 WiMax 和 3G 等先前技术的 4G 无线宽带标准。LTE 比 3G 更快，但不如当前无线标准 5G 快。

C.3.2 Compare the major features of: • mobile computing • ubiquitous computing • peer-2-peer network • grid computing
C.3.2 比较以下各项的主要特征： • 移动计算（mobile computing） • 普适计算（ubiquitous computing） • 对等网络（peer-2-peer network） • 网格计算（grid computing）

Mobile computing 移动计算

Mobile Computing is a technology that allows transmission of data, voice and video via a computer or any other wireless-enabled device without having to be connected to a fixed physical link. The main concept involves:
移动计算是一种技术，允许通过计算机或任何其他支持无线的设备传输数据、语音和视频，而无需连接至固定物理链路。其核心理念包括：

Mobile communication 移动通信
Mobile hardware: portable laptops, smartphones, tablet Pc’s, Personal Digital Assistants
移动硬件：便携式笔记本电脑、智能手机、平板电脑、个人数字助理
Mobile software 移动软件

Characteristics 特点

Portability: The ability to move a device within a learning environment or between different environments with ease.
便携性：能够轻松将设备在学习环境内或不同环境之间移动。

Social Interactivity: The ability to share data and collaborate with other users.
社交互动性（Social Interactivity）：在用户之间共享数据并进行协作的能力。

Context Sensitivity: The ability to gather and respond to real or simulated data unique to a current location, environment, or time.
上下文敏感性（Context Sensitivity）：收集和响应特定于当前位置、环境或时间的真实或模拟数据的能力。

Connectivity: The ability to be digitally connected for the purpose of communication of data in any environment.
连接性：在任何环境下为进行数据通信而保持数字化连接的能力。

Individuality: The ability to use the technology to provide scaffolding for difficult activities and to customize lessons for individual learners.
个性化（Individuality）：利用技术为个体学习者在困难活动中提供脚手架支持并定制课程的能力。

Advantages 优点

Increased productivity: devices can be used out in the field by various companies, reducing time and cost for the client.
生产力提高——因为它们将被应用于不同公司的实际业务中，从而减少客户的时间和成本。
Entertainment: mobile devices can be used for entertainment, both personally and for presentations to people and clients.
娱乐——移动设备可用于娱乐目的，既可用于个人用途，也可用于向他人和客户进行展示。
Cloud computing: documents can be saved on an online server and accessed anytime and anywhere with an internet connection.
云计算——将文档保存在在线服务器上，在连接到互联网时能够随时随地访问它们。
Portability: you are not restricted to one location to get work done, and can even access email on the go.
可移植性——不受限于单一地点，以便您完成任务，甚至在外出时访问电子邮件。

Disadvantages 缺点

Quality connectivity: mobile devices need either Wi-Fi connectivity or a mobile network such as GPRS or 3G.
高质量连接——移动设备需要具备 WIFI 连接或移动网络连接，例如 GPRS、3G。
Security concerns: connecting through mobile VPNs can be unsafe, and syncing devices can also lead to security problems. Accessing a Wi-Fi network can be risky too, because WEP, and to a lesser extent WPA, security can be bypassed.
安全问题——移动虚拟专用网络（VPN）的连接存在安全隐患，设备同步也可能引发安全问题。连接 WiFi 网络同样存在风险，因为 WPA 和 WEP 安全协议很容易被绕过。
Power consumption: mobile devices rely on batteries, which limits how long they can be used.
因使用电池而产生的功耗

Ubiquitous computing (pervasive computing)
普适计算（普及计算）

Definition 定义

Ubiquitous computing is the idea of computing being available everywhere and anytime.
普适计算是指计算能力随时随地可用的理念。

Idea of invisible computing
隐形计算理念
Embedded computing (microprocessors)
嵌入式计算（微处理器）
- Need for low cost, low power computing with connectivity
  具备连接能力的低成本、低功耗计算需求
Usually includes a variety of sensors
通常包含多种传感器
Smart designs: different architectures
智能设计：不同架构
- Need for standards and protocols
  对标准与协议的需求

Peer-to-peer computing 对等计算

Definition 定义

PCs handle data locally instead of relying on servers (each PC becomes both client and server); individual computers connect directly and communicate with each other as equals.
PC 在本地处理数据而非依赖服务器（成为客户端和服务器）；各计算机直接连接并以对等的方式相互通信。

“A peer-to-peer (P2P) network is created when two or more PCs are connected and share resources without going through a separate server computer”
点对点（P2P）网络是在两台或多台个人计算机连接并共享资源时建立的，无需通过独立的服务器计算机

Characteristics: 特点：

Decentralized 去中心化
- If one peer drops out, the rest of the network is not affected
  如果一个对等节点掉线，整个网络不会受到影响
- But data stored only on a peer that has shut down cannot be recovered
  但无法恢复已关闭的一个对等节点的数据
  - Requires independent backup
    需要独立备份
Each peer acts as client and server
每个对等节点同时充当客户端和服务器
Resources and content are shared across all peers and can be shared faster than in a client-server model
在所有对等节点之间共享的资源和内容，其共享速度比客户端<->服务器模式更快
Requires dedicated software to enable this (e.g. a BitTorrent client)
必须通过某些软件来实现这一点
Malware can be distributed faster
恶意软件可以更快地传播
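
The idea that each peer is both client and server can be sketched in a few lines of Python. This is a toy, in-memory model: the `Peer` class, its method names and the file names are hypothetical, and a real P2P system such as BitTorrent adds networking, peer discovery and integrity checks. It shows three characteristics from the list above: every peer can answer requests, content spreads as peers keep copies, and a peer dropping out does not stop the rest of the network (although files held only by that peer become unreachable).

```python
class Peer:
    """Toy peer (hypothetical): it serves files to others AND requests files from others."""

    def __init__(self, name, files=None):
        self.name = name
        self.files = dict(files or {})  # resources this peer shares
        self.neighbours = []            # direct links to other peers, no central server
        self.online = True

    def connect(self, other):
        # Links are symmetric: either peer can request from the other.
        self.neighbours.append(other)
        other.neighbours.append(self)

    def serve(self, filename):
        """Server role: answer a request, unless this peer has dropped out."""
        if not self.online:
            return None
        return self.files.get(filename)

    def request(self, filename):
        """Client role: ask each directly connected peer in turn."""
        for peer in self.neighbours:
            data = peer.serve(filename)
            if data is not None:
                self.files[filename] = data  # keep a copy, so this peer can now serve it too
                return data
        return None


a = Peer("A", {"notes.txt": "C.3 notes", "only_on_a.txt": "draft"})
b = Peer("B")
c = Peer("C")
a.connect(b)
b.connect(c)

print(b.request("notes.txt"))      # B acts as a client of A
a.online = False                   # A drops out of the network
print(c.request("notes.txt"))      # C is still served, by B this time
print(b.request("only_on_a.txt"))  # None: data held only by A is unreachable
```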

C.3.3 Distinguish between interoperability and open standards.
C.3.3 区分互操作性与开放标准。

Interoperability can be defined as “the ability of two or more systems or components to exchange information and to use the information that has been exchanged”. For systems to be able to communicate, they need to agree on how to proceed; for this reason, standards are necessary. A single company could build different systems that are interoperable through private standards known only to the company itself. However, for real interoperability between different systems, open standards become necessary.
互操作性可定义为“两个或多个系统或组件交换信息并使用所交换信息的能力”。为了让系统能够通信，它们需要就如何操作达成一致，因此标准是必要的。一家公司可以开发通过仅其自身知晓的私有标准实现互操作的不同系统。然而，要实现不同系统间真正的互操作性，开放标准变得不可或缺。

Open standards are standards that follow certain open principles. Definitions vary, but the most common principles are:
开放标准是指遵循某些开放准则的标准。具体定义各不相同，但最常见的准则包括：

public availability  公开可用性
collaborative development, usually through some organization such as the World Wide Web Consortium (W3C) or the IEEE
协作开发，通常通过某些组织，例如万维网联盟（W3C）或 IEEE
royalty-free  免版税
voluntary adoption  自愿采用
The need for open standards is described well by W3C director and WWW inventor Tim Berners-Lee who said that “the decision to make the Web an open system was necessary for it to be universal. You can’t propose that something be a universal space and at the same time keep control of it.”
万维网发明者、W3C 主任蒂姆·伯纳斯-李很好地阐述了开放标准的必要性，他指出："决定让万维网成为一个开放系统是它能够实现普遍性的必要条件。你不能既提议某事物成为通用空间，同时又保持对它的控制。"

Some examples of open standards include:
一些开放标准的例子包括：

file formats, e.g. HTML, PNG, SVG
文件格式，例如 HTML、PNG、SVG
protocols, e.g. IP, TCP  协议，例如 IP、TCP
programming languages, e.g. JavaScript(ECMAScript)
编程语言，例如 JavaScript(ECMAScript)
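
Because the PNG specification is public, anyone can write software that reads PNG files without permission from, or payment to, the format's creators. As a minimal sketch (not a full decoder), the Python below builds the first bytes of a PNG file and reads the image size back, following the published layout: an 8-byte signature, then an IHDR chunk made of a 4-byte length, the type `IHDR`, 13 bytes of data (width, height, bit depth, colour type and three method bytes) and a CRC-32 checksum. The function names are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte signature defined by the PNG standard


def make_ihdr(width, height):
    """Build an IHDR chunk for an 8-bit truecolour (RGB) image."""
    data = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)  # 13 bytes, big-endian
    chunk_type = b"IHDR"
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF  # the CRC covers type + data
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def read_png_size(blob):
    """Return (width, height) by reading the signature and the IHDR chunk."""
    if blob[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    length, chunk_type = struct.unpack(">I4s", blob[8:16])
    if chunk_type != b"IHDR":
        raise ValueError("the first chunk must be IHDR")
    data = blob[16:16 + length]
    (crc,) = struct.unpack(">I", blob[16 + length:20 + length])
    if zlib.crc32(chunk_type + data) & 0xFFFFFFFF != crc:
        raise ValueError("corrupted IHDR chunk")
    return struct.unpack(">II", data[:8])


header = PNG_SIGNATURE + make_ihdr(640, 480)
print(read_png_size(header))  # (640, 480)
```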

C.3.4 Describe the range of hardware used by distributed networks.
C.3.4 描述分布式网络所使用的各类硬件。

Students should be aware of developments in mobile technology that have facilitated the growth of distributed networks.
学生应了解促进分布式网络发展的移动技术进展。

This depends on the type of distributed system, but, generally speaking, at a low level multiple CPUs need to be interconnected through some network, while at a higher level processes need to be able to communicate and coordinate. For each approach to distributed systems, more specific types of hardware may be used:
这当然取决于不同类型的分布式系统，但广义而言，在低层次上需要通过网络互连多个 CPU，而在高层次上进程需要能够相互通信和协调。对于每种分布式系统实现方式，可能需要使用更具体的硬件类型：

Mobile computing: wearables (e.g. Fitbit), smartphones, tablets, laptops, but also transmitters and other hardware involved in cellular networks
移动计算：可穿戴设备（例如 Fitbit ）、智能手机、平板电脑、笔记本电脑，以及涉及蜂窝网络的发射器和其他硬件
Ubiquitous computing: embedded devices, IoT devices, mobile computing devices, networking devices
普适计算：嵌入式设备、IoT 设备、移动计算设备、网络设备
Peer-to-peer computing: usually PCs, but can include dedicated servers for coordination
对等计算：通常是 PC，但可以包含用于协调的专用服务器
Grid computing: PCs and servers
网格计算：个人计算机与服务器
Content delivery networks (CDNs) are systems of distributed servers. They can cache content and speed up its delivery on a global scale
内容分发网络（CDNs）是一个分布式服务器系统。它们能够缓存内容并在全球范围内加速内容的分发。
Blockchain technologies (e.g. Bitcoin, Ethereum) are decentralized and run on multiple peers, which can be PCs but also server farms
区块链技术(如 Bitcoin、Ethereum)具有去中心化特性，基于多个对等节点运行，这些节点既可以是个人电脑，也可以是服务器群。
Botnets can probably be considered a form of distributed computing as well, consisting of hacked devices, such as routers or PCs
僵尸网络或许也可被视为一种分布式计算形式，由被入侵的设备（如路由器或 PC）组成

C.3.5 Explain why distributed systems may act as a catalyst to a greater decentralization of the web
C.3.5 解释为什么分布式系统可能成为推动网络实现更大程度去中心化的催化剂

Distributed systems consist of many different nodes that interact with each other. For this reason they are decentralized by design, which you can see in this comparison.
分布式系统由许多相互交互的不同节点组成。因此，它们在设计上是去中心化的，通过这种比较可以看出这一点。

Therefore, the importance of distributed systems for a decentralized web lies in their benefits and disadvantages compared to classic centralized client-server models.
因此，分布式系统对去中心化网络的重要性，在于其相较于经典集中式客户端-服务器模式的优势与劣势。

Benefits 优势

higher fault tolerance 更高的容错性
stability 稳定性
scalability 可扩展性
privacy 隐私
data portability is more likely
数据可移植性更有可能
independence from large corporations such as Facebook, Google, Apple or Microsoft
独立于诸如 Facebook、Google、Apple 或 Microsoft 等大型企业
potential for high performance systems
高性能系统的潜力

Disadvantages 缺点

more difficult to maintain
更难以维护
harder to develop and implement
更难以开发和实施
increased need for security
安全需求的增长

Personal conclusion 个人结论

While some decentralized systems such as Bitcoin are gaining traction, and others like Git or BitTorrent have been around for a long time, most of the internet is still centralized, as most web applications follow the client-server model, which is further encouraged by corporations seeking profit. I found this post on the topic from Brewster Kahle’s blog very interesting.
尽管比特币等一些去中心化系统逐渐流行，Git 或 BitTorrent 等其他系统也已存在多年，但互联网的大部分仍处于中心化状态，因为大多数网络应用遵循客户端-服务器模式，这种模式更被追求盈利的企业所推崇。我发现 Brewster Kahle 博客中关于这个话题的这篇文章非常有趣。

Compression & Decompression Week 2
压缩与解压缩第 2 周

C.3.6 Distinguish between lossless and lossy compression.
C.3.6 区分无损压缩和有损压缩。

Students will not be required to study the detailed compression algorithms
学生无需学习详细的压缩算法

C.3.7 Evaluate the use of decompression software in the transfer of information.
C.3.7 评估解压缩软件在信息传输中的使用。

Students can test different compression methods to evaluate their effectiveness.
学生可以测试不同的压缩方法以评估其有效性。

Compression 1. Objectives: understand compression techniques and the need for compression
压缩 1 教学目标理解压缩技术及其必要性

Compression definition: reduce the size of data, i.e. the number of bits used to store it. Most services charge by the number of bits you transmit, so compression also reduces bandwidth use.
压缩定义：减少数据的大小 * 存储数据所使用的比特数，大多数服务根据传输的比特数收费/将减少带宽使用

Benefits: less storage needed and lower associated costs; better UX through lower latency and higher speed; less bandwidth used.
优势：减少所需存储空间及关联成本 UX 降低延迟/提升速度减少带宽占用：

Possible downside? 可能的缺点？

Decompression takes time; for an urgent short message, this delay could be viewed as a downside.
解压缩需要时间，如果收到紧急短信，这可能会被视为一个缺点。

What can we compress? 我们可以压缩什么？

Text? How is text stored on a computer?
文本？文本是如何存储在计算机中的？
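
One answer: each character is stored as a number. A minimal sketch, assuming the text is ASCII so that each character occupies one 8-bit byte (ASCII characters are encoded the same way in UTF-8):

```python
text = "hello world"
encoded = text.encode("ascii")                 # one byte per character
bits = "".join(f"{byte:08b}" for byte in encoded)

print(len(text), "characters ->", len(bits), "bits")  # 11 characters -> 88 bits
print(bits[:8])                                       # 01101000, the code for 'h' (104)
```

So, uncompressed, "hello world" needs 88 bits; that is the baseline any text compression method has to beat.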

Other? 其他？

How? 如何？

Take advantage of redundancy: find repeated patterns and exploit them through coding
利用冗余的优势：通过编码技术利用重复模式
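
Run-length encoding (RLE) is one of the simplest ways to exploit redundancy: a run of repeated symbols is replaced by the symbol and a count. It is shown here only as an illustration; the notes do not prescribe a particular algorithm. The round trip shows that it is lossless.

```python
from itertools import groupby


def rle_encode(text):
    """Replace each run of identical characters with (character, run length)."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


def rle_decode(pairs):
    """Expand every (character, run length) pair back into the original run."""
    return "".join(ch * count for ch, count in pairs)


row = "WWWWWWWWBBBWWWW"  # e.g. one row of a black-and-white image
packed = rle_encode(row)
print(packed)                     # [('W', 8), ('B', 3), ('W', 4)]
print(rle_decode(packed) == row)  # True: no information lost
```

RLE only pays off when long runs exist; on text like "hello world", which has almost no runs, it would make the data larger.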

Take advantage of human limitations in hearing and sight: some information can be discarded without impacting the experience
利用人类感官的局限性：在听觉和视觉方面，这样我们可以在不影响体验的情况下舍弃部分信息
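
Lossy methods discard detail people are unlikely to notice. Real codecs such as MP3 decide what to drop using models of human hearing; the sketch below uses a much cruder, assumed rule (keep only the top 4 bits of each 8-bit sample) purely to show the trade-off: half the bits, but the original values cannot be recovered.

```python
def quantize(samples, drop_bits=4):
    """Keep only the most significant bits of each 8-bit sample (lossy)."""
    return [s >> drop_bits for s in samples]


def dequantize(codes, drop_bits=4):
    """Scale codes back up; the dropped low bits are gone for good."""
    return [c << drop_bits for c in codes]


samples = [3, 17, 130, 255]  # hypothetical 8-bit audio or pixel values
codes = quantize(samples)    # each code now fits in 4 bits instead of 8
restored = dequantize(codes)
print(codes)     # [0, 1, 8, 15]
print(restored)  # [0, 16, 128, 240]: close to, but not equal to, the original
```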

Types of compression? 压缩类型？

Lossless (preserves all information): where can it be used?
无损（保留所有信息）可用于哪些场景？

Lossy (some information is permanently removed): can it be used for text?
有损（冗余信息被移除）它能用于文本吗？

Compression 2: Huffman example. Objectives: apply a compression method; introduction to binary trees
压缩 2 霍夫曼示例目标：应用一种压缩方法二叉树简介

Text compression definition: reduce the size of data, i.e. the number of bits used to store it
文本压缩定义：减少数据的大小 * 存储数据所用的位数

Compress "hello world" to 33 bits
将 Hello World 压缩至 33 位

"hello world"

h = 000
e = 001
l = 010
o = 011
_ (space) = 100
w = 101
r = 110
d = 111

3 bits per character; 11 characters (including the space) × 3 bits = 33 bits
每个字母总共 3 比特。11 个字母总共 33 比特
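
The fixed-length code above can be checked in Python. With 8 distinct characters, ceil(log2(8)) = 3 bits per character is enough, so 11 characters need 33 bits, compared with 88 bits as 8-bit ASCII. The table is the one from these notes, with `" "` standing for the space.

```python
import math

CODE = {"h": "000", "e": "001", "l": "010", "o": "011",
        " ": "100", "w": "101", "r": "110", "d": "111"}
DECODE = {bits: ch for ch, bits in CODE.items()}

bits_per_char = math.ceil(math.log2(len(CODE)))  # 8 symbols -> 3 bits
encoded = "".join(CODE[ch] for ch in "hello world")
# Fixed length makes decoding easy: just cut the stream into 3-bit groups.
decoded = "".join(DECODE[encoded[i:i + 3]] for i in range(0, len(encoded), 3))

print(bits_per_char, len(encoded))  # 3 33
print(decoded)                      # hello world
```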

Better than 33 bits: how?
如何超越 33 位？

How can we compress hello world to less than 33 bits?
我们如何能将 hello world 压缩到少于 33 位？

One Solution 一种解决方案

How can we compress text ?
我们如何压缩文本？

Encoded Message: 编码消息：	Key:

Common approach: take advantage of words that occur frequently. Store each such word once as a dictionary entry in the file header; wherever the word occurs, simply use a short pointer (e.g. 3 or 4 bits) instead of the word itself, which may take 80 bits (10 characters × 8 bits), for example.
常见方法：利用频繁出现的单词，并在文件头中存储一个字典项。若单词出现，则仅使用指针（3 或 4 位）。例如，单词可能占用 80 位...
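
A minimal sketch of the dictionary approach, working on whole words: words that occur at least twice are stored once in a dictionary (the "header"), and every occurrence is replaced by its index, which plays the role of the short pointer. The threshold of two occurrences and the function names are assumptions for illustration; real dictionary coders such as the LZ family work on repeated byte sequences rather than words.

```python
from collections import Counter


def dictionary_encode(words, min_count=2):
    """Store frequent words once; replace each occurrence with its dictionary index."""
    counts = Counter(words)
    dictionary = [w for w, n in counts.items() if n >= min_count]
    index = {w: i for i, w in enumerate(dictionary)}
    tokens = [index.get(w, w) for w in words]  # int = pointer, str = literal word
    return dictionary, tokens


def dictionary_decode(dictionary, tokens):
    """Follow each pointer back into the dictionary; copy literal words unchanged."""
    return [dictionary[t] if isinstance(t, int) else t for t in tokens]


words = "the cat and the hat and the bat".split()
header, body = dictionary_encode(words)
print(header)  # ['the', 'and']
print(body)    # [0, 'cat', 1, 0, 'hat', 1, 0, 'bat']
print(dictionary_decode(header, body) == words)  # True
```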

Huffman

Compress "Hello World" 压缩 "Hello World"

Letter 字母	Frequency 频率
H	1
E	1
L	3
O	2
_ (space)	1
W	1
R	1
D	1

Why we need compression techniques
我们为何需要压缩技术

The storage capacity of computers is growing at an unbelievable rate—in the last 25 years, the amount of storage provided on a typical computer has grown about a millionfold—but we still find more to put into our computers. Computers can store whole books or even libraries, and now music and movies too, if only they have the room. Large files are also a problem on the Internet, because they take a long time to download. We also try to make computers smaller—even a cellphone or wristwatch can be expected to store lots of information!
计算机的存储容量正以难以置信的速度增长——在过去的 25 年里，典型计算机提供的存储量增长了约百万倍——但我们仍发现需要存储更多内容。只要存储空间足够，计算机可以存储整本书籍甚至图书馆，如今还能存储音乐和电影。大文件在互联网上也是个问题，因为它们需要很长时间才能下载完成。我们还试图让计算机变得更小——即便是手机或手表，预计也能存储大量信息！

Video Compression 视频压缩

Videos take up a lot of space. Uncompressed 1080p HD video footage takes up about 10.5 GB of space per minute of video, but this can vary with frame rate. If you use a smartphone to shoot your video, 1080p footage at the standard 30 frames per second takes up 130 MB per minute of footage, while 4K video takes up 375 MB of space for each minute of film.
视频会占用大量存储空间。未压缩的 1080 高清视频素材每分钟约占用 10.5 GB 空间，但具体数值可能因帧速率而异。使用智能手机拍摄时，标准 30 帧/秒的 1080p 视频每分钟占用 130 MB 空间，而 4K 视频每分钟影片则需 375 MB 存储空间。
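
The uncompressed figure above can be roughly reproduced with simple arithmetic. This sketch assumes 24-bit colour (3 bytes per pixel) with no chroma subsampling and 30 frames per second; real cameras and codecs vary, which is why quoted figures differ.

```python
width, height = 1920, 1080  # 1080p frame
bytes_per_pixel = 3         # assumption: 24-bit RGB
fps = 30                    # assumption: 30 frames per second
seconds = 60                # one minute of footage

raw_bytes = width * height * bytes_per_pixel * fps * seconds
print(raw_bytes)                    # 11197440000
print(round(raw_bytes / 2**30, 1))  # 10.4 (GiB per minute)

# Compare with the ~130 MB per minute quoted above for 1080p smartphone video.
print(round(raw_bytes / 130e6))     # 86, i.e. roughly 86 times smaller
```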

Because videos take up so much space, and because bandwidth is limited, video compression is used with video files to reduce the size of the file. Compression involves packing the file's information into a smaller space. This works through two different kinds of compression: lossy and lossless.
由于视频文件占用的存储空间极大，且带宽资源有限，因此需对视频文件进行压缩以减小其体积。压缩技术通过两种不同的方式实现：有损压缩（lossy）和无损压缩（lossless），其核心原理是将文件信息重新编码封装至更小的存储空间。

Lossy VIDEO and SOUND Compression Formats
有损视频与音频压缩格式

Lossy compression means that the compressed file has less data in it than the original file. Images and sounds that repeat throughout the video might be removed to effectively cut out parts of the video that are seen as unneeded. In some cases, this translates to lower-quality files because information has been lost, hence the designation "lossy."
有损压缩意味着压缩后的文件包含的数据比原始文件少。视频中重复出现的图像和声音可能会被移除，从而有效删除被视为不必要的视频部分。在某些情况下，这会导致文件质量降低，因为信息已经丢失，因此被称为"有损"。

However, you can lose a relatively large amount of data before you start to notice a difference (think MP3 audio files, which use lossy compression). Lossy compression makes up for the loss in quality by producing comparatively small files. For example, DVDs are compressed using the MPEG-2 format, which can make files 15 to 30 times smaller than the originals, but viewers still perceive DVDs as having high-quality pictures.
不过，在您开始注意到差异之前，可能会丢失相对大量的数据（想想使用有损压缩的 MP3 音频文件）。有损压缩通过生成相对较小的文件来弥补质量损失。例如，DVD 使用 MPEG-2 格式进行压缩，可以将文件缩小至原始大小的 15 至 30 倍，但观众仍认为 DVD 具有高质量的图像。

Most video files uploaded to the internet use lossy compression to keep the file size small while delivering a relatively high-quality product. If a video were to remain at its (in some cases) extremely high-quality file size, not only would it take forever to upload the content, but users with slow internet connections would have an awful time streaming the video or downloading it to their computers.
大多数上传至互联网的视频文件都采用有损压缩技术，在保持较小文件体积的同时提供相对高质量的成品。若视频保持其（某些情况下）极高的原始画质文件体积，不仅上传内容需要耗费极长时间，网速较慢的用户在流式传输视频或将其下载至电脑时也会遭遇极差的体验。

By contrast, lossless audio compression formats include Free Lossless Audio Codec (FLAC), Apple Lossless Audio Codec (ALAC), and Windows Media Audio Lossless (WMAL), among others.
无损压缩格式包括免费无损音频编解码器（FLAC）、苹果无损音频编解码器（ALAC）和 Windows 媒体音频无损格式（WMAL）等。

Text Compression Huffman 文本压缩 Huffman

Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to input characters, lengths of the assigned codes are based on the frequencies of corresponding characters. The most frequent character gets the smallest code and the least frequent character gets the largest code.
霍夫曼编码是一种无损数据压缩算法。其基本思想是为输入字符分配可变长度代码，所分配代码的长度基于对应字符的出现频率。出现频率最高的字符获得最小的代码，而出现频率最低的字符则获得最大的代码。
The variable-length codes assigned to input characters are prefix codes, which means the codes (bit sequences) are assigned so that the code for one character is never a prefix of the code for any other character. This is how Huffman coding ensures there is no ambiguity when decoding the generated bit stream.
分配给输入字符的变长编码是 Prefix Codes（前缀码），即这些编码（比特序列）的分配方式确保了一个字符的编码不会成为其他任何字符编码的前缀。这就是霍夫曼编码确保解码生成的比特流时不会产生歧义的方式。
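
A compact Huffman coder in Python, applied to the "hello world" frequency table above. It is one common way to build the code (repeatedly merging the two least frequent groups with a priority queue); ties may be broken differently from a hand-drawn tree, giving different codes but the same total length. For "hello world" the result is 32 bits, one bit better than the 33-bit fixed-length code, before counting the space needed to store the code table itself.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a prefix-free {character: bit string} code built by Huffman's method."""
    freq = Counter(text)
    # Heap entries: (total frequency, tie-breaker, {char: code built so far}).
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                      # a single distinct symbol still needs 1 bit
        return {ch: "0" for ch in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # least frequent group -> branch 0
        w2, _, right = heapq.heappop(heap)  # next least frequent group -> branch 1
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, codes):
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    reverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:  # prefix-free, so the first match is the only match
            out.append(reverse[current])
            current = ""
    return "".join(out)


codes = huffman_codes("hello world")
bits = encode("hello world", codes)
print(len(bits))            # 32
print(decode(bits, codes))  # hello world
```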
Let us understand prefix codes with a counterexample. Let there be four characters a, b, c and d, with variable-length codes 00, 01, 0 and 1 respectively. This coding leads to ambiguity because the code assigned to c is a prefix of the codes assigned to a and b. If the compressed bit stream is 0001, the decompressed output may be “cccd”, “ccb”, “cad”, “acd” or “ab”.
让我们通过一个反例来理解前缀码。假设有四个字符 a、b、c 和 d，它们对应的变长编码分别为 00、01、0 和 1。这种编码方式会导致歧义，因为分配给 c 的编码是分配给 a 和 b 编码的前缀。如果压缩后的比特流是 0001，解压缩后的输出可能是"cccd"、"ccb"、"cad"、"acd"或"ab"。
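
The ambiguity in this counterexample can be checked by listing every way to split the bit stream into codewords. A short recursive sketch (the function name is hypothetical):

```python
def all_decodings(bits, codes):
    """Every way of splitting `bits` into codewords from `codes`."""
    if not bits:
        return [""]
    results = []
    for ch, code in codes.items():
        if bits.startswith(code):
            for rest in all_decodings(bits[len(code):], codes):
                results.append(ch + rest)
    return results


not_prefix_free = {"a": "00", "b": "01", "c": "0", "d": "1"}
print(sorted(all_decodings("0001", not_prefix_free)))
# ['ab', 'acd', 'cad', 'ccb', 'cccd']: five readings, so the code is ambiguous
```

With a prefix code, the same function can only ever return one decoding.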

Question Section 问题部分

Describe one way that a compressed file may be decompressed
描述一种解压缩压缩文件的方法

Can lossy compression be used on text files ? Explain your answer
有损压缩可以用于文本文件吗？请解释你的答案

Explain how compression of data may lead to negative consequences. [3]
解释数据压缩如何可能导致负面影响。[3]

Also explain the importance of compression now and in the future.
同时说明现在和未来压缩技术的重要性。

C.4 The evolving web C.4 不断演变的网络

C.4.1 Discuss how the web has supported new methods of online interaction such as social networking.
C.4.1 讨论网络如何支持诸如社交网络等新的在线互动方法。

Keywords & phrases: Web 1.0 and Web 2.0
关键词与短语：Web 1.0 和 Web 2.0

Web 1.0

Web 2.0

Semantic Web

ubiquitous 无处不在的

Berners-Lee

open protocols HTML HTTP 开放协议 HTML HTTP

decentralization 去中心化

Read Only 只读

Read-Write 读写

Hyperlinks 超链接

Web of linked documents 相互链接的文档网络

Successful companies that emerge at each stage of the web's evolution become monopolies; normal market economics don’t apply.
在其发展的每个阶段出现的成功企业会成为垄断企业，市场经济规律不再适用。

Keywords & phrases: Semantic Web
关键词 & 短语 Semantic Web

The aim of the Semantic Web is to shift the emphasis of associative linking from documents to data
语义网（Semantic Web）的目标是将关联链接（associative linking）的重点从文档（documents）转移到数据（data）

Abundantly available information can be placed in new contexts and reused in unanticipated ways. This is the dynamic that enabled the WWW to spread, as the value of Web documents was seen to be greater in information rich contexts (O’Hara & Hall, 2009).
海量可获取的信息能够被置于新的语境中，并以意想不到的方式被重新利用。正是这种动态特性推动了万维网（WWW）的普及，因为人们发现网络文档在信息丰富的语境中具有更大价值（O’Hara & Hall, 2009）。
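
To make "linking data rather than documents" concrete: the Semantic Web describes data as subject-predicate-object statements (triples) in the RDF data model, where each part is normally a URI. The sketch below models a few triples as Python tuples, using facts from these notes; the `ex:` prefix is a hypothetical stand-in for a real URI namespace, and a real system would use an RDF library and a query language such as SPARQL.

```python
# Each fact is a (subject, predicate, object) triple. "ex:" is a placeholder namespace.
triples = {
    ("ex:TimBernersLee", "ex:invented", "ex:WorldWideWeb"),
    ("ex:TimBernersLee", "ex:directorOf", "ex:W3C"),
    ("ex:HTML", "ex:isA", "ex:OpenStandard"),
    ("ex:PNG", "ex:isA", "ex:OpenStandard"),
}


def query(subject=None, predicate=None, obj=None):
    """Return matching triples; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Data, not documents, is linked: the same triples answer different questions.
print(query(subject="ex:TimBernersLee"))
print([s for s, _, _ in query(obj="ex:OpenStandard")])  # ['ex:HTML', 'ex:PNG']
```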

WEB of Data 数据网络

Relational databases 关系型数据库

Excel spreadsheets Excel 电子表格

WEB 3.0

ubiquitous 无处不在的

open protocols HTML HTTP 开放协议 HTML HTTP

decentralization 去中心化

Governments are making data available; see https://data.gov.uk/
各国政府正在公开数据，请参见 https://data.gov.uk/

datasets 数据集

Democracy rules: open and free
民主规则：开放与自由

URL / URI 统一资源定位符 / 统一资源标识符

Students should be aware of issues linked to the growth of new internet technologies such as Web 2.0 and how they have shaped interactions between different stakeholders of the web.
学生应意识到与 Web 2.0 等新型互联网技术发展相关的问题，以及这些技术如何塑造了网络各利益相关者之间的互动。

Google Maps: is it free?
Google Maps 是免费的吗？

If your business is missing, can you add it?
如果您的业务缺失能否添加

Can this information be monetized? How?
这些信息能否货币化？如何操作？

Should Google have a monopoly on location information?
Google 应该垄断位置信息吗？

Are there Alternatives to Google Maps?
谷歌地图有替代品吗？

Watch the video below, then create an account on OpenStreetMap and add a building (your condo/house, Wells school, etc.). Read this post and describe in your own words how OpenStreetMap is different from Google Maps. Post your response to Google Classroom.
观看下面的视频，然后在 Open Street Map 上创建一个账户并添加一栋建筑物（你的公寓/房屋、威尔斯学校等）。阅读这篇文章，用自己的话描述 Open Street Maps 与 Google Maps 有何不同。将你的回答发布到 Google Classroom。

The beginnings of the web (Web 1.0 , Web of content)
万维网的起源（Web 1.0，内容网络）

Web 2.0 – “Web of the Users”
Web 2.0 —— "用户之网"

Web 3.0 – “Semantic Web”
Web 3.0——"语义网"

But what does this mean?
但这意味着什么？

Later developments 后续发展

C.4.2 Describe how cloud computing is different from a client-server architecture
C.4.2 描述云计算与客户端-服务器架构有何不同

It’s worth noting that this comparison is not about two opposites. Both concepts do not exclude each other and can complement one another.
值得注意的是，这种比较并非针对两个对立面。这两个概念并不互相排斥，可以相互补充。

Client-server architecture
客户端-服务器架构

An application gets split into the client side and server-side. The server can be a central communicator between clients (e.g. email/chat server) or allow different clients to access and manipulate data in a database. A client-server application does also not necessarily need to be working over the internet, but could be limited to a local network, e.g. for enterprise applications.
应用程序被划分为客户端和服务器端。服务器可以充当客户端之间的中央通信器（例如电子邮件/聊天服务器），或允许不同客户端访问和操作数据库中的数据。客户端-服务器应用程序也不一定需要通过互联网运行，可以局限于本地网络，例如企业级应用场景。

Cloud computing 云计算

Cloud computing still relies on the client-server architecture, but puts the focus on sharing computing resources over the internet. Cloud applications are often offered as a service to individuals and companies - this way companies don’t have to build and maintain their own computing infrastructure in house. Benefits of cloud computing include:
云计算仍然依赖于客户端-服务器架构，但重点在于通过互联网共享计算资源。云应用通常作为服务提供给个人和公司——这样公司就不必在内部自行构建和维护计算基础设施。云计算的优势包括：

Pay per use: elasticity allows the user to only pay for the resources that they actually use.
按使用付费：弹性允许用户仅为其实际使用的资源付费。

Elasticity: cloud applications can scale up or down depending on current demand. This allows better use of resources and reduces the need for companies to make large investments in local infrastructure. Wikipedia defines elasticity as "the degree to which a system is able to adapt to workload changes by provisioning and de-provisioning resources in an autonomic manner, such that at each point in time the available resources match the current demand as closely as possible".
弹性：云应用程序可以根据当前需求扩展或缩减。这使得资源得到更好利用，并减少了企业在本地基础设施上进行大量投资的需求。维基百科将弹性定义为"系统通过自主方式配置和取消配置资源以适应工作负载变化的程度，使得在任何时间点可用资源都尽可能紧密匹配当前需求"。

Self-provisioning: allows the user to set up applications in the cloud without the intervention of the cloud provider
自服务配置：允许用户在云中设置应用程序，而无需云服务提供商的干预
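
A rough way to see why elasticity supports pay-per-use: compare paying for enough servers to cover the peak all day with paying only for the servers each hour actually needs. The hourly demand figures below are invented for illustration and prices are left out; real billing models differ between providers.

```python
# Hypothetical number of servers needed in each hour of one day (24 values).
demand = [2, 2, 2, 2, 3, 5, 8, 12, 12, 10, 9, 9,
          10, 11, 12, 12, 10, 8, 6, 5, 4, 3, 2, 2]


def fixed_server_hours(demand):
    """In-house style: provision for the peak and keep it running all day."""
    return max(demand) * len(demand)


def elastic_server_hours(demand):
    """Elastic cloud style: capacity follows demand hour by hour."""
    return sum(demand)


print(fixed_server_hours(demand))    # 288
print(elastic_server_hours(demand))  # 161
```

Under these made-up numbers, the elastic approach pays for 161 server-hours instead of 288, roughly 44% less.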

A company can choose any of these service models: SaaS (Software as a Service), PaaS (Platform as a Service) or IaaS (Infrastructure as a Service)
公司可选择使用这些 SaaS、IaaS 或 PaaS 中的任意一种

Using these services offers many advantages over the traditional client-server model. Can you think of some?
相较于服务器客户端模型，使用这些服务具有诸多优势：你能想到一些吗？

Azure is Microsoft's cloud platform; the other major one is Amazon Web Services (AWS). Click here and watch the intro video.
Azure 是微软的云服务，另一个主要的是 amazon，点击此处观看介绍视频

What is the difference between scalability and elasticity?
可扩展性和弹性之间有什么区别？

C.4.3 Discuss the effects of the use of cloud computing for specified organizations
C.4.3 讨论云计算对特定组织使用的影响

To include public and private clouds
包括公有云和私有云

*** Creates an environment conducive to innovative startups and thus the potential for disruptive innovation
*** 创造一个有利于创新初创企业的环境，从而带来颠覆性创新的潜力

Private cloud 私有云

In a private cloud model a company owns the data centers that deliver the services to internal users only.
在私有云模型中，公司拥有仅向内部用户提供服务的数据中心。

Scalability 可扩展性

Self-provisioning 自服务配置

Direct control 直接控制

Changing computer resources on demand
按需调整计算机资源

Limited access through firewalls improves security
通过防火墙限制访问权限可提升安全性

Can you think of any disadvantages?
你能想到哪些缺点？

Same high costs for maintenance, staffing, management
维护、人员配备和管理方面同样高昂的成本

Additional costs for cloud software
云软件的额外费用

Public cloud 公有云

In a public cloud services are provided by a third party and are usually available to the general public over the Internet.
在公有云中，服务由第三方提供，通常通过互联网向公众开放。

Advantages 优点

Easy and inexpensive because the provider covers hardware, application and bandwidth costs
简单且成本低廉，因为提供商承担了硬件、应用和带宽成本

Scalability to meet needs
可扩展性以满足需求

No wasted resources 无资源浪费

Costs calculated by resource consumption only
仅基于资源消耗计算成本

Disadvantages 缺点

No control over sensitive data
无法控制敏感数据

Security risks 安全风险

Hybrid cloud 混合云

The idea of a hybrid cloud is to use the best of both private and public clouds by combining both. Sensitive and critical applications run in a private cloud, while the public cloud is used for applications that require high scalability on demand. As TechTarget explains, the goal of a hybrid cloud is to “create a unified, automated, scalable environment that takes advantage of all that a public cloud infrastructure can provide while still
混合云的理念是通过结合私有云和公有云来发挥两者的优势。敏感和关键应用程序运行在私有云中，而公有云则用于需要按需高度可扩展性的应用。正如 TechTarget 所解释的，混合云的目标是"创建一个统一、自动化、可扩展的环境，既能充分利用公有云基础设施提供的所有优势，同时仍然保持..."

Summary of obstacles/Concerns
障碍/问题总结

service availability 服务可用性

data lock-in: if you wish to change provider, what format will your data be in? Converting it to a new data format could be very expensive
数据锁定：若想更换服务提供商，您的数据将以何种格式存在？转换为新数据格式可能产生高昂成本

The provider goes bust while holding all your data
公司倒闭，连带您所有数据

data confidentiality and auditability (security)
数据机密性和可审计性（security）

data transfer bottlenecks
数据传输瓶颈

performance unpredictability
性能不可预测性

Data Conversions 数据转换

bugs in large-scale distributed systems
大规模分布式系统中的错误

C.4.5 Describe the interrelationship between privacy, identification and authentication
C.4.5 描述隐私、身份识别与认证之间的相互关系

Privacy 隐私

Defined as the seclusion of information from others. In the context of the web, this can relate to healthcare records, sensitive data from financial institutions, residential/geographic records, and criminal justice investigations/proceedings. For such information it is essential to prevent unauthorized access.
定义为将信息与他人隔离。在网络环境中，这可能涉及医疗记录、金融机构的敏感数据、居住/地理记录、刑事司法调查/诉讼程序。对于此类信息，必须防止未经授权的访问。

Identification 标识

Defined as the process of claiming an identity, e.g. by entering a username. This process is important for privacy and is required for authentication.
定义为声明个人身份的过程。这一过程对隐私保护至关重要，也是身份验证的必要步骤。

Authentication 身份验证

Process of proving/confirming one’s identification. Most usually done through a username-password combination, but other methods such as two-factor authentication become more and more prominent on the web. A common form of two-factor authentication requires the user to enter a code received by SMS in addition to the conventional password.
验证/确认个人身份的过程。最常见的方式是通过用户名-密码组合进行验证，但双因素认证等其他方法在网络中日益普及。常见的双因素认证形式要求用户在输入传统密码的基础上，额外输入通过短信接收的验证码。
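
The relationship between identification and authentication can be sketched in code: the username is the identity claim, and checking the password is the proof. This is a minimal, hypothetical user store built on Python's standard library, using salted PBKDF2 hashes so that passwords themselves are never stored, which also protects privacy if the store leaks. It covers only the password factor; a second factor such as an SMS code would be an extra check after this one.

```python
import hashlib
import hmac
import os

users = {}  # hypothetical store: username -> (salt, password hash)


def hash_password(password, salt):
    # PBKDF2 deliberately repeats hashing to slow down guessing attacks.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def register(username, password):
    salt = os.urandom(16)
    users[username] = (salt, hash_password(password, salt))


def authenticate(username, password):
    record = users.get(username)  # identification: who does the user claim to be?
    if record is None:
        return False
    salt, stored = record         # authentication: can they prove it?
    return hmac.compare_digest(hash_password(password, salt), stored)


register("alice", "correct horse battery staple")
print(authenticate("alice", "correct horse battery staple"))  # True
print(authenticate("alice", "guess"))                         # False
print(authenticate("mallory", "anything"))                    # False
```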

C.4.7 Explain why the web may be creating unregulated monopolies
C.4.7 解释网络可能正在催生不受监管的垄断的原因

In theory, the World Wide Web should be a free place where anybody can have a website. However, hosting a website usually comes with costs: registering a domain name, getting a hosting service or investing in servers oneself, and creating and maintaining the website (which requires technical knowledge or the cost of hiring a web developer). In addition, reaching an audience usually requires further marketing through SEO (see C.2) to get good rankings in search engine results. This means that for a typical individual, a traditional website is not the best option. A better alternative is to publish content on an existing platform, e.g. microblogging on Twitter, blogging on WordPress or Blogspot, sharing social updates on Facebook, sharing photos on Flickr, etc. This is more convenient for users.
理论上，万维网应该是一个自由的空间，任何人都可以拥有网站。然而，托管网站通常需要成本——注册域名、获取托管服务或自行投资服务器、创建和维护网站（需要技术知识或雇佣网页开发者的成本）。此外，为了触达受众，通常需要通过 SEO（见 C.2）进行进一步营销，才能在搜索引擎结果中获得良好排名。这意味着对于普通个人而言，传统网站并非最佳选择。更好的替代方案是在现有平台上发布内容，例如在 Twitter 进行微博客、在 WordPress 或 Blogspot 写博客、在 Facebook 分享社交动态、在 Flickr 分享照片等。这为用户带来了更高的便利性。

However, it easily leads to unregulated monopolies in the market because users usually stick to one platform. Tim Berners-Lee describes today’s social networks as centralized silos, which hold all user information in one place. This can be a problem, as such monopolies usually control a large quantity of personal information which could be misused commercially or stolen by hackers. There are certainly many more concerns which won’t fit into the scope of this site.
然而，这很容易导致市场上出现不受监管的垄断，因为用户通常只使用一个平台。Tim Berners-Lee 将当今的社交网络描述为集中化的信息孤岛，将所有用户信息储存在单一位置。这会带来隐患，因为此类垄断平台通常控制着大量个人信息，这些信息可能被商业滥用或被黑客窃取。当然还有更多超出本网站讨论范围的潜在问题。

C.4.8 Decentralized and democratic web
C.4.8 去中心化与民主化网络

“A Decentralized Web is free of corporate or government overlords. It is to communication what local farming is to food. With it people can grow their own information.” (Eric Newton, Innovation Chief, Cronkite News, Arizona State University)
“去中心化网络摆脱了企业和政府的控制。它对通信的意义，就如同本地农业对食物的意义一样。有了它，人们可以培育自己的信息。”（Eric Newton，Cronkite News 创新主管，Arizona State University）

Search Bubbles 过滤气泡

A Filter Bubble Demonstration - Try this at home!
过滤气泡（Filter Bubble）演示——在家试试看！

One way to see how filter bubbles work with search engines that do personalization (like Google) is to take a word that can have multiple meanings in different contexts and build up different search histories using those contexts. Then, when you search for the same word after having built up different search histories, the search engine should return results that look a bit different.
要了解过滤气泡如何在支持个性化定制的搜索引擎（如 Google）中发挥作用，一种方法是选取一个在不同语境下具有多重含义的词汇，并基于这些语境构建不同的搜索历史记录。随后，当您在使用不同搜索历史记录后搜索同一词汇时，搜索引擎应该会返回略有差异的搜索结果。

For this demonstration to work, you need to be sure to clear your search history before you start each round. This works even better if you have 2 or 3 people working side by side at different computers. That way you can compare the results more easily.
为确保此演示顺利进行，你需要在每轮开始前清除搜索历史记录。若有 2 至 3 人同时操作不同电脑，效果更佳，这样能更方便地比较结果。

Try this with the word Tea.
用“茶”这个词试试看。

1. Have someone build a search history using names of countries where tea is popular or where teas originated. Remember, do not use the word "tea" as a search term quite yet. Examples would be England, Japan, China, Latin America, etc.
1. 让人建立一个搜索历史，使用茶受欢迎的国家名称或茶起源的国家名称。记住，暂时不要使用"tea"这个词作为搜索词。例如 England、Japan、China、Latin America 等。

2. Have another person build a search history using different spices, herbs, and flowers that make up common teas. Examples would be roses, cinnamon, chrysanthemum, lavender, etc.
2. 让另一个人使用构成常见茶的不同香料、草药和花卉来构建搜索历史。例如玫瑰、肉桂、菊花、薰衣草等。

3. Have a third person search for anything related to politics, such as names of political parties (not the Tea Party just yet, though!), names of political movements, words like "activism," or "conservative" and "liberal."
3. 让第三方人员搜索与政治相关的任何内容，例如政党名称（不过暂时不包括 Tea Party！）、政治运动的名称，以及诸如“activism”、“conservative”和“liberal”等词汇。

4. When you are performing these searches, click on some of the results (preferably general ones that might somehow later be connected to tea!). This will contribute to your search history.
4. 当你执行这些搜索时，点击部分结果（最好是稍后可能与茶相关的通用结果！）。这将有助于丰富你的搜索历史记录。

5. Finally, have everyone search for the word "Tea." Have fun comparing results!
5. 最后，让所有人搜索“Tea”这个词。愉快地比较结果吧！

Note: Your results may still look very similar; the differences may be subtle. Whether or not the filter bubble is really something to be concerned about will be discussed in the next tab.
注意：您的结果可能看起来仍然非常相似；差异可能很细微。信息茧房是否真的值得担忧，将在下一标签页中讨论。

Who does this? 谁来做这件事？

These are just a few of the websites that tailor results to you and your clicking history:
以下是根据您和您的点击历史定制结果的几个网站示例：

Google	Amazon	Washington Post 华盛顿邮报
Netflix	Yahoo News	New York Times 纽约时报
Facebook	Huffington Post 赫芬顿邮报

C.5 (HL) Analyzing the web
C.5 (HL) 分析万维网

C.5.1 Describe how the web can be represented as a directed graph.
C.5.1 描述网络如何可以表示为有向图。

Reference for this section: please click here. Please note that the definition of a tube is conflicting; a simpler, more general definition can be found in this paper (click here).
本节参考资料请点击此处请注意，“tube”的定义存在冲突，更简单/通用的定义可在此论文中查找点击此处

Questions on Directed Graphs
有关有向图的问题

(a) Name an edge you could add to or delete from the graph in Figure 13.8 so as to increase the size of the largest strongly connected component.
(a) 请说出你可以在图 13.8 中添加或删除的一条边，以增大最大强连通分量的规模。

(b) Name an edge you could add to or delete from the graph in Figure 13.8 so as to increase the size of the set IN.
(b) 请指出在图 13.8 中你可以添加或删除的一条边，以使集合 IN 的规模增大。

(c) Name an edge you could add to or delete from the graph in Figure 13.8 so as to increase the size of the set OUT.
(c) 请指出在图 13.8 中你可以添加或删除的一条边，以增大集合 OUT 的大小。
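Since Figure 13.8 is not reproduced here, the sketch below uses a small made-up graph (the node names and edges are hypothetical) to show how the three sets could be computed. The largest strongly connected component (SCC) is the biggest set of nodes that can all reach each other; IN is the set of nodes that can reach the SCC but are not in it; OUT is the set reachable from the SCC but not in it. Adding or deleting an edge and re-running the function is one way to check an answer.

```python
from collections import deque

def reachable(graph, start):
    """Return every node reachable from start by following edges forwards."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def reverse(graph):
    """Return the graph with every edge pointing the other way."""
    rev = {node: [] for node in graph}
    for node, targets in graph.items():
        for target in targets:
            rev.setdefault(target, []).append(node)
    return rev

def bow_tie(graph):
    """Return (largest SCC, IN, OUT) for a small directed graph."""
    rev = reverse(graph)
    best, best_in, best_out = set(), set(), set()
    for node in rev:  # rev has every node, including pages that are only link targets
        forward = reachable(graph, node)   # nodes this node can reach
        backward = reachable(rev, node)    # nodes that can reach this node
        scc = forward & backward           # nodes mutually reachable with this node
        if len(scc) > len(best):
            best, best_in, best_out = scc, backward - scc, forward - scc
    return best, best_in, best_out

# Hypothetical graph (not Figure 13.8): B -> C -> D -> B is a cycle.
web = {
    "A": ["B"],       # links into the cycle but cannot be reached from it
    "B": ["C"],
    "C": ["D"],
    "D": ["B", "E"],
    "E": [],          # reachable from the cycle but links nowhere
    "F": ["G"],       # disconnected from the cycle
}

scc, in_set, out_set = bow_tie(web)
```

Here the SCC is {B, C, D}, IN is {A} and OUT is {E}; adding the edge E -> B pulls E into the SCC. Checking every node like this is fine for a textbook-sized figure, but far too slow for the real web graph, where linear-time algorithms such as Tarjan's or Kosaraju's are used to find SCCs.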

C.5.2 Outline the difference between the web graph and sub-graphs.
C.5.2 概述网络图（web graph）与子图（sub-graphs）之间的区别。

Web graph 网络图

The web graph describes the directed links between web pages in the WWW.
网络图描述了 WWW 中网页之间的有向链接。
It is a directed graph: each node is a web page and each edge is a hyperlink.
它是一个有向图：每个节点是一个网页，每条边是一个超链接。
- Page A has a hyperlink to Page B, creating a directed edge from Page A to Page B
  页面 A 有一个指向页面 B 的超链接，从而创建了从页面 A 到页面 B 的有向边
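A minimal sketch of this representation, assuming a tiny hypothetical set of pages: each page maps to the list of pages it links to (an adjacency list). Because edges are directed, a link from A to B says nothing about a link from B to A, and each page has a separate out-degree (links on the page) and in-degree (links pointing at it).

```python
# Hypothetical pages: each key links to the pages in its list.
web_graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def out_degree(graph, page):
    """Number of hyperlinks on the page (outgoing edges)."""
    return len(graph.get(page, []))

def in_degree(graph, page):
    """Number of hyperlinks on other pages that point to this page (incoming edges)."""
    return sum(targets.count(page) for targets in graph.values())

def has_edge(graph, source, target):
    """True if source links to target; the opposite direction is a separate edge."""
    return target in graph.get(source, [])
```

For example, `has_edge(web_graph, "A", "B")` is true but `has_edge(web_graph, "B", "A")` is false, and page C has in-degree 3 but out-degree 1.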

Sub-Graph 子图

A set of pages that are part of the web graph
属于网络图一部分的一组页面
Can be a set of pages linked to a specific topic, e.g. Wikipedia: one topic, but with references (and hyperlinks) to other web pages
可以是一组链接到特定主题的页面，例如 Wikipedia：一个主题，但包含对其他网页的引用（和超链接）
Can be a set of pages that deal with part of an organization
可以是一组处理组织部分事务的页面
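One way to picture a sub-graph, sketched with hypothetical page names: keep only the pages in a chosen set (here, every page of one school's site) and only the links between those pages. Links that leave the set still exist in the full web graph but are not part of the sub-graph.

```python
def subgraph(graph, pages):
    """Return the induced sub-graph: the chosen pages and only the links among them."""
    pages = set(pages)
    return {
        page: [target for target in graph.get(page, []) if target in pages]
        for page in pages
    }

# Hypothetical web graph: school pages link to each other and to an outside site.
web_graph = {
    "school/home": ["school/news", "school/cs", "wikipedia/Graph"],
    "school/news": ["school/home"],
    "school/cs": ["school/home", "wikipedia/Graph"],
    "wikipedia/Graph": ["wikipedia/Vertex"],
    "wikipedia/Vertex": [],
}

school_site = subgraph(web_graph, [p for p in web_graph if p.startswith("school/")])
```

The resulting sub-graph keeps the three school pages and their links to each other, while the links to `wikipedia/Graph` are dropped.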

C.5.3 Describe the main features of the web graph such as bowtie structure, strongly connected core (SCC), diameter.
C.5.3 描述网络图的主要特征，例如领结结构、强连通核心（SCC）、直径。

C.5.4 Explain the role of graph theory in determining the connectivity of the web.
C.5.4 解释图论在确定网络连通性中的作用。

Connectivity  连通性
Connectivity is a measure of how well the parts of a network connect to each other.
连通性是衡量网络各部分之间连接程度的指标。

Small world graph  小世界图
A small-world graph is a mathematical graph in which most nodes are not direct neighbors, but any given pair of nodes can be reached in a small number of hops, i.e. by following just a few links. This is because nodes are connected through hubs that are themselves interconnected.
小世界图是一种数学图，其中大多数节点并非直接相邻，但任意两个节点之间只需少量跳数（即只需经过几个链接）即可到达。这是因为节点通过彼此互连的枢纽相互连接。

Two properties of a small-world graph:
小世界图的两个属性：
Mean shortest-path length will be small
平均最短路径长度会很小
Most pairs of nodes will be connected by at least one short path
大多数节点对之间至少存在一条短路径相连
many clusters (highly connected subgraphs)
大量集群（高度连通的子图）
Analogy: airline flights, where you can most likely reach any city in fewer than three flights.
类比：航空公司的航班，你很可能只需不到三次航班就能到达任何城市。
Example: the network of neurons in our brain
示例：我们大脑神经元的网络
Maximizes connectivity  最大化连接性
Minimizes # of connections
最小化连接数

6 degrees of separation 六度分隔
This comes from the idea that any two people in the world are connected through a chain of 6 or fewer connections (steps). Generalized to any graph, the idea is that any given pair of nodes in the network can be reached in at most 6 steps.
这一概念源于世界上任何两个人之间都能通过不超过 6 个连接（步骤）以某种方式建立联系的观点。从图的更普遍视角来看，这一观点可以进一步延伸为：网络中的任意两个节点最多只需 6 步即可相互到达。

The idea can be applied to the web graph, suggesting high connectivity despite its large size.
这一理念本身可应用于 web graph，表明即使规模庞大，仍具备高连通性。
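The number of steps between two nodes is the length of the shortest path, which breadth-first search (BFS) finds in an unweighted graph. The sketch below, on small hypothetical graphs, counts the hops from one page to another and checks whether every connected pair is within 6 steps. Pairs with no directed path at all are skipped, since in a directed graph some pairs may not be connected in that direction.

```python
from collections import deque

def hops(graph, start, goal):
    """Fewest links needed to get from start to goal (BFS), or None if unreachable."""
    if start == goal:
        return 0
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                if nxt == goal:
                    return dist[nxt]
                queue.append(nxt)
    return None

def within_six_degrees(graph):
    """True if every pair connected by a directed path is at most 6 hops apart."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    for a in nodes:
        for b in nodes:
            d = hops(graph, a, b)
            if d is not None and d > 6:
                return False
    return True

chain = {f"p{i}": [f"p{i + 1}"] for i in range(7)}   # p0 -> p1 -> ... -> p7
shortcut = {**chain, "p0": ["p1", "p4"]}             # one extra link from p0
```

In the plain chain, p0 is 7 hops from p7, so the check fails; a single extra link from p0 to p4 (the kind of shortcut a hub provides) brings every pair within 6 steps.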

Not necessarily a small-world graph
不一定是小世界图
High connectivity between all nodes
所有节点之间的高连接性
Web diameter (and its importance)
网络直径（及其重要性）
In graph theory, the diameter is the longest shortest-path distance between any two nodes, but for the web graph the term is usually taken to mean the average distance between two random nodes (since every edge has the same length, distance is counted in steps, i.e. links followed).
在图论中，直径是任意两个节点之间最短路径距离的最大值；但对于网络图，这个术语通常指两个随机节点之间的平均距离（由于每条边的长度相同，距离以步数即经过的链接数计算）。

This is important because it indicates how quickly, on average, one can reach a page from any starting page. This matters for crawlers, which want to index as many pages as possible in as few steps as possible.
这一点很重要，因为它表明从任意起始页面平均能多快到达某个页面。这对爬虫程序（crawler）尤为重要，因为它们希望以尽可能少的步数索引尽可能多的页面。

average distance between two random nodes (pages)
两个随机节点（pages）之间的平均距离
important for crawlers to have a guide of how many steps it should take to reach a page
对爬虫而言，拥有指引其到达页面所需步骤数的指南至关重要
a factor to consider is whether the path is directed or undirected
需要考虑的一个因素是路径是有向（directed）还是无向（undirected）的
often there is no direct path between nodes
节点之间通常不存在直接路径
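Using the average-distance definition above, a sketch could run a BFS from every page and average the distances over all ordered pairs that are connected by a directed path. Skipping pairs with no path is one of several possible conventions and is an assumption here; the example graphs are hypothetical.

```python
from collections import deque

def distances_from(graph, start):
    """BFS: number of links from start to every page it can reach."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def average_distance(graph):
    """Mean shortest-path length over ordered pairs (a, b), a != b, with a path a -> b."""
    total, pairs = 0, 0
    for start in graph:
        for page, d in distances_from(graph, start).items():
            if page != start:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# The same four pages, once with one-way links and once with links in both directions.
directed_cycle = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["A"]}
undirected_cycle = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
```

The directed cycle averages 2 steps per pair, while the same pages with two-way links average 4/3, which is why it matters whether paths are treated as directed or undirected.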
Importance of hubs and authorities (link to C.2.3)
枢纽与权威的重要性（链接至 C.2.3）
Hubs and authorities have special characteristics:
枢纽节点和权威节点具有特殊的特征：

Hubs: have a large number of outgoing links
Hubs：拥有大量出站链接
Authorities: have a large number of incoming links
权威页面：拥有大量入站链接
For connectivity, this means that a larger number of hubs improves connectivity, while authorities are more likely to decrease connectivity as they usually do not link to many other pages.
就连接性而言，这意味着枢纽数量越多，连接性越好，而权威节点通常不会链接到许多其他页面，因此更可能降低连接性。
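Following the simplified definitions above (hubs have many outgoing links, authorities have many incoming links), a sketch could rank pages by out-degree and in-degree. The page names are hypothetical, and this is only the degree-based view given here; the HITS algorithm (C.2.3) instead computes hub and authority scores iteratively rather than from raw link counts.

```python
from collections import Counter

def degree_counts(graph):
    """Return (out_degree, in_degree) Counters for every page in the graph."""
    out_deg = Counter({page: len(links) for page, links in graph.items()})
    in_deg = Counter()
    for links in graph.values():
        in_deg.update(links)
    return out_deg, in_deg

# Hypothetical mini-web: a directory page links out a lot; an encyclopedia is linked to a lot.
web_graph = {
    "directory": ["encyclopedia", "news", "blog", "shop"],
    "news": ["encyclopedia"],
    "blog": ["encyclopedia", "news"],
    "shop": ["encyclopedia"],
    "encyclopedia": [],
}

out_deg, in_deg = degree_counts(web_graph)
top_hub = out_deg.most_common(1)[0][0]
top_authority = in_deg.most_common(1)[0][0]
```

Here the directory page is the strongest hub and the encyclopedia the strongest authority; note that the encyclopedia links nowhere, which matches the point that authorities contribute little to connectivity.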

C.5.5 Explain that search engines and web crawling use the web graph to access information.
C.5.5 解释搜索引擎和网络爬虫如何利用网络图来访问信息。

C.5.6 Discuss whether power laws are appropriate to predict the development of the web.
C.5.6 讨论幂律是否适用于预测网络的发展。

C.6 (HL) The intelligent web
C.6 (HL) 智能网络

C.6.1 Define the term semantic web
C.6.1 定义术语语义网

The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.
语义网是当前万维网的扩展，其中信息被赋予明确定义的含义，从而更好地促进计算机和人类协同工作。

A proposed development of the World Wide Web in which data in web pages is structured and tagged in such a way that it can be read directly by computers.
万维网的一项提议发展，其中网页中的数据被结构化和标记，使计算机可以直接读取

The Semantic web is an extension of the current web where data in webpages is structured and tagged to give it semantic meaning and make it machine-understandable, allowing computers and people to work in cooperation
语义网是现有网络的一个扩展，其中网页中的数据经过结构化和标记，赋予其语义意义并使其能够被机器理解，从而使计算机和人类能够协作

C.6.2 Distinguish between the text-web and the multimedia-web
C.6.2 区分文本网络（text-web）与多媒体网络（multimedia-web）

The traditional web (the text-web) is seen as being text-based, while the multimedia web is multimedia-based. Text-web: the “read-only” web.
传统网络（文本网络）被视为以文本为基础，而多媒体网络则以多媒体为基础。文本网络：“只读”网络。
Unlike text-web pages, multimedia web pages use different forms of graphic content.
与文本网页不同，多媒体网页使用不同形式的图形内容。

C.6.3 Aims of semantic web
C.6.3 语义网的目标

The ultimate aim is to allow computers to do more useful work and to allow people and computers to work together.
最终目标是使计算机能够完成更多有用的工作，并使人和计算机能够协同工作

C.6.4 Distinguish between an ontology and folksonomy
C.6.4 区分本体与大众分类法

[tip: also read C.6.5 in conjunction to better understand C.6.4]
[提示：同时阅读 C.6.5 以结合理解 C.6.4]

References: http://www.sims.monash.edu.au/subjects/ims2603/resources/week7/7.1.pdf
参考文献：http://www.sims.monash.edu.au/subjects/ims2603/resources/week7/7.1.pdf

Important for understanding, in very simple terms, classification, folksonomy and its drawbacks
有必要简单了解分类、大众分类法及其缺点

http://www.ijodls.in/uploads/3/6/0/3/3603729/3_mohmedhanif__29-35_.pdf

tagging and folksonomy 标签和分众分类法
http://www.cl.cam.ac.uk/~aac10/R207/ontology_vs_folksonomy.pdf

Ontology: a collection of names for the concepts of a particular domain and the types of relation between them (like our musical instruments assignment), organised by type, sub-type, sub-sub-type and so on
本体论（Ontology）- 针对某一特定概念（如我们的音乐器材作业）的名称集合及关系类型，按类型、子类型、子子类型等层次结构进行组织
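A minimal sketch of an ontology in the spirit of the musical instruments example: every concept has exactly one parent type, so a program can walk the hierarchy (type, sub-type, sub-sub-type). The instrument names and the single "is a" relation are illustrative assumptions; real ontology languages support many more relation types.

```python
# Hypothetical ontology: each concept maps to its parent type ("is a" relation).
IS_A = {
    "string instrument": "instrument",
    "wind instrument": "instrument",
    "percussion instrument": "instrument",
    "bowed string instrument": "string instrument",
    "violin": "bowed string instrument",
    "guitar": "string instrument",
    "flute": "wind instrument",
}

def ancestors(concept):
    """Walk up the hierarchy from a concept to the top-level type."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain

def is_a(concept, category):
    """True if concept is the category itself or a sub-type of it at any depth."""
    return concept == category or category in ancestors(concept)
```

Because the structure is agreed in advance, a program can infer that a violin is an instrument without anyone tagging it that way. That inference is the expressive power attributed to ontologies below, and building the hierarchy up front is the effort that makes them hard to scale.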

Folksonomy - Folksonomy is the result of personal free tagging of information and objects (anything with a URL) for one's own retrieval. The tagging is done in a social environment (usually shared and open to others). Folksonomy is created from the act of tagging by the person consuming the information
Folksonomy - Folksonomy 是个人在社交环境（通常与他人共享并开放）中为便于自身检索而对信息和对象（任何带有 URL 的事物）进行自由标注的结果。该分类体系产生于信息消费者实施标注行为的过程中。

Folksonomy is a system of classification derived from the practice and method of collaboratively creating and translating tags to annotate and categorize content. Ontology is hard to implement on a large scale and isn’t always web-based, but it is key for the semantic web because of its high expressive power. Folksonomy is created by users and is quick and easy to implement. It is used on a large scale for document collections, is usually web-based, and is important in Web 2.0.
分众分类法（Folksonomy）是一种通过协作创建和转换标签来注释与分类内容的分类系统。本体论（Ontology）难以大规模实施且不总是基于网络。因其高表达能力，它是语义网的关键。分众分类法由用户创建，实施快速简便，被大规模应用于文档集合管理，通常基于网络且在 Web 2.0 中具有重要地位。

C.6.5 Describe how folksonomies and emergent social structures are changing the web
C.6.5 描述分众分类（folksonomies）和涌现的社会结构如何改变网络

Folksonomies and emergent social structures are changing the web because they provide a new system of classification. Folksonomy is created by users and is quick and easy to implement. If we look at Facebook, for example, most of the content there is created by users: they upload images, post status updates and much more. This changes the web because the content is determined by the users rather than by the companies that own the sites.
分众分类法（Folksonomies）与社会结构正在改变网络，因为它们本身即构成一种分类系统。分众分类法由用户创建且易于快速实施。以 Facebook 为例，其平台上的大部分内容均由用户生成。用户能够上传图片、发布状态更新等多样化操作。这些行为正在重塑网络生态，因为所有内容均由用户而非企业所有者所决定。
An example of folksonomies is the tag system in image sites or hashtags in social media. Users are defining more and more tags and as the volume of users tagging increases, the accuracy of tags increases such that the web is becoming more and more precise. Tagging increases user participation in the web while enhancing searching and semantics of the web.
大众分类法（folksonomies）的一个例子是图片网站中的标签系统或社交媒体中的标签（hashtag）。用户正在定义越来越多的标签，随着用户标注量的增加，标签的准确性也在提升，使得网络变得越来越精确。标签化在增强网络搜索与语义功能的同时，也提高了用户对网络的参与度。
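A sketch of how free tags from many users might be aggregated, assuming hypothetical users and tags: no vocabulary is fixed in advance, tags that many users agree on rise to the top, and one-off or misspelled tags stay rare. This is one simple way to see why accuracy can improve as the volume of tagging grows.

```python
from collections import Counter

def normalise(tag):
    """Light clean-up so '#Cute' and 'cute ' count as the same tag."""
    return tag.strip().lstrip("#").lower()

def popular_tags(taggings, min_users=2):
    """Return tags applied by at least min_users users, most common first."""
    counts = Counter()
    for user_tags in taggings:
        counts.update({normalise(t) for t in user_tags})  # one vote per user per tag
    return [tag for tag, n in counts.most_common() if n >= min_users]

# Hypothetical tags that four users gave to the same cat photo.
cat_photo = [
    ["cute", "animal", "cat"],
    ["#Cute", "cat"],
    ["kitty", "cute "],
    ["animal", "catt"],  # a typo stays rare and is filtered out
]

result = popular_tags(cat_photo)
```

"cute" comes first (three users agree), followed by "animal" and "cat"; "kitty" and the typo "catt" are each used by only one person and drop out.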

Folksonomy in social structures refers to users tagging media on social sites, for example tagging a picture of a cat as "cute" and "animal". On sites like Flickr, as more and more people tag a photo, they change the web by giving media semantic meaning created by the users themselves. This is only possible through emerging structures like Facebook, Flickr, etc. that allow us to create folksonomies through tagging.
社交结构中的分众分类（Folksonomy）指用户在社交网站上对媒体进行标签分类的行为，例如将一张猫的照片标记为"cute"和"animal"。在 Flickr 等平台上，当越来越多用户为照片添加标签时，通过用户自身创建的语义（semantic meaning），这种集体标注行为帮助改变了网络媒体形态。这种变革只有通过 Facebook、Flickr 等新兴社交平台的结构才得以实现，这些平台允许我们通过标签创建分众分类体系。

Folksonomy is a key part of Web 2.0 since it allows users to interact with the system and organize the wealth of information online. Most of the Web 2.0 technologies now have the flexibility to allow user to describe the content using keywords, categories, or labels. Note that these keywords, labels, tags, etc. are actually being used in many services on the web now.
分众分类（Folksonomy）是 Web 2.0 的关键组成部分，因为它允许用户与系统互动并组织海量在线信息。当前大多数 Web 2.0 技术都能灵活地让用户使用关键词、分类或标签来描述内容。值得注意的是，这些关键词、标签等元素如今已被广泛应用于众多网络服务中。
This helps identify content from the user's context and aids future retrieval. Folksonomy is an important field of Web 2.0 services: users index resources themselves with free keywords, called tags. There are many such services online, especially for indexing bookmarks; Del.icio.us is the best known.
这有助于从用户上下文中识别内容，并为未来的检索提供便利。大众分类（Folksonomy）是 Web 2.0 服务的重要领域。用户使用自由关键词自行索引资源，这些关键词被称为标签。在线有许多此类服务，尤其是用于索引书签的服务。Del.icio.us 是其中最著名的一个。

C.6.6 Explain why there needs to be a balance between expressivity and usability on the semantic web.
C.6.6 解释为什么语义网需要在表达力与可用性之间保持平衡。

C.6.7 Evaluate methods of searching for information on the web.
C.6.7 评估在网络上搜索信息的方法。

C.6.8 Distinguish between ambient intelligence and collective intelligence
C.6.8 区分环境智能与集体智能

Slides on Ambient / Collective Intelligence: click here
环境智能/集体智能幻灯片：点击此处

C.6.9 Discuss how ambient intelligence can be used to support people.
C.6.9 讨论环境智能如何用于支持人们。

C.6.10 Explain how collective intelligence can be applied to complex issues.
C.6.10 解释如何将集体智慧应用于复杂问题。

Past Paper Questions 历年试题题目

C.1.1 Distinguish between the internet and World Wide Web (web). [2]
C.1.1 区分互联网（internet）与万维网（World Wide Web，简称 Web）。[2]

A student in the United Kingdom is viewing a page from a newspaper’s website based in South Africa.
一名英国学生正在查看南非一家报纸网站的一个页面。

(a) Using this example, distinguish between the internet and the World Wide Web.
(a) 使用这个例子，区分互联网和万维网。

C.1.1 Outline one difference between the internet and the World Wide Web (WWW). [2]
C.1.1 概述互联网（internet）和万维网（World Wide Web，WWW）之间的一个区别。[2]

Internet: • Connects computers / network of networks; • Focuses on the physical layer; • A way to transport content/exchange information through languages and protocols;
互联网：• 连接计算机/网络的网络；• 专注于物理层；• 通过语言和协议传输内容/交换信息的方式；

WWW: • The resources that allow one to connect/aggregate people around an activity/interest (expressions focusing on a social dimension are fine); • A way to access/share/exchange information that is built on top of the internet; • A way for applications to communicate/share/exchange;
WWW：• 允许围绕某种活动/兴趣将人们连接/聚集起来的资源（聚焦社交维度的表达方式适用）；• 一种建立在互联网之上的信息访问/共享/交换方式；• 应用程序之间进行通信/共享/交换的途径；

C.1.2 Describe how the web is constantly evolving
C.1.2 描述网络如何不断演变

Many newspapers now host an internet version through which users can read the various news stories.
许多报纸现在都开设了网络版，用户可以通过该平台阅读各类新闻报道。

(b) Identify two other electronic ways in which newspapers provide information through the use of the technology brought about by the evolving web.
(b) 找出报纸通过使用不断发展的网络技术提供的另外两种电子化信息传播方式。

C.1.3 Identify the characteristics of the following:
C.1.3 识别以下各项的特征：

Identify one characteristic of Hypertext Markup Language (HTML).
指出超文本标记语言（HTML）的一个特点。

Tags and Elements 标签与元素
HTML tags tell the browser how to display the content
HTML tags 告诉浏览器如何显示内容
HTML tags label pieces of content such as "heading", "paragraph", "table", and so on
HTML 标签用于标记诸如"heading"、"paragraph"、"table"等内容片段
Browsers do not display the HTML tags, but use them to render the content of the page
浏览器不会显示 HTML 标签，而是利用它们来渲染页面内容

Example Tag 示例标签

<p></p>

Example Element 示例元素

<p>This is where you would write the text of the paragraph that you would like displayed on the web page.</p>
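To illustrate that tags are instructions rather than displayed content, the sketch below uses Python's standard `html.parser` module to separate the tags from the text a reader would actually see. It is a simplification: a real browser also builds a full document tree and applies CSS. The sample page is made up.

```python
from html.parser import HTMLParser

class TagAndTextCollector(HTMLParser):
    """Record the start tags a browser acts on and the text it would display."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)            # e.g. "p" means "render this as a paragraph"

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())  # only this part is shown to the reader

page = "<h1>My Page</h1><p>This is where you would write the text of the paragraph.</p>"
collector = TagAndTextCollector()
collector.feed(page)
collector.close()
```

After parsing, `collector.tags` holds `["h1", "p"]` while `collector.text` holds only the heading and paragraph text, mirroring how the tags shape the page without appearing on it.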

Identify one characteristic of XML
指出 XML 的一个特性

Allows designers to create their own customised tags
允许设计师创建自己的自定义标签
XML was designed to store and transport data.
XML 旨在存储和传输数据。
XML was designed to be both human- and machine-readable
XML 被设计成既便于人类阅读，也便于机器读取
XML documents do not carry information about how to display the data.
XML 文档不包含如何显示数据的信息。

XML is both machine- and human-readable. An XML sitemap helps bots find all the pages on a website so that they get indexed. Click here for an example of an XML sitemap. Note that it has no styling.
XML 是机器和人类均可读取的格式。XML 网站地图能帮助网络爬虫追踪网站上的所有页面以便索引。点击此处查看 XML 网站地图示例。请注意该示例未添加样式设计。
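Since the linked example is not reproduced here, below is a sketch that generates a minimal sitemap with Python's standard `xml.etree.ElementTree`, following the sitemaps.org format: a `<urlset>` root with one `<url>` entry, holding a `<loc>` element, per page. The page URLs are hypothetical. Like the linked example, the output contains only data and no styling.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(page_urls):
    """Return sitemap XML text listing each page URL in its own <loc> element."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for address in page_urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = address
    return ET.tostring(urlset, encoding="unicode")

sitemap = build_sitemap([
    "https://www.example.com/",
    "https://www.example.com/about.html",
])
```

A sitemap file would normally also start with an XML declaration; writing the tree with `ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)` adds one.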

C.1.3 Identify the characteristics of the following:
C.1.3 识别以下内容的特征：

Consider the section of XML code shown below:
请考虑如下所示的 XML 代码片段：

<bird>
  <name>Eagle Owl</name> <!-- 雕鸮 -->
  <description>An owl the size of an eagle</description> <!-- 体型与鹰相仿的猫头鹰 -->
  <habitat>Forests</habitat> <!-- 森林 -->
</bird>

(b) Identify one similarity between HTML and XML. [1]
(b) 指出 HTML 和 XML 之间的一个相似之处。[1]

One of the characteristics of XML code is its ability to describe data.
XML 代码的特点之一是其描述数据的能力。

(c) By making direct reference to the XML code above, outline one way in which this characteristic is shown [2]
(c) 通过直接引用上述 XML 代码，概述一种体现该特性的方式 [2]

XML can use easily identifiable field names/tags for data;
XML 可以使用易于识别的字段名称/标签来存储数据；
Example <species> 示例
List Element 列表元素
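The same characteristic can be shown in code: because the tag names describe the data, a program can pull out each field by name. A sketch using Python's standard `xml.etree.ElementTree` on the bird snippet above:

```python
import xml.etree.ElementTree as ET

BIRD_XML = """
<bird>
  <name>Eagle Owl</name>
  <description>An owl the size of an eagle</description>
  <habitat>Forests</habitat>
</bird>
"""

bird = ET.fromstring(BIRD_XML.strip())
# Each child's tag name says what its text means, so fields can be looked up by name.
record = {child.tag: child.text for child in bird}
habitat = bird.find("habitat").text
```

Nothing in the document says how to display the bird; a program simply reads `habitat` as `"Forests"` because the tag name says what the value is.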

C.1.3 Identify the characteristics of the following:
C.1.3 识别下列各项的特征：

<dvd>

<title>The Hobbit</title>

<genre>Fantasy</genre> 奇幻

<dvd>

<title>Sleepless in Seattle</title>

<genre>Romance</genre> 爱情

</dvd>

</collection>

(a) Identify the error in the section of XML code shown above. ?
(a) 识别上述 XML 代码片段中的错误。?
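One way to check an answer is to give the snippet to an XML parser, which rejects documents that are not well-formed. In the sketch below, the broken version is the snippet above with an opening `<collection>` tag assumed (its closing tag suggests one was intended). The parser reports a mismatched tag when it reaches `</collection>`, although the underlying cause is that the first `<dvd>` element is never closed; the fixed version adds the missing `</dvd>`.

```python
import xml.etree.ElementTree as ET

BROKEN = """<collection>
<dvd>
<title>The Hobbit</title>
<genre>Fantasy</genre>
<dvd>
<title>Sleepless in Seattle</title>
<genre>Romance</genre>
</dvd>
</collection>"""

# Assumed correction: close the first <dvd> before the second one starts.
FIXED = BROKEN.replace("<genre>Fantasy</genre>\n<dvd>",
                       "<genre>Fantasy</genre>\n</dvd>\n<dvd>")

def is_well_formed(xml_text):
    """True if the text parses as XML, False if the parser raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

The corrected document parses into a `<collection>` with two `<dvd>` children.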

In some applications, XML is used instead of HTML principally because of its extensibility property.
在某些应用中，XML 被用来替代 HTML，主要是因为其可扩展性特性。

(b) Outline the meaning of this property in the context of XML code
(b) 概述该属性在 XML 代码上下文中的含义

ACAME has developed its website to be accessible on many different devices. CSS is used in the development of the website.
ACAME 已开发其网站，使其能够在多种不同设备上访问。在网站开发中使用了 CSS。

(d) With reference to this use of CSS
(d) 结合 CSS 的这种应用

(i) describe one advantage for the user; [2]
(i) 描述对用户而言的一个优势；[2]

Any future design changes will be easy and thus less costly.
任何未来的设计变更都将更加容易，从而降低成本。
Download time should be faster: because all the CSS is in one separate file, it is cached by the browser after the initial request and does not need to be downloaded again for subsequent pages.
下载时间应该更快。由于所有 CSS 代码都放在一个独立的文件中，这些代码在首次请求后会被浏览器缓存，后续页面无需再次下载。
?
?

(ii) describe one advantage for the web developer. [2]
(ii) 描述对网络开发者的一个优势。[2]

Faster development, by assigning a designer to create the CSS while a programmer develops the functionality
通过分配设计师负责创建 CSS、程序员负责开发功能来加快开发速度
CSS has more formatting options than HTML
相较于 HTML，CSS 拥有更多的格式设置选项
?

C.1.4 Identify the characteristics of the following:
C.1.4 识别以下各项的特征：

http://www.southafricantimes.com/football/mon/rt

ACAME is a company that offers a subscription service that permits the downloading of animated cartoons.
ACAME 是一家提供订阅服务的公司，其服务允许用户下载动画片。

When browsing their website the following uniform resource locator (URL) is visited:
当浏览他们的网站时，会访问以下统一资源定位符（URL）：

http://minerva.hq.acame.net/Products/index.html

(a) Identify in the given URL
(a) 识别给定 URL 中的

(i) the domain name of the server; [1]
(i) 服务器的域名；[1]

(ii) the file path of the resource in the server. [1]
(ii) 服务器中资源的文件路径。[1]
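The parts of the URL can also be separated programmatically. A sketch using Python's standard `urllib.parse.urlparse`, which splits a URL into the scheme, the network location (here just the server's host name) and the path:

```python
from urllib.parse import urlparse

url = "http://minerva.hq.acame.net/Products/index.html"
parts = urlparse(url)

scheme = parts.scheme     # protocol used to retrieve the resource
host = parts.netloc       # server's domain name (no port or login in this URL)
file_path = parts.path    # location of the resource on that server
```

This gives `"http"`, `"minerva.hq.acame.net"` and `"/Products/index.html"` respectively.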

The URL of this website is www.OpenSourceDev.org. Any new pieces of code that the developers make available become new resources on the website. A script generates weekly automatic notifications of new code available on the site, and sends this notification to users as an email.
该网站的 URL 是 www.OpenSourceDev.org。开发者发布的任何新代码都会成为网站上的新资源。一个脚本会每周自动生成网站上新代码的通知，并通过电子邮件将此通知发送给用户。

(c) (i) Outline, with an example, how the URL for these new pieces of code will be generated. [2]
(c) (i) 概述并举例说明这些新代码片段的 URL 将如何生成。[2]
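One possible scheme, sketched under assumptions not given in the question: the script appends a "slug" made from the new code's title to a hypothetical `/code/` folder on the site, so each new resource gets a predictable URL that can be pasted into the weekly notification email. The folder name and the slug rule are illustrative only.

```python
import re
from urllib.parse import urljoin

# Hypothetical folder on the site where every new piece of code is published.
BASE_URL = "http://www.OpenSourceDev.org/code/"

def slugify(title):
    """Lower-case the title and join its words with hyphens so it is URL-safe."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def url_for_new_code(title):
    """Build the URL of a newly published piece of code from its title."""
    return urljoin(BASE_URL, slugify(title))

new_url = url_for_new_code("Binary Search Tree (Java)")
```

Under these assumptions, a new resource titled "Binary Search Tree (Java)" would be published at `http://www.OpenSourceDev.org/code/binary-search-tree-java`.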

C.1.5 Describe the purpose of a URL
C.1.5 描述 URL 的用途

Describe the purpose of a URL. [2]
描述 URL 的作用。[2]

A URL identifies a resource online and specifies the protocol, which determines how the resource will be retrieved/accessed; for web pages, it identifies the resource name (domain), the protocol (http) and the path if the resource is not at domain level.
URL 用于标识在线资源，并指定协议，以确定如何检索/访问该资源；对于网页，它标识资源名称（域名）、协议（http），以及资源不在域名层级时的路径。

C.1.6 Describe how a domain name server functions.
C.1.6 描述域名系统（DNS）服务器如何运作。

When a user enters a URL into the search bar of the browser the URL will normally be sent to a domain name server.
当用户在浏览器的搜索栏中输入 URL 时，该 URL 通常会被发送到域名服务器。

(e) Identify the two possible actions that this domain name server will then take.
(e) 确定该域名服务器接下来可能采取的两种行动。
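The two actions can be sketched with a toy model. The server setup, domain names and IP addresses below are hypothetical (the addresses come from the 192.0.2.0/24 range reserved for documentation), and real DNS involves a hierarchy of root, top-level-domain and authoritative servers rather than a single "parent". If the server holds a record for the domain, it returns the IP address; otherwise it passes the query on to another DNS server and caches the answer.

```python
class ToyDNSServer:
    """Very simplified DNS server: answer from its own records or ask another server."""

    def __init__(self, records, parent=None):
        self.records = dict(records)  # domain name -> IP address (records and cache)
        self.parent = parent          # another server to forward unknown names to

    def resolve(self, domain):
        # Action 1: the name is in this server's records/cache, so return the IP address.
        if domain in self.records:
            return self.records[domain]
        # Action 2: pass the request on to another DNS server, then cache the answer.
        if self.parent is not None:
            ip = self.parent.resolve(domain)
            if ip is not None:
                self.records[domain] = ip
            return ip
        return None  # no server in this toy chain knows the name

upstream = ToyDNSServer({"www.example.com": "192.0.2.10"})
local = ToyDNSServer({"intranet.example.org": "192.0.2.20"}, parent=upstream)
```

Resolving `intranet.example.org` is answered locally; resolving `www.example.com` is forwarded upstream the first time and answered from the local cache afterwards.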

The new IP uses a different format for representing the address field in the packet ?
新型 IP 协议采用不同格式来表示数据包中的地址字段？

C.1.7 Identify the characteristics of:
C.1.7 识别以下内容的特征：

the TCP and IP protocols.
TCP 和 IP 协议。

(d) Describe how these two protocols work together when sending data over the internet.
(d) 描述通过互联网发送数据时这两种协议如何协同工作。

TCP divides a message or file into numbered packets that are transmitted over the internet and then reassembled in the correct order when they reach their destination.
TCP 将消息或文件分割成带编号的数据包，这些数据包通过互联网传输，并在到达目的地时按正确顺序重新组装。

The IP protocol adds header information to each packet (such as the sender's and receiver's IP addresses) and routes the packets to their destination. It is responsible for addressing each packet so that it reaches the correct destination.
IP 协议为每个数据包添加头部信息（例如发送方和接收方的 IP 地址），并将数据包路由到目的地。它负责为每个数据包编址，使其到达正确的目的地。
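A toy simulation of this division of labour, not real networking code: the packet size, addresses and field names are assumptions. "TCP" splits the message into numbered pieces and reassembles them in order, while "IP" adds source and destination addresses to each piece. The packets are shuffled to mimic them arriving out of order by different routes. As a simplification, pieces are numbered 0, 1, 2 and so on, whereas real TCP numbers bytes.

```python
import random
from dataclasses import dataclass

@dataclass
class Packet:
    src_ip: str   # IP header: where the packet came from
    dst_ip: str   # IP header: where it must be delivered
    seq: int      # TCP header: position of this piece in the message
    data: str

def tcp_split(message, size):
    """TCP side: cut the message into numbered pieces of at most `size` characters."""
    return [(i, message[start:start + size])
            for i, start in enumerate(range(0, len(message), size))]

def ip_wrap(segments, src_ip, dst_ip):
    """IP side: add addressing information to every piece."""
    return [Packet(src_ip, dst_ip, seq, data) for seq, data in segments]

def tcp_reassemble(packets):
    """TCP side at the destination: put pieces back in order using their numbers."""
    return "".join(p.data for p in sorted(packets, key=lambda p: p.seq))

message = "Hello from the United Kingdom!"
packets = ip_wrap(tcp_split(message, 8), "192.0.2.1", "198.51.100.7")
random.Random(42).shuffle(packets)   # simulate out-of-order arrival
received = tcp_reassemble(packets)
```

However the packets are shuffled, sorting by the sequence number restores the original message, while the IP addresses on each packet are what a router would use to forward it.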

Resources 资源

https://barefootcas.org.uk/wp-content/uploads/2015/02/KS2-Search-Results-Selection-Activity-Barefoot-Computing.pdf

https://www.hpe.com/us/en/insights/articles/how-search-worked-before-google-1703.html

C.1 Creating the web C.1 创建万维网

C.1.1 Distinguish between the Internet and World Wide Web C.1.1 区分 Internet 和 World Wide Web

World Wide Web (www): 万维网 (www):

C.1.2 Describe how the web is constantly evolving C.1.2 描述网络如何不断演变

C.1.3 Identify the characteristics of the following: C.1.3 识别以下各项的特征：

Identify the characteristics of the following HTTP HTTPS: 识别以下 HTTP 和 HTTPS 的特性：

hypertext transfer protocol (HTTP) 超文本传输协议（HTTP）

hypertext transfer protocol secure (HTTPS) 安全超文本传输协议（HTTPS）

Identify the characteristics of the following HTML URL XML XSLT 识别以下各项的特征：HTML URL XML XSLT

hypertext mark-up language (HTML) 超文本标记语言（HTML）

URL – Uniform Resource Locator URL：统一资源定位符

Extensible mark-up language (XML) 可扩展标记语言（XML）

XSLT – Extensible Stylesheet Language Transformations XSLT：可扩展样式表语言转换

Identify the characteristics of the following JavaScript , CSS 识别以下 JavaScript、CSS 的特性

JavaScript

CSS – Cascading style sheet CSS – 层叠样式表

C.1.4 Identify the characteristics of the following URI URL: C.1.4 识别以下 URI URL 的特征：

uniform resource identifier (URI) 统一资源标识符（URI）

uniform resource locator (URL) 统一资源定位符（URL）

C.1.5 Describe the purpose of a URL C.1.5 描述 URL 的用途

C.1.6 Describe how a domain name server functions C.1.6 描述域名服务器如何运作

Describe how a domain name server functions 描述域名服务器如何运作

C.1.7 Identify the characteristics of: IP, TCP and FTP C.1.7 识别以下协议的特征：IP、TCP 和 FTP

Identify the characteristics of: IP, TCP and FTP 识别以下协议的特征：IP、TCP 和 FTP

internet protocol (IP) 网际协议（IP）

transmission control protocol (TCP) 传输控制协议（TCP）

File transfer protocol (FTP) 文件传输协议（FTP）

C.1.8 Outline the different components of a web page. C.1.8 概述网页的不同组件。

Outline the different components of a web page. 概述网页的不同组成部分。

head

title

meta tags meta 标签

body

Navigation bar 导航栏

Hyperlinks 超链接

Table Of Contents 目录

Banner 横幅

Sidebar 侧边栏

C.1.9 Explain the importance of protocols and standards on the web. C.1.9 解释协议和标准在网络上的重要性。

C.1.10 Describe the different types of web page C.1.10 描述不同类型的网页

Describe the different types of web page 描述不同类型的网页

C.1.11 Explain the differences between a static web page and a dynamic web page C.1.11 解释静态网页和动态网页之间的区别

C.1.12 Explain the functions of a browser C.1.12 解释浏览器的功能

Explain the functions of a browser 解释浏览器的功能

C.1.13 Evaluate the use of client-side scripting and server-side scripting in web pages C.1.13 评估客户端脚本与服务器端脚本在网页中的应用

C.1.14 Describe how web pages can be connected to underlying data sources C.1.14 描述网页如何连接到底层数据源

C.1.15 Describe the function of the common gateway interface (CGI) C.1.15 描述公共网关接口（CGI）的功能

C.1.16 Evaluate the structure of different types of web pages (examples seen in past paper include blogs, forums, etc.) C.1.16 评估不同类型网页的结构（历年试卷中的示例包括 blogs、forums 等）

Past paper Questions 历年真题

Describe how the web is constantly evolving 阐述 Web 是如何持续演进的

The beginnings of the web (Web 1.0, Web of content) 万维网的起源（Web 1.0，内容网络）

Web 2.0 – “Web of the Users” Web 2.0："用户之网"

Web 3.0 – “Semantic Web” Web 3.0："语义网"

Later developments 后续发展

C.2 Searching the Web C.2 网络搜索

C.2.1 Define the term search engine C.2.1 定义术语 search engine

C.2.2 Distinguish between the surface web and the deep web C.2.2 区分表层网络和深层网络

Surface Web 表层网络

Deep web 深网

C.2.3 Outline the principles of searching algorithms used by search engines C.2.3 概述搜索引擎使用的搜索算法原理

PageRank algorithm PageRank 算法

HITS algorithm HITS 算法

C.2.4 Describe how a web-crawler functions C.2.4 描述网络爬虫的工作原理

Robots.txt

C.2.5 Discuss the relationship between data in a meta tag and how it is accessed by a web-crawler C.2.5 讨论元标签中的数据与网络爬虫如何访问这些数据之间的关系

Robots meta tag

C.2.6 Discuss the use of parallel web-crawling C.2.6 讨论并行网络爬虫技术的应用

Issues of parallel web crawling 并行网络爬虫的问题

Discuss the use of parallel web crawling 讨论并行网络爬取的使用

Why search engines take the quality approach (dated) 为何搜索引擎采取质量优先策略（已过时）

C.2.7 Outline the purpose of web-indexing in search engines C.2.7 概述网络索引在搜索引擎中的目的。

C.2.8 Suggest how developers can create pages that appear more prominently in search engine results. C.2.8 建议开发者如何创建在搜索引擎结果中更显眼的页面。

C.2.9 Describe the different metrics used by search engines. C.2.9 描述搜索引擎使用的不同指标。

Top Metrics 关键指标

C.2.10 Explain why the effectiveness of a search engine is determined by the assumptions made when developing it. C.2.10 解释为什么搜索引擎的有效性取决于开发时所做出的假设。

C.2.11 Discuss the use of white hat and black hat search engine optimization. C.2.11 讨论白帽（white hat）和黑帽（black hat）搜索引擎优化的应用。

BLACK HAT 黑帽

Keyword stuffing 关键词堆砌

Syndicated / Copied Content
聚合/复制内容

Over Use of Key Words in Anchor Text
锚文本中关键词的过度使用

Site optimization Design
网站优化设计

A good User Experience (UX)
良好的用户体验（UX）

C.2.12 future challenges to search engines as the web continues to grow
C.2.12 随着网络持续发展搜索引擎将面临的未来挑战

C.3 Distributed approaches to the web
C.3 网络的分布式方法

Link to Peer-to-peer Slides and ubiquitous Computing
链接至 Peer-to-peer 幻灯片与 ubiquitous Computing

Link to Grid Computing Slides
网格计算幻灯片链接

C.3.1 Define the terms: mobile computing, ubiquitous computing, peer-2-peer network, grid computing
C.3.1 定义以下术语：移动计算（mobile computing）、普适计算（ubiquitous computing）、点对点网络（peer-2-peer network）、网格计算（grid computing）

What are the key resources that we can share on a grid network of computers?
在计算机的网格网络中，我们可以共享哪些关键资源？

What is ubiquitous computing?
什么是普适计算（ubiquitous computing）？

Ubiquitous computing (pervasive computing)
普适计算（普及计算）

C.3.3 Distinguish between interoperability and open standards.
C.3.3 区分互操作性与开放标准。

C.3.4 Describe the range of hardware used by distributed networks.
C.3.4 描述分布式网络所使用的各类硬件。

C.3.5 Explain why distributed systems may act as a catalyst to a greater decentralization of the web
C.3.5 解释为什么分布式系统可能成为推动网络实现更大程度去中心化的催化剂

Compression & Decompression Week 2
压缩与解压缩第 2 周

C.3.6 Distinguish between lossless and lossy compression.
C.3.6 区分无损压缩和有损压缩。

C.3.7 Evaluate the use of decompression software in the transfer of information.
C.3.7 评估解压缩软件在信息传输中的使用。

Take advantage of redundancy: exploit repeated patterns by using coding.
利用冗余的优势：通过编码技术利用重复模式
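Run-length encoding (RLE) is one simple lossless technique that exploits repeated patterns: a run of identical symbols is stored once with a count. A sketch for strings, assuming the data has long runs (on data without runs, RLE can make the output larger than the input):

```python
from itertools import groupby

def rle_encode(text):
    """Replace each run of a repeated character with a (character, run length) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]

def rle_decode(pairs):
    """Rebuild the exact original text from (character, run length) pairs."""
    return "".join(ch * count for ch, count in pairs)

row = "WWWWWWWWBBBWWWWWWWW"   # e.g. one row of a black-and-white image
encoded = rle_encode(row)
```

Nineteen characters become three pairs, and decoding gives back exactly the original row, which is what makes the method lossless.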

Take advantage of human limitations in hearing and sight: some information can be discarded without impacting the experience.
利用人类感官的局限性：在听觉和视觉方面，这样我们可以在不影响体验的情况下舍弃部分信息
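Lossy compression instead throws information away. A deliberately simple sketch of one such idea, quantization: each sample is rounded to a coarser step so fewer distinct values (and therefore fewer bits) are needed. The step size and samples are arbitrary assumptions; real codecs such as MP3 or JPEG decide what to discard using models of human hearing and sight. Decoding gives values close to, but not exactly equal to, the originals.

```python
def quantize(samples, step):
    """Lossy step: store each sample as a whole number of `step`-sized units."""
    return [round(s / step) for s in samples]

def dequantize(levels, step):
    """Reconstruct approximate samples; the rounding error cannot be undone."""
    return [level * step for level in levels]

samples = [0.12, 0.49, 0.51, 0.98, -0.33]
levels = quantize(samples, step=0.25)
restored = dequantize(levels, step=0.25)
max_error = max(abs(a - b) for a, b in zip(samples, restored))
```

The stored levels are small integers, but the restored values differ from the originals by up to half a step, and that lost detail cannot be recovered.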

Where can lossless compression (which preserves all information) be used?
无损（保留所有信息）可用于哪些场景？
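Lossless compression suits data such as text documents and program files, where changing even one byte would corrupt the content. As a sketch of compress, transfer, then decompress with Python's standard `zlib` module (a lossless DEFLATE implementation): the decompressed bytes are identical to the originals. The sample text is made up and deliberately repetitive, so the saving here is larger than for typical data.

```python
import zlib

original = ("Lossless compression must give back every byte. " * 20).encode("utf-8")

compressed = zlib.compress(original, 9)   # sender: compress before transfer
restored = zlib.decompress(compressed)    # receiver: decompression software

saving = 1 - len(compressed) / len(original)
```

The trade-off is that both ends need compatible software and some processing time, in exchange for fewer bytes sent over the network.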