Frontera
Index
A | B | C | D | E | F | G | H | I | K | L | M | N | O | P | Q | R | S | T | U | Z
A
add_seeds() (frontera.core.components.Component method)
(frontera.core.components.Backend method)
(frontera.core.components.Metadata method)
(frontera.core.components.Middleware method)
(frontera.core.manager.FrontierManager method)
(frontera.worker.strategies.BaseCrawlingStrategy method)
AUTO_START
setting
auto_start (frontera.core.manager.FrontierManager attribute)
B
BACKEND
setting
backend (frontera.core.manager.FrontierManager attribute)
BaseCrawlingStrategy (class in frontera.worker.strategies)
BaseDecoder (class in frontera.core.codec)
BaseEncoder (class in frontera.core.codec)
BasicCanonicalSolver (class in frontera.contrib.canonicalsolvers.basic)
BC_MAX_REQUESTS_PER_HOST
setting
BC_MIN_HOSTS
setting
BC_MIN_REQUESTS
setting
body (frontera.core.models.Request attribute)
(frontera.core.models.Response attribute)
C
CANONICAL_SOLVER
setting
close() (frontera.worker.strategies.BaseCrawlingStrategy method)
Component (class in frontera.core.components)
cookies (frontera.core.models.Request attribute)
count() (frontera.core.components.Queue method)
CRAWLING_STRATEGY
setting
CrawlPage (built-in class)
D
db worker
db_worker() (frontera.core.components.DistributedBackend class method)
decode() (frontera.core.codec.BaseDecoder method)
decode_request() (frontera.core.codec.BaseDecoder method)
DELAY_ON_EMPTY
setting
DOMAIN_FINGERPRINT_FUNCTION
setting
DomainFingerprintMiddleware (class in frontera.contrib.middlewares.fingerprint)
DomainMiddleware (class in frontera.contrib.middlewares.domain)
E
encode_add_seeds() (frontera.core.codec.BaseEncoder method)
encode_new_job_id() (frontera.core.codec.BaseEncoder method)
encode_offset() (frontera.core.codec.BaseEncoder method)
encode_page_crawled() (frontera.core.codec.BaseEncoder method)
encode_request() (frontera.core.codec.BaseEncoder method)
encode_request_error() (frontera.core.codec.BaseEncoder method)
encode_update_score() (frontera.core.codec.BaseEncoder method)
F
fetch() (frontera.core.components.States method)
finished (frontera.core.manager.FrontierManager attribute)
finished() (frontera.core.components.Backend method)
(frontera.worker.strategies.BaseCrawlingStrategy method)
flush() (frontera.core.components.States method)
from_manager() (frontera.core.components.Component class method)
(frontera.core.components.Backend method)
(frontera.core.components.Middleware method)
from_settings() (frontera.core.manager.FrontierManager class method)
from_worker() (frontera.worker.strategies.BaseCrawlingStrategy class method)
frontera.contrib.backends.CommonBackend (built-in class)
frontera.contrib.backends.hbase.HBaseBackend (built-in class)
frontera.contrib.backends.memory.BASE (built-in class)
frontera.contrib.backends.memory.BFS (built-in class)
frontera.contrib.backends.memory.DFS (built-in class)
frontera.contrib.backends.memory.FIFO (built-in class)
frontera.contrib.backends.memory.LIFO (built-in class)
frontera.contrib.backends.memory.RANDOM (built-in class)
frontera.contrib.backends.remote.codecs.json (module)
frontera.contrib.backends.sqlalchemy.BASE (built-in class)
frontera.contrib.backends.sqlalchemy.BFS (built-in class)
frontera.contrib.backends.sqlalchemy.DFS (built-in class)
frontera.contrib.backends.sqlalchemy.FIFO (built-in class)
frontera.contrib.backends.sqlalchemy.LIFO (built-in class)
frontera.contrib.backends.sqlalchemy.RANDOM (built-in class)
frontera.contrib.backends.sqlalchemy.revisiting.Backend (built-in class)
frontera.core.components.Backend (built-in class)
frontera.core.components.DistributedBackend (built-in class)
frontera.core.components.Metadata (built-in class)
frontera.core.components.Middleware (built-in class)
frontera.core.components.Queue (built-in class)
frontera.core.components.States (built-in class)
frontera.settings.Settings (built-in class)
FRONTERA_SETTINGS
setting
frontier_start() (frontera.core.components.Component method)
(frontera.core.components.Backend method)
(frontera.core.components.Middleware method)
frontier_stop() (frontera.core.components.Component method)
(frontera.core.components.Backend method)
(frontera.core.components.Middleware method)
FrontierManager (class in frontera.core.manager)
G
get_next_requests() (frontera.core.components.Backend method)
(frontera.core.components.Queue method)
(frontera.core.manager.FrontierManager method)
H
HBASE_BATCH_SIZE
setting
HBASE_DROP_ALL_TABLES
setting
HBASE_METADATA_TABLE
setting
HBASE_NAMESPACE
setting
HBASE_QUEUE_TABLE
setting
HBASE_STATE_CACHE_SIZE_LIMIT
setting
HBASE_THRIFT_HOST
setting
HBASE_THRIFT_PORT
setting
HBASE_USE_FRAMED_COMPACT
setting
HBASE_USE_SNAPPY
setting
headers (frontera.core.models.Request attribute)
(frontera.core.models.Response attribute)
hostname_local_fingerprint() (in module frontera.utils.fingerprint)
I
id (CrawlPage attribute)
is_seed (CrawlPage attribute)
iteration (frontera.core.manager.FrontierManager attribute)
K
KAFKA_CODEC
setting
KAFKA_GET_TIMEOUT
setting
KAFKA_LOCATION
setting
L
links (CrawlPage attribute)
LOGGING_CONFIG
setting
M
MAX_NEXT_REQUESTS
setting
max_next_requests (frontera.core.manager.FrontierManager attribute)
MAX_REQUESTS
setting
max_requests (frontera.core.manager.FrontierManager attribute)
message bus
MESSAGE_BUS
setting
MESSAGE_BUS_CODEC
setting
MessageBusBackend (class in frontera.contrib.backends.remote.messagebus)
meta (frontera.core.models.Request attribute)
(frontera.core.models.Response attribute)
metadata (frontera.core.components.Backend attribute)
method (frontera.core.models.Request attribute)
MIDDLEWARES
setting
middlewares (frontera.core.manager.FrontierManager attribute)
N
n_requests (frontera.core.manager.FrontierManager attribute)
name (frontera.core.components.Component attribute)
NEW_BATCH_DELAY
setting
O
OVERUSED_SLOT_FACTOR
setting
P
page_crawled() (frontera.core.components.Component method)
(frontera.core.components.Backend method)
(frontera.core.components.Metadata method)
(frontera.core.components.Middleware method)
(frontera.core.manager.FrontierManager method)
(frontera.worker.strategies.BaseCrawlingStrategy method)
page_error() (frontera.worker.strategies.BaseCrawlingStrategy method)
Q
queue (frontera.core.components.Backend attribute)
R
RECORDER_ENABLED
setting
RECORDER_STORAGE_CLEAR_CONTENT
setting
RECORDER_STORAGE_DROP_ALL_TABLES
setting
RECORDER_STORAGE_ENGINE
setting
referers (CrawlPage attribute)
Request (class in frontera.core.models)
request (frontera.core.models.Response attribute)
request_error() (frontera.core.components.Component method)
(frontera.core.components.Backend method)
(frontera.core.components.Metadata method)
(frontera.core.components.Middleware method)
(frontera.core.manager.FrontierManager method)
REQUEST_MODEL
setting
request_model (frontera.core.manager.FrontierManager attribute)
Response (class in frontera.core.models)
RESPONSE_MODEL
setting
response_model (frontera.core.manager.FrontierManager attribute)
S
schedule() (frontera.core.components.Queue method)
scoring log
SCORING_LOG_CONSUMER_BATCH_SIZE
setting
SCORING_LOG_DBW_GROUP
setting
SCORING_LOG_TOPIC
setting
SCORING_PARTITION_ID
setting
set_states() (frontera.core.components.States method)
setting
AUTO_START
BACKEND
BC_MAX_REQUESTS_PER_HOST
BC_MIN_HOSTS
BC_MIN_REQUESTS
CANONICAL_SOLVER
CRAWLING_STRATEGY
DELAY_ON_EMPTY
DOMAIN_FINGERPRINT_FUNCTION
FRONTERA_SETTINGS
HBASE_BATCH_SIZE
HBASE_DROP_ALL_TABLES
HBASE_METADATA_TABLE
HBASE_NAMESPACE
HBASE_QUEUE_TABLE
HBASE_STATE_CACHE_SIZE_LIMIT
HBASE_THRIFT_HOST
HBASE_THRIFT_PORT
HBASE_USE_FRAMED_COMPACT
HBASE_USE_SNAPPY
KAFKA_CODEC
KAFKA_GET_TIMEOUT
KAFKA_LOCATION
LOGGING_CONFIG
MAX_NEXT_REQUESTS
MAX_REQUESTS
MESSAGE_BUS
MESSAGE_BUS_CODEC
MIDDLEWARES
NEW_BATCH_DELAY
OVERUSED_SLOT_FACTOR
RECORDER_ENABLED
RECORDER_STORAGE_CLEAR_CONTENT
RECORDER_STORAGE_DROP_ALL_TABLES
RECORDER_STORAGE_ENGINE
REQUEST_MODEL
RESPONSE_MODEL
SCORING_LOG_CONSUMER_BATCH_SIZE
SCORING_LOG_DBW_GROUP
SCORING_LOG_TOPIC
SCORING_PARTITION_ID
SPIDER_FEED_GROUP
SPIDER_FEED_PARTITIONS
SPIDER_FEED_TOPIC
SPIDER_LOG_CONSUMER_BATCH_SIZE
SPIDER_LOG_DBW_GROUP
SPIDER_LOG_PARTITIONS
SPIDER_LOG_SW_GROUP
SPIDER_LOG_TOPIC
SPIDER_PARTITION_ID
SQLALCHEMYBACKEND_CACHE_SIZE
SQLALCHEMYBACKEND_CLEAR_CONTENT
SQLALCHEMYBACKEND_DROP_ALL_TABLES
SQLALCHEMYBACKEND_ENGINE
SQLALCHEMYBACKEND_ENGINE_ECHO
SQLALCHEMYBACKEND_MODELS
SQLALCHEMYBACKEND_REVISIT_INTERVAL
STATE_CACHE_SIZE
STORE_CONTENT
TEST_MODE
TLDEXTRACT_DOMAIN_INFO
URL_FINGERPRINT_FUNCTION
ZMQ_ADDRESS
ZMQ_BASE_PORT
settings (frontera.core.manager.FrontierManager attribute)
spider
spider feed
spider log
SPIDER_FEED_GROUP
setting
SPIDER_FEED_PARTITIONS
setting
SPIDER_FEED_TOPIC
setting
SPIDER_LOG_CONSUMER_BATCH_SIZE
setting
SPIDER_LOG_DBW_GROUP
setting
SPIDER_LOG_PARTITIONS
setting
SPIDER_LOG_SW_GROUP
setting
SPIDER_LOG_TOPIC
setting
SPIDER_PARTITION_ID
setting
SQLALCHEMYBACKEND_CACHE_SIZE
setting
SQLALCHEMYBACKEND_CLEAR_CONTENT
setting
SQLALCHEMYBACKEND_DROP_ALL_TABLES
setting
SQLALCHEMYBACKEND_ENGINE
setting
SQLALCHEMYBACKEND_ENGINE_ECHO
setting
SQLALCHEMYBACKEND_MODELS
setting
SQLALCHEMYBACKEND_REVISIT_INTERVAL
setting
start() (frontera.core.manager.FrontierManager method)
state cache
STATE_CACHE_SIZE
setting
states (frontera.core.components.Backend attribute)
status (CrawlPage attribute)
status_code (frontera.core.models.Response attribute)
stop() (frontera.core.manager.FrontierManager method)
STORE_CONTENT
setting
strategy worker
strategy_worker() (frontera.core.components.DistributedBackend class method)
T
TEST_MODE
setting
test_mode (frontera.core.manager.FrontierManager attribute)
TLDEXTRACT_DOMAIN_INFO
setting
U
update_cache() (frontera.core.components.States method)
url (CrawlPage attribute)
(frontera.core.models.Request attribute)
(frontera.core.models.Response attribute)
URL_FINGERPRINT_FUNCTION
setting
UrlFingerprintMiddleware (class in frontera.contrib.middlewares.fingerprint)
Z
ZMQ_ADDRESS
setting
ZMQ_BASE_PORT
setting