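Riko is a Python stream processing engine, similar to Yahoo Pipes. Written in pure Python, it is designed for analyzing and processing streams of structured data. It provides both synchronous and asynchronous APIs, and it can process RSS feeds in parallel. Riko also includes a command-line interface.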
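Features:
Reads csv/xml/json/html files.
Creates text and data flows through modular pipes.
Parses, processes, and extracts RSS/Atom feeds.
Creates powerful mashup APIs and maps.
Supports parallel processing.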
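The modular-pipe idea in the list above can be illustrated without riko itself. The following is a minimal sketch that uses only the Python standard library. The names `read_csv`, `keep`, `rename`, and `run` are hypothetical helpers invented for this illustration; they are not part of riko's API.

```python
import csv
import io
from functools import reduce


def read_csv(text):
    """Source stage: yield one dict per CSV row."""
    yield from csv.DictReader(io.StringIO(text))


def keep(predicate):
    """Build a stage that passes through only the items matching predicate."""
    def stage(stream):
        return (item for item in stream if predicate(item))
    return stage


def rename(old, new):
    """Build a stage that copies each item with key `old` renamed to `new`."""
    def stage(stream):
        for item in stream:
            renamed = dict(item)
            renamed[new] = renamed.pop(old)
            yield renamed
    return stage


def run(source, *stages):
    """Chain stages lazily: each stage consumes the previous stage's stream."""
    return reduce(lambda stream, stage: stage(stream), stages, source)


# Hypothetical sample data, used only for this illustration.
data = "name,lang\nriko,python\npipes,js\n"
stream = run(
    read_csv(data),
    keep(lambda row: row["lang"] == "python"),
    rename("name", "project"),
)
result = list(stream)
print(result)
```

Each stage is a plain function from one stream to another, so stages can be reordered or reused freely, and nothing is computed until the final stream is consumed.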
Example usage:
>>> ### Create a SyncPipe flow ###
>>> #
>>> # `SyncPipe` is a convenience class that creates chainable flows
>>> # and allows for parallel processing.
>>> from riko.collections.sync import SyncPipe
>>>
>>> ### Set the pipe configurations ###
>>> #
>>> # Notes:
>>> #   1. the `detag` option will strip all html tags from the result
>>> #   2. fetch the text contained inside the 'body' tag of the hackernews
>>> #      homepage
>>> #   3. replace newlines with spaces and assign the result to 'content'
>>> #   4. tokenize the resulting text using whitespace as the delimiter
>>> #   5. count the number of times each token appears
>>> #   6. obtain the raw stream
>>> #   7. extract the first word and its count
>>> #   8. extract the second word and its count
>>> #   9. extract the third word and its count
>>> url = 'https://news.ycombinator.com/'
>>> fetch_conf = {
...     'url': url, 'start': '<body>', 'end': '</body>', 'detag': True}  # 1
>>>
>>> replace_conf = {
...     'rule': [
...         {'find': '\r\n', 'replace': ' '},
...         {'find': '\n', 'replace': ' '}]}
>>>
>>> flow = (
...     SyncPipe('fetchpage', conf=fetch_conf)                    # 2
...         .strreplace(conf=replace_conf, assign='content')      # 3
...         .stringtokenizer(conf={'delimiter': ' '}, emit=True)  # 4
...         .count(conf={'count_key': 'content'}))                # 5
>>>
>>> stream = flow.output                                          # 6
>>> next(stream)                                                  # 7
{"'sad": 1}
>>> next(stream)                                                  # 8
{'(': 28}
>>> next(stream)                                                  # 9
{'(1999)': 1}
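To make steps 3 to 5 of the flow concrete, here is a stdlib-only sketch that applies the same transformations to a small inline string instead of a fetched page. It is an illustration under assumptions, not riko's implementation. `count_tokens` is a hypothetical helper. Dropping empty tokens and emitting one single-key dict per token in sorted order are this sketch's own choices, made to keep the result deterministic.

```python
from collections import Counter


def count_tokens(text, delimiter=" "):
    """Yield one {token: count} dict per distinct token in text."""
    # Step 3: replace newlines with spaces (Windows line endings first).
    content = text.replace("\r\n", " ").replace("\n", " ")
    # Step 4: split on the delimiter; empty tokens from repeated
    # delimiters are dropped (a choice of this sketch).
    tokens = [token for token in content.split(delimiter) if token]
    # Step 5: count how often each token appears.
    counts = Counter(tokens)
    # Emit in sorted token order (a choice of this sketch).
    for token in sorted(counts):
        yield {token: counts[token]}


# Hypothetical sample text, used only for this illustration.
sample = "a b\r\nb c\nc c"
stream = count_tokens(sample)
first = next(stream)
rest = list(stream)
print(first, rest)
```

As in the flow above, the result is a lazy stream, so `next(stream)` pulls one token and its count at a time.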