Differences

This shows you the differences between two versions of the page.

--- start [2017/10/18 10:59]
zoza [Scraping and mining Dezeen articles]
+++ start [2018/04/18 08:01]
zoza
@@ Line 1: / Line 1: @@
 ====== POSTDOCTORAL RESEARCH ======
+===== Python and SOM =====
+ - python module by Vahid Moosavi of CAAD, **sompy**
+ - another SOM python implementation, **somoclu**: https://somoclu.readthedocs.io/en/stable/index.html
 ===== Scraping and mining twitter streams =====
@@ Line 22: / Line 28: @@
 Run scrapy directly from the shell:
-<code>$ scrapy startproject dezeen # start a project
+<code>$ scrapy startproject dezeen # start a project</code>
+Detailed instructions here: https://doc.scrapy.org/en/latest/intro/tutorial.html#creating-a-project
+Create a _spider_ in the folder dezeen/dezeen/spiders/. In it, define a class that declares the spider's name. This name is used to call the spider from the console:
+<code>$ scrapy crawl spider_name</code>
+It is also important to declare the fields that will be scraped from the pages. This is done in the dezeen/items.py file, e.g. as follows (the class itself is already declared when you start the project):
+<code python>from scrapy import Item, Field
+# one Field per piece of data to be scraped from a page
+class DezeenItem(Item):
+    title = Field()
+    link = Field()
+    description = Field()
+</code>
+These fields will later be used as keys of the item dictionary (e.g. item['link']).
 ====== DOCTORAL RESEARCH ======

emperor's new architecture research