avatardaniel harvey

Free AI web copilot to create summaries, insights and extended knowledge, download it at here

5534

Abstract

ime.datetime.now()), <span class="hljs-string">'price'</span>: price, <span class="hljs-string">'item'</span>: item, <span class="hljs-string">'discount'</span>: discount }</pre></div><p id="7bae">In our case, the output is:</p><div id="477f"><pre>{ 'date': '<span class="hljs-number">2018-05-13</span> 13:37:21.<span class="hljs-number">414342</span>', 'price': '169.99', 'item': 'Dyson AM08 Bladeless Pedestal Fan | White/Silver | Refurbished', 'discount': '399.99 | 57% off' }</pre></div><h1 id="8e42">Step 2: Saving to S3</h1><p id="f971">To save our result to S3, we use <a href="https://boto3.readthedocs.io/en/latest/index.html">Boto 3</a>, the AWS SDK for Python. First we get a reference to S3. Then we create an <code>object</code> with given <code>bucket</code> and <code>file_name</code>(the bucket was created beforehand though this can be done programmatically). Finally, we convert our data to <code>json</code> and write our <code>data</code> into the object.</p><div id="acf2"><pre><span class="hljs-keyword">import</span> boto3 <span class="hljs-keyword">import</span> json</pre></div><div id="34fd"><pre>def <span class="hljs-built_in">save_file_to_s3</span>(bucket, file_name, data): s3 = boto3.<span class="hljs-built_in">resource</span>(<span class="hljs-string">'s3'</span>) obj = s3.<span class="hljs-built_in">Object</span>(bucket, file_name) obj.<span class="hljs-built_in">put</span>(Body=json.<span class="hljs-built_in">dumps</span>(data))</pre></div><p id="1bfc">We will package this utility function in the handler file where we house the actual function Lambda will call. More on this in the following steps.</p><p id="5919">Incidentally, if saving tabular data as in our example, we might choose a database instead of S3. We illustrate S3 here as it is also a good choice for documents, we are also frequently scraped items.</p><h1 id="dd59">Step 3: The Handler Function</h1><p id="6f6c">A Lambda function needs a <a href="https://docs.aws.amazon.com/lambda/latest/dg/python-programming-model-handler-types.html">handler function</a>, which is the function Lambda will execute when it gets called. We will put the handler function, along with our Python dependencies, in a sub-directory called <code>ebay_deal_scraper</code>. This will allow us to separate the files which will be part of the Lambda package, and ancillary project files such as the Serverless config file discussed in later steps.</p><p id="2abf">We name our handler function <code>scrape</code> and give it the signature required by Lambda. We don’t use the <code>event</code> or <code>context</code> parameters, but if you needed to pass data into your Lambda function, they are what you would use.</p><div id="2ee8"><pre><span class="hljs-title">def</span> scrape(event, context): <span class="hljs-class"><span class="hljs-keyword">data</span> = deal_scrape()</span> file_name = f<span class="hljs-string">"deals-{data['date']}"</span> save_file_to_s3('ebay_daily_deals', file_name, <span class="hljs-class"><span class="hljs-keyword">data</span>)</span></pre></div><p id="c7da">Our handler calls our <code>deal_scrape()</code> function, then writes the returned data to S3 under a file name based on the date.</p><h1 id="6b1f">Step 4: Packaging our Function</h1><p id="8815">Our custom code is ready to go, but Lambda also requires you include your dependencies in the package you upload to AWS. In our case this means <code>pip installing</code> our Python packages locally. In the <code>ebay_deal_scrapper</code> directory, we run:</p><div id="bc42"><pre>pip3 <span class="hljs-keyword">install </span>requests <span class="hljs-keyword">bs4 </span>-t .</pre></div><p id="2ef0">(if you have any problems, see this <a href="https://stackoverflow.com/questions/24257803/distutilsoptionerror-must-supply-either-home-or-prefix-exec-prefix-not-both">Stack Overflow issue</a>)</p><p id="8621">This will install the <code>requests</code> and <code>beatiful soup</code> packages in our directory. Lamdba has<code>boto3</code> pre-installed, so you don’t need to include it.</p><p id="bea3">Incidentally, including dependencies can be quite hairy if you require platform-dependent C/C++ libraries like <a href="https://www.boost.org/">Boost</a>, and may require you to use Docker to bundle everything together under the Amazon Linux OS that Lambda requires. But that’s the topic of another blog post.</p><p id="8188">When you have dependencies, you probably want to use a zip file to package everything up. We give ours the generic name of <code>package.zip</code>:</p><div id="f3ce"><pre><span class="hljs-built_in">zip</span> -r package.<span class="hljs-built_in">zip</span> *</pre></div><h1 id="b305">Step 5: Deploying to Lambda</h1><p id="6f52">We will use the Serverless framework to deploy to AWS. Serverless offers a set of command line tools which make it very easy to deploy to the major serverless cloud providers including AWS. You can install it using <code>npm</code>:</p><p id="8fc7"><code>npm install -g serverless</code></p><p id="110d">You specify Serverless deployment instructions in a file called <code>serverless.yml</code>. If you want to generate a boilerplate file, you can run:</p><div id="4295"><pre>serverless <span class="hljs-built_in">create</span> <span class="hljs-comment">--template aws-python</span></pre></div><p id="d62a">This will also create a boilerplate<code>handler.py</code> file, but all we need there is the function which we specify i

Options

n the <code>serverless.yml</code> config file under the <code>functions</code> section.</p><div id="7831"><pre><span class="hljs-symbol">service:</span> ebay-deal-scraper <span class="hljs-symbol"> provider:</span> <span class="hljs-symbol"> name:</span> aws <span class="hljs-symbol"> runtime:</span> python3<span class="hljs-number">.6</span> <span class="hljs-symbol"> package:</span> <span class="hljs-symbol"> artifact:</span> ebay_deal_scraper/package.zip <span class="hljs-symbol"> functions:</span> <span class="hljs-symbol"> ebay_scrape:</span> <span class="hljs-symbol"> handler:</span> handler.scrape</pre></div><p id="8b75">The <code>provider</code> section tells Serverless we’re deploying to AWS and that we’re using <code>python 3.6</code>.</p><p id="5fec">The <code>package</code> section is where we specify the zip file we created in the last step.</p><p id="5313">We are ready to deploy!</p><p id="273c">From the top directory of your project run:</p><div id="d1ed"><pre><span class="hljs-attribute">serverless deploy</span></pre></div><p id="c024">If you have multiple AWS profiles (such as work and personal), you can specify a profile:</p><div id="981f"><pre>serverless <span class="hljs-keyword">deploy</span> <span class="hljs-params">--aws-profile</span> profile_i_want_to_use</pre></div><p id="eed1">If all goes well, you should get the following output:</p><div id="4f5e"><pre>➜ ebay<span class="hljs-params">-deals</span><span class="hljs-params">-scrape</span> git:(master) ✗ sd -<span class="hljs-params">-aws</span><span class="hljs-params">-profile</span> michael Serverless: Packaging service<span class="hljs-params">...</span> Serverless: Uploading CloudFormation file <span class="hljs-keyword">to</span> S3<span class="hljs-params">...</span> Serverless: Uploading artifacts<span class="hljs-params">...</span> Serverless: Validating template<span class="hljs-params">...</span> Serverless: Updating <span class="hljs-built_in">Stack</span><span class="hljs-params">...</span> Serverless: Checking <span class="hljs-built_in">Stack</span> update progress<span class="hljs-params">...</span> <span class="hljs-params">...</span><span class="hljs-params">...</span><span class="hljs-params">...</span> Serverless: <span class="hljs-built_in">Stack</span> update finished<span class="hljs-params">...</span> Service Information service: ebay<span class="hljs-params">-deal</span><span class="hljs-params">-scraper</span> stage: dev region: us<span class="hljs-params">-east</span><span class="hljs-number">-1</span> <span class="hljs-built_in">stack</span>: ebay<span class="hljs-params">-deal</span><span class="hljs-params">-scraper</span><span class="hljs-params">-dev</span> api keys: <span class="hljs-literal">None</span> endpoints: <span class="hljs-literal">None</span> functions: ebay_scrape: ebay<span class="hljs-params">-deal</span><span class="hljs-params">-scraper</span><span class="hljs-params">-dev</span><span class="hljs-params">-ebay_scrape</span> Serverless: Removing old service versions<span class="hljs-params">...</span></pre></div><p id="d320">You can now try to invoke your function with:</p><div id="b911"><pre>serverless<span class="hljs-built_in"> invoke </span>-f ebay_scrape <span class="hljs-comment"># --aws_profile profile_name </span></pre></div><p id="4385">This will result in an AccessDenied error. To fix this, you need to add permission to your Lambda function’s role (this gets created together with the function).</p><h1 id="da14">Step 6: Giving Lambda S3 Privileges</h1><p id="b13d">Go to the Roles page in the IAM section of the AWS dashboard. Find the role for your Lambda function, in our case <code>ebay-deal-scraper-dev-us-east-1-lambdaRole</code> and add a policy which allows it to access S3. <code>AmazonS3FullAccess</code> will work for our demo, though in production you may want to create a new policy which is more restrictive.</p><p id="e0d1">Once your function is deployed you should also be able to view and test it from the AWS console.</p><figure id="dc68"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*A1uCWEB2Hb-YS7lMEiXCww.png"><figcaption></figcaption></figure><h1 id="6080">Step 7: Scheduling the Lambda Function Using CloudWatch</h1><p id="8cd3">Final step! Go to the CloudWatch Management Page and click the<code>Rules</code> tab. Under Event Source, select <code>Schedule</code> and fill in a cron expression. We set ours to run every day at 6PM GMT, or afternoon in US Eastern Time. Next, in the <code>Targets</code> section, choose <code>Lambda function</code> in the select and then your Lambda function from the list in the <code>Function</code> select. You’re done!</p><figure id="1595"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*amgaVqAlsAl0pQdYaAP_PA.png"><figcaption></figcaption></figure><p id="eb37">That’s the whirlwind tour of scraping the serverless way. I hope this gives a taste of the technologies involved, though each of them could be the subject of many, many blog posts.</p><p id="f860">As far as scraping goes, with the rise of big data/machine learning, data acquisition is becoming more and more important. And if, for instance, you wanted to do something like feed your machine learning model with data to <a href="https://www.dataquest.io/blog/machine-learning-tutorial/">make price predictions for AirBnB</a>, scraping might be your only option. The serverless architecture exemplified here offers an efficient way to do this.</p></article></body>

Or, even worse, a white urban tech elite “problem” solving machine. There are two Silicon Valleys. The classic one that is full of physical engineers and transistors. The new one is full of San Francisco tech bros that make apps to replace their parents.

Recommended from ReadMedium