ime.datetime.now()), 'price': price, 'item': item, 'discount': discount }</pre></div>In our case, the output is:<div id="477f"><pre>{ 'date': '2018-05-13 13:37:21.414342', 'price': ' $169.99', 'item': 'Dyson AM08 Bladeless Pedestal Fan | White/Silver | Refurbished', 'discount': '$ 399.99 | 57% off' }</pre></div>
<h1 id="8e42">Step 2: Saving to S3</h1>To save our result to S3, we use <a href="https://boto3.readthedocs.io/en/latest/index.html">Boto 3</a>, the AWS SDK for Python. First we get a reference to S3. Then we create an <code>object</code> with the given <code>bucket</code> and <code>file_name</code> (the bucket was created beforehand, though this can also be done programmatically). Finally, we convert our data to <code>json</code> and write it into the object.<div id="acf2"><pre>import boto3
import json</pre></div><div id="34fd"><pre>def save_file_to_s3(bucket, file_name, data):
    # Get a handle on S3, then on the target object (bucket + key)
    s3 = boto3.resource('s3')
    obj = s3.Object(bucket, file_name)
    # Serialize the scraped data to JSON and upload it as the object body
    obj.put(Body=json.dumps(data))</pre></div>We will package this utility function in the handler file where we house the actual function Lambda will call. More on this in the following steps.
Incidentally, if saving tabular data as in our example, we might choose a database instead of S3. We illustrate S3 here because it is also a good choice for documents, which are also frequently scraped items.
<h1 id="dd59">Step 3: The Handler Function</h1>A Lambda function needs a <a href="https://docs.aws.amazon.com/lambda/latest/dg/python-programming-model-handler-types.html">handler function</a>, which is the function Lambda will execute when it gets called. We will put the handler function, along with our Python dependencies, in a sub-directory called <code>ebay_deal_scraper</code>. 
This will allow us to separate the files that will be part of the Lambda package from ancillary project files such as the Serverless config file discussed in later steps.
We name our handler function <code>scrape</code> and give it the signature required by Lambda. We don’t use the <code>event</code> or <code>context</code> parameters, but if you need to pass data into your Lambda function, they are what you would use.<div id="2ee8"><pre>def scrape(event, context):
    data = deal_scrape()
    file_name = f"deals-{data['date']}"
    save_file_to_s3('ebay_daily_deals', file_name, data)</pre></div>Our handler calls our <code>deal_scrape()</code> function, then writes the returned data to S3 under a file name based on the date.
<h1 id="6b1f">Step 4: Packaging our Function</h1>Our custom code is ready to go, but Lambda also requires that you include your dependencies in the package you upload to AWS. In our case this means installing our Python packages locally with <code>pip</code>. In the <code>ebay_deal_scraper</code> directory, we run:<div id="bc42"><pre>pip3 install requests bs4 -t .</pre></div>(If you have any problems, see this <a href="https://stackoverflow.com/questions/24257803/distutilsoptionerror-must-supply-either-home-or-prefix-exec-prefix-not-both">Stack Overflow issue</a>.)
This will install the <code>requests</code> and Beautiful Soup (<code>bs4</code>) packages in our directory. Lambda has <code>boto3</code> pre-installed, so you don’t need to include it.
Incidentally, including dependencies can be quite hairy if you require platform-dependent C/C++ libraries like <a href="https://www.boost.org/">Boost</a>, and may require you to use Docker to bundle everything together under the Amazon Linux OS that Lambda requires. But that’s the topic of another blog post.
When you have dependencies, you probably want to use a zip file to package everything up. 
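Before zipping, it can be worth exercising the handler logic locally without touching eBay or S3. The sketch below is one way to do that, under stated assumptions: <code>make_handler</code>, <code>fake_deal_scrape</code> and <code>fake_save</code> are hypothetical names that are not part of the original code, and the handler returns the file name only so the result can be checked.

```python
def make_handler(scrape_fn, save_fn, bucket):
    """Build a Lambda-style handler from an injected scraper and saver.

    Hypothetical helper: the original article hard-codes deal_scrape()
    and save_file_to_s3() inside scrape(); injecting them makes the
    flow testable without network access.
    """
    def scrape(event, context):
        data = scrape_fn()
        file_name = f"deals-{data['date']}"
        save_fn(bucket, file_name, data)
        return file_name  # returned only so a local check can inspect it
    return scrape


# Stand-ins for the real scraper and the S3 writer (assumptions for the demo)
saved = {}


def fake_deal_scrape():
    return {
        'date': '2018-05-13 13:37:21.414342',
        'price': '$169.99',
        'item': 'Example item',
        'discount': '57% off',
    }


def fake_save(bucket, file_name, data):
    # Record the call in memory instead of uploading anything
    saved[(bucket, file_name)] = data


handler = make_handler(fake_deal_scrape, fake_save, 'ebay_daily_deals')
result = handler({}, None)  # Lambda passes an event and a context
print(result)
```

With the real <code>deal_scrape</code> and <code>save_file_to_s3</code> passed in instead of the stand-ins, the same wrapper follows the steps of the <code>scrape</code> handler above.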
We give ours the generic name of <code>package.zip</code>:<div id="f3ce"><pre>zip -r package.zip *</pre></div>
<h1 id="b305">Step 5: Deploying to Lambda</h1>We will use the Serverless framework to deploy to AWS. Serverless offers a set of command line tools which make it very easy to deploy to the major serverless cloud providers, including AWS. You can install it using <code>npm</code>: <code>npm install -g serverless</code>
You specify Serverless deployment instructions in a file called <code>serverless.yml</code>. If you want to generate a boilerplate file, you can run:<div id="4295"><pre>serverless create --template aws-python</pre></div>This will also create a boilerplate <code>handler.py</code> file, but all we need there is the function which we specify in the <code>serverless.yml</code> config file under the <code>functions</code> section.<div id="7831"><pre>service: ebay-deal-scraper
provider:
  name: aws
  runtime: python3.6
package:
  artifact: ebay_deal_scraper/package.zip
functions:
  ebay_scrape:
    handler: handler.scrape</pre></div>The <code>provider</code> section tells Serverless we’re deploying to AWS and that we’re using <code>python 3.6</code>.
The <code>package</code> section is where we specify the zip file we created in the last step.
We are ready to deploy! From the top directory of your project, run:<div id="d1ed"><pre>serverless deploy</pre></div>If you have multiple AWS profiles (such as work and personal), you can specify a profile:<div id="981f"><pre>serverless deploy --aws-profile profile_i_want_to_use</pre></div>If all goes well, you should get output like the following:<div id="4f5e"><pre>➜ ebay-deals-scrape git:(master) ✗ sd --aws-profile michael
Serverless: Packaging service...
Serverless: Uploading CloudFormation file to S3...
Serverless: Uploading artifacts...
Serverless: Validating template...
Serverless: Updating Stack...
Serverless: Checking Stack update progress...
.........
Serverless: Stack update finished...
Service Information
service: ebay-deal-scraper
stage: dev
region: us-east-1
stack: ebay-deal-scraper-dev
api keys:
  None
endpoints:
  None
functions:
  ebay_scrape: ebay-deal-scraper-dev-ebay_scrape
Serverless: Removing old service versions...</pre></div>You can now try to invoke your function with:<div id="b911"><pre>serverless invoke -f ebay_scrape # --aws-profile profile_name</pre></div>This will result in an AccessDenied error. To fix this, you need to add S3 permissions to your Lambda function’s role (the role is created together with the function).
<h1 id="da14">Step 6: Giving Lambda S3 Privileges</h1>Go to the Roles page in the IAM section of the AWS dashboard. 
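As background for this step, the handler only needs to write objects into a single bucket. The following is a minimal sketch, not the author's setup, of a policy document limited to <code>s3:PutObject</code> on that bucket. The helper name <code>s3_put_only_policy</code> is hypothetical, and the bucket name is the one passed to <code>save_file_to_s3</code> in the handler.

```python
import json


def s3_put_only_policy(bucket):
    """Return an IAM policy document allowing only s3:PutObject on one bucket.

    Hypothetical helper for illustration; it assumes the function needs
    nothing beyond writing objects (no reads, listing, or deletes).
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Object-level ARN: every key inside the bucket
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


policy = s3_put_only_policy("ebay_daily_deals")
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The printed JSON could be attached to the role as a custom policy in place of a broad managed one, assuming the function's needs really are limited to uploads.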
Find the role for your Lambda function, in our case <code>ebay-deal-scraper-dev-us-east-1-lambdaRole</code>, and add a policy which allows it to access S3. <code>AmazonS3FullAccess</code> will work for our demo, though in production you may want to create a new policy which is more restrictive.
Once your function is deployed, you should also be able to view and test it from the AWS console.<figure id="dc68"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*A1uCWEB2Hb-YS7lMEiXCww.png"><figcaption></figcaption></figure>
<h1 id="6080">Step 7: Scheduling the Lambda Function Using CloudWatch</h1>Final step! Go to the CloudWatch Management Page and click the <code>Rules</code> tab. Under Event Source, select <code>Schedule</code> and fill in a cron expression. We set ours to run every day at 6 PM GMT, which is early afternoon in US Eastern Time; in CloudWatch's six-field cron syntax that would be <code>0 18 * * ? *</code>. Next, in the <code>Targets</code> section, choose <code>Lambda function</code> in the drop-down and then pick your Lambda function from the <code>Function</code> list. You’re done!<figure id="1595"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*amgaVqAlsAl0pQdYaAP_PA.png"><figcaption></figcaption></figure>That’s the whirlwind tour of scraping the serverless way. I hope this gives you a taste of the technologies involved, though each of them could be the subject of many, many blog posts.
As far as scraping goes, with the rise of big data and machine learning, data acquisition is becoming more and more important. And if, for instance, you wanted to do something like feed your machine learning model with data to <a href="https://www.dataquest.io/blog/machine-learning-tutorial/">make price predictions for AirBnB</a>, scraping might be your only option. The serverless architecture exemplified here offers an efficient way to do this.</article></body>