Minimal Twitter to Google Pub/Sub example with Scala


Recently I was looking for a simple Twitter to Pub/Sub streaming pipeline and ended up writing my own implementation in Scala. To keep it as compact as possible, I chose the dispatch HTTP library and the Google Cloud Pub/Sub client library for Java.

To get started, you need a Google Cloud Platform service account key, plus a Twitter API consumer key and secret and an access token and secret.

1. Create a publisher:

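  import java.io.FileInputStream
  import com.google.api.gax.core.FixedCredentialsProvider
  import com.google.auth.oauth2.GoogleCredentials
  import com.google.cloud.pubsub.v1.Publisher
  import com.google.pubsub.v1.TopicName
  val publisher = Publisher.
    newBuilder(TopicName.of("projectId", "topic")). // your project ID and topic name
    setCredentialsProvider(FixedCredentialsProvider.create(
      GoogleCredentials.fromStream(new FileInputStream("key.json")))). // service account key file
    build()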

2. Get the stream of statuses/sample messages and publish them!

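  // Package names below assume dispatch 1.x, which is built on org.asynchttpclient
  import dispatch._, Defaults._
  import dispatch.oauth._
  import org.asynchttpclient.oauth.{ConsumerKey, RequestToken}
  import com.google.protobuf.ByteString
  import com.google.pubsub.v1.PubsubMessage
  Http.default(
    url("https://stream.twitter.com/1.1/statuses/sample.json") <@ (
      new ConsumerKey("consumerKey", "consumerSecret"),
      new RequestToken("accessToken", "accessTokenSecret")) >
    as.stream.Lines { tweet =>
        publisher.publish(PubsubMessage.newBuilder.setData(ByteString.copyFromUtf8(tweet)).build())
      })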
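
The request above runs asynchronously: Http.default(...) hands back a Scala Future, so a standalone program may need to block until the stream ends and then shut the publisher down. A minimal sketch of that idea follows, using only the Scala standard library. StreamRunner, runUntilClosed and the shutdown callback are hypothetical names introduced for illustration, not part of dispatch or the Pub/Sub client.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration

object StreamRunner {
  // Blocks the calling thread until the streaming request completes or fails,
  // then always runs the supplied cleanup (for example, shutting down the publisher).
  // A failure of the stream is rethrown to the caller after the cleanup has run.
  def runUntilClosed(stream: Future[Unit], shutdown: () => Unit): Unit =
    try Await.result(stream, Duration.Inf)
    finally shutdown()
}
```

With the snippets above, this could be called as StreamRunner.runUntilClosed(stream, () => publisher.shutdown()), where stream is assumed to be the value returned by Http.default(...).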

As improvements, you may want to configure BatchingSettings for the Publisher and add exception handlers.
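
Those two improvements might look roughly like the sketch below. It is not taken from the full source: the names BatchedPublisher, newPublisher and publish are hypothetical, the threshold values are arbitrary illustrations, and the credentials setup from step 1 is left out for brevity. It also assumes a client version in which setDelayThreshold accepts an org.threeten.bp.Duration; other releases may expect a different duration type.

```scala
import com.google.api.core.{ApiFutureCallback, ApiFutures}
import com.google.api.gax.batching.BatchingSettings
import com.google.cloud.pubsub.v1.Publisher
import com.google.common.util.concurrent.MoreExecutors
import com.google.protobuf.ByteString
import com.google.pubsub.v1.{PubsubMessage, TopicName}
import org.threeten.bp.Duration

object BatchedPublisher {
  // Assumed values: send a batch after 100 messages, 1 MB, or 500 ms,
  // whichever threshold is reached first.
  val batching: BatchingSettings = BatchingSettings.newBuilder()
    .setElementCountThreshold(100L)
    .setRequestByteThreshold(1024L * 1024L)
    .setDelayThreshold(Duration.ofMillis(500))
    .build()

  // Credentials are omitted here; add setCredentialsProvider as in step 1 if needed.
  def newPublisher(projectId: String, topic: String): Publisher =
    Publisher.newBuilder(TopicName.of(projectId, topic))
      .setBatchingSettings(batching)
      .build()

  // Publishes one tweet and logs a failure instead of silently dropping it.
  def publish(publisher: Publisher, tweet: String): Unit = {
    val message = PubsubMessage.newBuilder.setData(ByteString.copyFromUtf8(tweet)).build()
    ApiFutures.addCallback(publisher.publish(message), new ApiFutureCallback[String] {
      override def onSuccess(messageId: String): Unit = () // nothing to do on success
      override def onFailure(t: Throwable): Unit =
        System.err.println(s"Publish failed: ${t.getMessage}")
    }, MoreExecutors.directExecutor())
  }
}
```

Inside the as.stream.Lines handler, the direct publisher.publish(...) call could then be swapped for BatchedPublisher.publish(publisher, tweet). Larger thresholds trade latency for fewer requests, so the right values depend on the tweet volume you expect.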
You can find the full source code here.


About the Author

Valentin is a specialist in Big Data and Cloud solutions. He has extensive expertise in the Cloudera Hadoop Distribution and Google Cloud Platform, and is skilled in building scalable, performance-critical distributed systems and data visualization systems.
